Feature statistics for the prediction of postoperative delirium in the recovery room

Giesa, Niklas1; Haufe, Stefan1; Menk, Mario1; Weiß, Björn1; Spies, Claudia D.1; Piper, Sophie K.1; Balzer, Felix1; Boie, Sebastian D.1

Published Mar 21, 2025 on Dryad. https://doi.org/10.5061/dryad.1vhhmgr2g

Data files

Mar 21, 2025 version files (174.81 KB in total)

| File | Size |
| :--- | :--- |
| public_data_repository_table_1.csv | 2.90 KB |
| public_data_repository_table_2.csv | 87.48 KB |
| public_data_repository_table_3.csv | 80.96 KB |
| README.md | 3.48 KB |

Abstract

Background: Postoperative delirium (POD) contributes to severe outcomes such as death or development of dementia. Thus, it is desirable to identify vulnerable patients in advance during the perioperative phase. Previous studies mainly investigated risk factors for delirium during hospitalization and further used a linear logistic regression (LR) approach with time-invariant data. Studies have not investigated patients’ fluctuating conditions to support POD precautions.

Objective: In this single-center study, we aimed to predict POD in a recovery room setting with a non-linear machine learning (ML) technique using pre-, intra-, and postoperative data.

Methods: The target variable POD was defined as a Nursing Delirium Screening Scale (Nu-DESC) score ≥ 1. Feature selection was conducted based on robust univariate test statistics and L1 regularization. Non-linear multi-layer perceptron (MLP) and tree-based models were trained and evaluated against LR and published models on bootstrapped testing data, using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and additional metrics.

Results: The prevalence of POD was 8.2% in a sample of 73,181 surgeries performed between 2017 and 2020. Significant univariate impact factors were the preoperative ASA status, the intraoperative amount of remifentanil given, and the postoperative Aldrete score. The best model used pre-, intra-, and postoperative data. The tree-based model achieved a mean AUROC of 0.854 and a mean AUPRC of 0.418, outperforming linear LR as well as the best applied and retrained baseline models.

Conclusions: Overall, non-linear machine learning models using data from multiple perioperative time phases were superior to traditional ones in predicting POD in the recovery room. Class imbalance was seen as a main impediment for model application in clinical practice.


We created this dataset to perform robust univariate Mann-Whitney U (MWU) statistics that contributed to feature explanations for our trained machine learning models. Models were trained with binary targets distinguishing surgeries with POD from surgeries without POD. Our models ingest aggregated data in the form of percentiles and other aggregations that summarize multiple values recorded during one surgery.
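The two steps described above, aggregating several recordings per surgery into one value and then comparing POD and no-POD surgeries with a Mann-Whitney U statistic, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function name `mann_whitney_u`, the choice of the median as the aggregate, and all sample values are assumptions made for this example.

```python
from statistics import median


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic of sample x versus sample y.

    U counts the pairs (xi, yj) with xi > yj; ties contribute one half.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical raw recordings (invented values): several per surgery.
pod_surgeries = [[7, 8, 9], [6, 8], [9, 10, 10]]
no_pod_surgeries = [[9, 10], [10, 10, 10], [8, 9, 10], [10]]

# Aggregate each surgery to a single value (here the median, an assumption).
pod = [median(v) for v in pod_surgeries]
no_pod = [median(v) for v in no_pod_surgeries]

u = mann_whitney_u(pod, no_pod)

# Share of (POD, no-POD) pairs in which the POD value is larger,
# with ties counted as one half.
effect = u / (len(pod) * len(no_pod))
print(u, effect)
```

The quadratic pair loop is only suitable for small samples; for cohort-sized data with p-values, a library routine such as `scipy.stats.mannwhitneyu` would be the more practical choice.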

Description of the data and file structure

We submit our data as separate files. The first file contains metrics describing our cohort. The second file contains descriptive statistics of the extracted raw data, which can comprise multiple values per surgery. The last file describes the concrete model inputs, which were derived by calculating summary statistics such as percentiles on the extracted raw data.

On the first tab “Table 1” the following columns are provided:

| Domain | Variable | Unit | All | POD Positives (y=1) | POD Negatives (y=0) |
| :--- | :--- | :--- | :--- | :--- | :--- |

Here, the domain describes the clinical context of the variable, which is further specified by its unit in the following column. The column “All” reports summary statistics as counts or as the mean followed by the [1st, 2nd, 3rd] quartiles; the remaining columns report the same statistics for “POD Positives” and “POD Negatives”. For example, age, reported as 56, [42, 60, 73] years in Table 1, has a mean of 56 years and 1st, 2nd, and 3rd quartiles of 42, 60, and 73 years. Additional data fields in the table hold fractions, e.g., 53% female, 57% male as the sex distribution.
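To work with these cells programmatically, the mean and quartiles have to be split out of the combined string. The sketch below assumes the cells follow the layout of the age example above ("56, [42, 60, 73]"); the actual CSV formatting may differ. The function `parse_summary` and its output keys are hypothetical names introduced for this example.

```python
import re

# One signed integer or decimal number.
NUM = r"(-?\d+(?:\.\d+)?)"

# Assumed cell layout: mean, optional comma, then three quartiles in brackets.
SUMMARY = re.compile(rf"\s*{NUM}\s*,?\s*\[\s*{NUM}\s*,\s*{NUM}\s*,\s*{NUM}\s*\]\s*")


def parse_summary(cell):
    """Split a 'mean, [q1, q2, q3]' cell into a dict of floats.

    Returns None when the cell does not match the assumed layout,
    for example for plain counts or percentages.
    """
    match = SUMMARY.fullmatch(cell)
    if match is None:
        return None
    mean, q1, q2, q3 = (float(g) for g in match.groups())
    return {"mean": mean, "q1": q1, "q2": q2, "q3": q3}


print(parse_summary("56, [42, 60, 73]"))
print(parse_summary("53%"))
```

Returning `None` for non-matching cells keeps count and fraction rows distinguishable from numerical summaries without raising an error.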

In the Extracted Data sheet, we include the columns:

| feature | set | domain | unit | tl | missingness | min | max | mean | std | 10_perc | 25_perc | 50_perc | 75_perc | 90_perc | mad |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |

For each feature, we report the set (train/test), the clinical domain, the unit, and the corresponding timeline (tl). The missingness, given as the fraction of all patients with missing values, and further summary statistics are provided to report possible biases between the sets in detail. The summary statistics are given as columns covering min, max, mean, the standard deviation (std), and the 10th, 25th, 50th, 75th, and 90th percentiles (perc). The last metric, mad, denotes the mean absolute deviation.
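As an illustration of how such a row could be derived for one feature, the following sketch computes the listed statistics with the Python standard library. It is not the authors' code and rests on several assumptions: missingness is simplified to the fraction of `None` entries in the list, `std` is taken as the sample standard deviation, percentiles use linear interpolation (`method="inclusive"`), and the function name `describe` and the input values are invented.

```python
from statistics import mean, quantiles, stdev


def describe(values):
    """Summarize one feature; None marks a missing value."""
    observed = [v for v in values if v is not None]
    # 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(observed, n=100, method="inclusive")
    centre = mean(observed)
    row = {
        "missingness": 1 - len(observed) / len(values),
        "min": min(observed),
        "max": max(observed),
        "mean": centre,
        "std": stdev(observed),  # sample standard deviation (assumption)
        # Mean absolute deviation around the mean.
        "mad": mean([abs(v - centre) for v in observed]),
    }
    for p in (10, 25, 50, 75, 90):
        row[f"{p}_perc"] = cuts[p - 1]
    return row


# Invented values: eleven observations and one missing entry.
row = describe([0, 1, 2, 3, 4, 5, None, 6, 7, 8, 9, 10])
print(row)
```

Different percentile interpolation rules give slightly different results on small samples, so values computed this way need not match the published table exactly.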

For the aggregated data sheet, we provide:

| feature | mean | std | min | 25_perc | 50_perc | 75_perc | max | tl | dataset |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |

These are summary statistics computed after aggregation, again per timeline (tl), to show differences between the train and test sets (dataset). As before, the summary statistics are given as columns: the mean, the standard deviation (std), min, the 25th, 50th, and 75th percentiles, and max.
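One way to use this table for checking train/test differences is to compare the means per feature and timeline, scaled by a pooled standard deviation. The sketch below is a hedged example rather than the authors' procedure: the feature names, the `tl` and `dataset` values ("intra", "pre", "train", "test"), and all numbers are invented, and the function `train_test_shift` is a hypothetical helper.

```python
import csv
import io

# Invented rows that only mimic the column layout of the aggregated table.
sample = """feature,mean,std,min,25_perc,50_perc,75_perc,max,tl,dataset
feature_a,72.0,10.0,40,65,71,78,140,intra,train
feature_a,73.0,10.0,42,66,72,79,138,intra,test
feature_b,50.0,20.0,10,35,50,65,90,pre,train
feature_b,49.0,20.0,10,34,49,64,90,pre,test
"""


def train_test_shift(rows):
    """Return {(feature, tl): (mean_test - mean_train) / pooled std}."""
    stats = {}
    for row in rows:
        key = (row["feature"], row["tl"])
        stats.setdefault(key, {})[row["dataset"]] = (
            float(row["mean"]),
            float(row["std"]),
        )
    shifts = {}
    for key, by_set in stats.items():
        if "train" not in by_set or "test" not in by_set:
            continue  # skip features reported for only one set
        (m_train, s_train), (m_test, s_test) = by_set["train"], by_set["test"]
        pooled = ((s_train ** 2 + s_test ** 2) / 2) ** 0.5
        shifts[key] = (m_test - m_train) / pooled if pooled else float("nan")
    return shifts


shifts = train_test_shift(csv.DictReader(io.StringIO(sample)))
for key, value in shifts.items():
    print(key, round(value, 3))
```

To run this on the published file instead of the inline sample, the `io.StringIO(sample)` argument would be replaced by an opened `public_data_repository_table_3.csv`, provided its delimiter and `dataset` labels match the assumptions above.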

Sharing/Access information

Researchers need to download the data to understand and validate our research results. There are no restrictions on the use of our data for secondary analysis.

Code/Software

We used Python code, which is described in more detail at https://github.com/ngiesa/icdep.