Prediction model of in-hospital mortality in intensive care unit patients with heart failure: machine learning-based, retrospective analysis of the MIMIC-III database


Zhou, Jingmin et al. (2021), Prediction model of in-hospital mortality in intensive care unit patients with heart failure: machine learning-based, retrospective analysis of the MIMIC-III database, Dryad, Dataset,


Objective: The predictors of in-hospital mortality for intensive care unit (ICU)-admitted heart failure (HF) patients remain poorly characterized. We aimed to develop and validate a prediction model for all-cause in-hospital mortality among ICU-admitted HF patients.

Design: A retrospective cohort study.

Setting and Participants: Data were extracted from the MIMIC-III database. Data on 1,177 heart failure patients were analysed.

Methods: Patients meeting the inclusion criteria were identified from the MIMIC-III database and randomly divided into derivation and validation groups. Independent risk factors for in-hospital mortality were screened using XGBoost and LASSO regression models in the derivation sample. Multivariable logistic regression analysis was used to build prediction models. Discrimination, calibration, and clinical usefulness of the prediction models were assessed using the C-index, calibration plot, and decision curve analysis. After pairwise comparison, the best-performing model was chosen to build a nomogram according to the regression coefficients.

Results: Among the 1,177 admissions, in-hospital mortality was 13.52%. In both groups, the XGBoost, LASSO regression, and GWTG-HF risk score models showed acceptable discrimination. The XGBoost and LASSO regression models also showed good calibration. In pairwise comparison, the prediction effectiveness was higher with the XGBoost and LASSO regression models than with the GWTG-HF risk score model (P<0.05). The XGBoost model was chosen as our final model because it was more concise and had a wider range of threshold probabilities with net benefit, and it was presented as the nomogram.

Conclusions: Our nomogram enabled good prediction of in-hospital mortality in ICU-admitted HF patients, which may help clinical decision-making for such patients.


Data Source

The MIMIC-III database (version 1.4, 2016) is a publicly available critical care database containing de-identified data on 46,520 patients and 58,976 admissions to the ICU of the Beth Israel Deaconess Medical Center, Boston, USA, between 1 June 2001 and 31 October 2012. These data include comprehensive information, such as demographics, admitting notes, International Classification of Diseases, 9th revision (ICD-9) diagnoses, laboratory tests, medications, procedures, fluid balance, discharge summaries, vital sign measurements undertaken at the bedside, caregivers' notes, radiology reports, and survival data [12]. After successful completion of the National Institutes of Health Protecting Human Research Participants web-based training course, we obtained approval to extract data from MIMIC-III for research purposes (Certification Number: 28860101).

Patient and Public Involvement

Patients and/or the public were not directly involved in this study.

Study Patients

Patients with a diagnosis of HF, identified by manual review of ICD-9 codes, who were >15 years old at the time of ICU admission were included in the study; two researchers conducted the ICD-9 code review. Patients without an ICU record, or with missing data for left ventricular ejection fraction (LVEF) or N-terminal pro-brain natriuretic peptide (NT-proBNP), were excluded from the study. Figure 1A shows the patient selection flow chart. A total of 13,389 patients with a diagnosis of HF were screened, and 1,177 adult patients were included in this study.

Data Extraction

Using Structured Query Language (SQL) queries (PostgreSQL, version 9.6), demographic characteristics, vital signs, and laboratory values were extracted from the following tables in the MIMIC-III dataset: ADMISSIONS, PATIENTS, ICUSTAYS, D_ICD_DIAGNOSES, DIAGNOSES_ICD, LABEVENTS, D_LABITEMS, CHARTEVENTS, D_ITEMS, NOTEEVENTS, and OUTPUTEVENTS. Based on previous studies [7-9, 13-15], clinical relevance, and general availability at the time of presentation, we extracted the following data: demographic characteristics (age at the time of hospital admission, sex, ethnicity, weight, and height); vital signs (heart rate [HR], systolic blood pressure [SBP], diastolic blood pressure [DBP], mean blood pressure, respiratory rate, body temperature, pulse oxygen saturation [SpO2], and urine output [first 24 h]); comorbidities (hypertension, atrial fibrillation, ischemic heart disease, diabetes mellitus, depression, hypoferric anemia, hyperlipidemia, chronic kidney disease [CKD], and chronic obstructive pulmonary disease [COPD]); and laboratory variables (hematocrit, red blood cells, mean corpuscular hemoglobin [MCH], mean corpuscular hemoglobin concentration [MCHC], mean corpuscular volume [MCV], red blood cell distribution width [RDW], platelet count, white blood cells, neutrophils, basophils, lymphocytes, prothrombin time [PT], international normalized ratio [INR], NT-proBNP, creatine kinase, creatinine, blood urea nitrogen [BUN], glucose, potassium, sodium, calcium, chloride, magnesium, the anion gap, bicarbonate, lactate, hydrogen ion concentration [pH], partial pressure of CO2 in arterial blood, and LVEF). Demographic characteristics and vital signs were recorded during the first 24 hours of each admission, and laboratory variables were measured during the entire ICU stay. Comorbidities were identified using ICD-9 codes.
For variables with multiple measurements, the mean value was used in the analysis. The primary outcome of the study was in-hospital mortality, defined as the vital status (survivor or non-survivor) at the time of hospital discharge.
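
As a rough illustration of the averaging step, the sketch below collapses repeated measurements into one mean per patient and variable. The row layout `(patient_id, variable, value)`, the variable names, and the helper `mean_per_patient` are assumptions made for this example; they are not taken from the study's extraction queries.

```python
from collections import defaultdict
from statistics import mean


def mean_per_patient(rows):
    """Collapse repeated measurements to one mean per (patient, variable)."""
    grouped = defaultdict(list)
    for patient_id, variable, value in rows:
        if value is not None:  # skip measurements with no recorded value
            grouped[(patient_id, variable)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical rows: (patient_id, variable, value)
rows = [
    (1, "heart_rate", 80.0),
    (1, "heart_rate", 90.0),
    (1, "creatinine", 1.2),
    (2, "heart_rate", 100.0),
    (2, "heart_rate", None),
]
summary = mean_per_patient(rows)
print(summary[(1, "heart_rate")])
```

A patient-variable pair with no recorded value simply does not appear in the result, which leaves it to be handled as missing data in a later step.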

Missing Data Handling

Missing data are common in MIMIC-III; however, eliminating patients with incomplete data can bias the study. Therefore, imputation is an important step in data preprocessing. All screening variables contained <25% missing values (Table S1). For normally distributed continuous variables, missing values were replaced with the mean for the patient group. For continuous variables with skewed distributions, missing values were replaced with the median. There were no missing values for dichotomous variables in our study [16].
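
The mean-or-median rule can be sketched as follows. How normality was judged is not stated here, so the skewness cut-off (`skew_threshold=1.0`) is purely an assumption for illustration, as are the helper names and the toy values.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Population skewness (third standardized moment)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def impute(values, skew_threshold=1.0):
    """Fill None with the mean (roughly symmetric) or the median (skewed)."""
    observed = [v for v in values if v is not None]
    # Assumption: |skewness| above the threshold counts as "skewed".
    if abs(skewness(observed)) > skew_threshold:
        fill = median(observed)
    else:
        fill = mean(observed)
    return [fill if v is None else v for v in values]


symmetric = [1.0, 2.0, 3.0, None, 4.0, 5.0]   # hypothetical, symmetric
skewed = [1.0, 1.0, 2.0, None, 2.0, 100.0]    # hypothetical, right-skewed
print(impute(symmetric))
print(impute(skewed))
```

In the skewed example the single extreme value pulls the mean far from the bulk of the data, which is why the median is the safer fill value there.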

Statistical Analysis

We present baseline patient characteristics in both samples as percentages of the total for categorical variables and as mean ± standard deviation or median and interquartile range for continuous variables, depending on the normality of distribution. For categorical variables, we used a two-sided Pearson’s χ2 test or Fisher’s exact test to assess differences in proportions between the two groups. For continuous variables, we used two-sided one-way analysis of variance or the Wilcoxon rank-sum test to compare the two groups.
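
For a 2×2 table, the Pearson χ2 comparison of proportions reduces to a closed-form statistic with one degree of freedom. The sketch below is a minimal stdlib version without continuity correction; the counts are hypothetical and the function name is an assumption, not the software routine used in the study.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability
    # equals erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows are the two groups, columns are yes/no
stat, p_value = chi_square_2x2(10, 20, 20, 10)
print(round(stat, 3), p_value < 0.05)
```

When expected cell counts are small, Fisher's exact test is the usual alternative, as noted above.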

Figure 1B illustrates the methodology used to develop the prediction model. A total of 52 demographic, clinical, and biochemical variables were considered as candidate predictors based on existing literature, expert knowledge, and availability in clinical practice. Table 1 summarizes the predictor variables and summary statistics. Two methods were used to select the most important predictors for the in-hospital mortality prediction model from the derivation group. First, we used XGBoost [17], a supervised machine-learning and data-mining tool that uses a meta-algorithm to construct a strong ensemble learner from weak learners, such as regression trees [18]. The parameters of a regression tree consist of the tree structure and the weights of the leaf nodes. They are sequentially optimized to minimize an objective function, consisting of a fitting loss term plus a regularization term, using gradient methods. XGBoost adapts the tree-learning algorithm to handle sparse data, uses a weighted quantile sketch to approximate the optimization calculation, and uses a column block structure for parallel learning. The XGBoost algorithm can indicate the contribution of each predictor, making it possible to choose the most relevant predictors. The 20 top-ranked variables were selected for further analysis. Second, we used the least absolute shrinkage and selection operator (LASSO) method [19], a regression analysis method that performs both variable selection and regularization. This enhances the prediction accuracy and interpretability of a statistical model and is suitable for dimension reduction in high-dimensional data. Variables with non-zero coefficients in the LASSO regression model were selected for further analysis.
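
The two objective functions described above can be written out in their standard textbook forms. The notation is an assumption introduced for illustration: l is the fitting loss, f_k is the k-th regression tree, T is its number of leaves, w is its vector of leaf weights, γ and λ are the XGBoost regularization parameters, and α is the LASSO penalty strength. The loss and penalty settings actually used in the study are not stated here.

```latex
% XGBoost: fitting loss plus a regularization term over K additive trees f_k
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert_2^2

% LASSO-penalized logistic regression for a binary outcome y_i
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \left(\beta_0 + x_i^{\top}\beta\right)
  - \log\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
  + \alpha \lVert \beta \rVert_1 \right\}
```

The L1 penalty is what drives some coefficients exactly to zero, so the predictors with non-zero coefficients form the selected set.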

To investigate independent risk factors of in-hospital mortality, univariate logistic regression analysis was used to assess the significance of variables selected by each method in the derivation group. Variables significantly associated with in-hospital mortality were candidates for multivariate binary logistic regression. Potential non-linear relationships between candidate continuous variables and in-hospital mortality were explored using a smoothing plot, and nomograms were formulated based on the results of multivariate logistic regression analysis. The nomogram was based on proportionally converting each regression coefficient in multivariate logistic regression to a 0–100-point scale. The prediction models were evaluated in terms of discrimination and calibration. Calibration curves were plotted to assess the calibration of the in-hospital mortality nomogram. Discrimination was assessed by calculating the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and C-statistic testing. The 95% confidence interval (CI) was calculated using 500 bootstrap resamples. Decision curve analysis (DCA) [20] was used to compare the clinical net benefit associated with the use of these models. The model with the highest AUC and highest clinical net benefit was used to develop a nomogram predicting in-hospital mortality.
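
Three of the quantities described above lend themselves to a short sketch: the 0–100 nomogram point scale, the AUC with a bootstrap confidence interval, and the net benefit used in decision curve analysis. Everything below is illustrative: the coefficients, predictor ranges, outcomes, and scores are hypothetical, the function names are assumptions, and the percentile bootstrap is one common choice rather than a confirmed detail of the study.

```python
import random


def nomogram_points(coefs, ranges):
    """Scale each predictor's effect so the largest one spans 100 points."""
    effects = {k: abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs}
    largest = max(effects.values())
    return {k: 100 * e / largest for k, e in effects.items()}


def auc(y_true, y_score):
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=500, seed=0):
    """Percentile 95% CI for the AUC from n_boot bootstrap resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both outcome classes
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


def net_benefit(y_true, y_score, threshold):
    """Net benefit of treating every patient with risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical coefficients and predictor ranges
points = nomogram_points({"age": 0.03, "anion_gap": 0.10},
                         {"age": (20, 100), "anion_gap": (5, 25)})

# Hypothetical outcomes (1 = died) and predicted risks
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
low, high = bootstrap_auc_ci(y_true, y_score)
print(auc(y_true, y_score), net_benefit(y_true, y_score, 0.5))
```

Plotting `net_benefit` over a grid of thresholds for each model, alongside the treat-all and treat-none strategies, gives the decision curves that are compared in DCA.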

The American Heart Association Get With The Guidelines–Heart Failure (GWTG-HF) risk score is a well-validated, widely accepted scoring system for risk stratification regarding in-hospital mortality [9]. This prediction model was validated in our study groups and compared with our developed model. Because the final published version was a risk score, the calculated GWTG-HF risk score for each of the study patients was used for further analysis. A non-parametric approach, which uses the theory of generalized U-statistics to generate an estimated covariance matrix [21], was used to analyze areas under the ROC curves and estimate differences in discriminatory power between the models.

A two-tailed P-value of <0.05 indicated statistical significance in all analyses. All analyses were performed using EmpowerStats (version 2.17.8; and R software (Version 3.1.4;


Shanghai Minhang Science and Technology Commission, Award: 16XD1400700

National Natural Science Foundation of China, Award: 81370199

National Defense Basic Scientific Research Program of China, Award: 973 350 Program, 2012CB518605