Blood cell differential count discretization modeling predicts survival in adults reporting to the emergency room: a retrospective cohort study
Data files
Oct 10, 2023 version files: 1.78 MB
- DryadDB_bmjopen_2023_071937_R1_csv.csv
- README.md
Abstract
Objectives: To assess the survival predictive value of baseline blood cell differential count (BCDC), discretized according to two different methods, in adults visiting the Emergency Room (ER) for illness or trauma over one year.
Design: Retrospective cohort study of hospital records.
Setting: Tertiary care public hospital in northern Italy.
Participants: 11052 patients aged > 18 years, consecutively admitted to the ER in one year, and for whom BCDC collection was indicated by ER medical staff at first presentation.
Primary outcome: Survival was the reference outcome for exploratory model development. Automated BCDC analysis at baseline assessed hemoglobin, mean red cell volume (MCV), red cell distribution width (RDW), platelet distribution width (PDW), plateletcrit (PCT), and absolute counts of red blood cells, white blood cells, neutrophils, lymphocytes, monocytes, eosinophils, basophils, and platelets. Discretization cutoffs were defined by the Benchmark and Tailored methods. Benchmark cutoffs were based on laboratory reference intervals (CLSI). Tailored cutoffs were derived by Maximally Selected Rank Statistics for linearly and sigmoid-shaped distributed variables, and by the Optimal-Equal Hazard Ratio method for U-shaped distributed variables. Explanatory variables (age, gender, ER admission during SARS-CoV2 surges, in-hospital admission) were analyzed using multivariable Cox regression. ROC curves were drawn from the sum of Cox-significant variables for each method.
Results: Of 11052 patients (median age 67 years, IQR 51–81, 48% female), 59% (n=6489) were discharged and 41% (n=4563) were admitted to hospital. After a median follow-up of 306 days (IQR 208–417 days), 9455 (86%) patients were alive and 1597 (14%) had died. Increased HRs were associated with age >73 years (HR=4.6, CI=4.0–5.2), in-hospital admission (HR=2.2, CI=1.9–2.4), and ER admission during SARS-CoV2 surges (Wave-I HR=1.7, CI=1.5–1.9; Wave-II HR=1.2, CI=1.0–1.3). Gender, hemoglobin, MCV, RDW, PDW, and neutrophil, lymphocyte and eosinophil counts were significant overall. The Benchmark-BCDC model included basophil and platelet counts (AUROC 0.74). The Tailored-BCDC model included monocyte count and plateletcrit (AUROC 0.79).
Conclusions: Baseline discretized BCDC provides meaningful insight into the survival of Emergency Room patients.
README: Blood cell differential count discretization modeling predicts survival in adults reporting to the emergency room: a retrospective cohort study
https://doi.org/10.5061/dryad.dncjsxm5g
Description of the data and file structure
The file is in .csv (comma-separated values) format. Data are arranged in a wide data frame: each row is a de-identified patient and each column is a continuous or discrete variable. Column titles are abbreviations as listed in the reference article, as follows:
Hemoglobin (Hb), mean red cell volume (MCV), red cell distribution width (RDW), platelet distribution width (PDW), platelet hematocrit (PCT) and absolute count of red blood cells (RBC), white blood cells (WBC), neutrophils (Ne), lymphocytes (Ly), monocytes (Mo), eosinophils (Eo), basophils (Ba), and platelets (PLT).
[1] "Num" case number
[2] "Wave_I_II_off" three groups (ER admission during Wave I, during Wave II, or outside the surges), based on the two COVID-19 epidemic surges in Regione Lombardia as registered by the Italian Health Authority
[3] "Age" years
[4] "Sex" F = female, M = male
[5] "ddFUp" days from Emergency Room admission to the follow-up date or the date of death
[6] "deadTRUE" vital status as a logical value (TRUE = dead, FALSE = alive)
[7] "Eos" eosinophils count in blood sample x10^9/L
[8] "Neu" neutrophils count in blood sample x10^9/L
[9] "WBC" white blood cells count in blood sample x10^9/L
[10] "RDW" red cell distribution width in blood sample (%)
[11] "MCV" mean red cell volume in blood sample (fL)
[12] "Hb" Hemoglobin in blood sample (g/dL)
[13] "RBC" red blood cells count in blood sample x10^12/L
[14] "Lym" lymphocytes count in blood sample x10^9/L
[15] "Mon" monocytes count in blood sample x10^9/L
[16] "PLT" platelets count in blood sample x10^9/L
[17] "PCT" platelet hematocrit in blood sample (%)
[18] "PDW" platelet distribution width in blood sample (%)
[19] "Bas" basophils count in blood sample x10^9/L
[20] "alive0_dead1" alive or deceased, as a factor (0 = alive, 1 = dead)
[21] "Dischrd0_Inward1" discharge home or in-hospital admission to any ward, as a factor (0 = discharged home; 1 = admitted to hospital, any ward)
[22] "Hb_tailr" Hemoglobin value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[23] "RDW_tailr" red cell distribution width discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[24] "RBC_tailr" red blood cells count value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[25] "MCV_tailr" mean red cell volume value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[26] "WBC_tailr" white blood cells value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[27] "Neu_tailr" neutrophils count value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[28] "Lym_tailr" lymphocytes count value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[29] "Mon_tailr" monocyte count value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[30] "Eos_tailr" eosinophil count value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[31] "Bas_tailr" basophils count value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[32] "PLT_tailr" platelets count value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[33] "PCT_tailr" platelet hematocrit value discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[34] "PDW_tailr" platelet distribution width discretized by "tailored" log-relative hazard psplines (0 = favorable; 1 = unfavorable)
[35] "Tailr_score_sum" sum of unfavorable hemogram values discretized by "tailored" log-relative hazard psplines
[36] "Tailr_BCDC_score" quintiles of the sum of unfavorable hemogram values discretized by "tailored" log-relative hazard psplines
[37] "Tailr_AgeSex_InW_Wave" sum of unfavorable factors (all Tailored variables coded 1, age >73 years, in-hospital admission, ER admission during COVID-19 waves)
[38] "Age_0fav_1unf" age discretized by the maximally selected rank statistic method (0 = favorable, ≤73 years; 1 = unfavorable, >73 years)
[39] "HB_bench" Hemoglobin value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[40] "RDW_bench" red cell distribution width discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[41] "RBC_bench" red blood cells count value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[42] "MCV_bench" mean red cell volume value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[43] "WBC_bench" white blood cells value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[44] "Neu_bench" neutrophils count value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[45] "Lym_bench" lymphocyte count value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[46] "Mon_bench" monocyte count value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[47] "Eos_bench" eosinophil count value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[48] "Bas_bench" basophils count value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[49] "PLT_bench" platelets count value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[50] "PCT_bench" platelet hematocrit value discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[51] "PDW_bench" platelet distribution width discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[52] "Bench_score_sum" sum of unfavorable hemogram values discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[53] "Bench_BCDC_score" quintiles of the sum of unfavorable hemogram values discretized by Clinical and Laboratory Standards Institute (CLSI) (benchmark)
[54] "Bench_AgeSex_InW_Wave" sum of unfavorable factors (all Benchmark variables coded 1, age >73 years, in-hospital admission, ER admission during COVID-19 waves)
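The composite columns above can, in principle, be recomputed from the binary indicators. The following is a minimal Python sketch, assuming every `*_tailr` column is coded 0/1 as stated. `score_sum` mirrors the logic of `Tailr_score_sum` (pass a list of `*_bench` columns for `Bench_score_sum`). `quintile_groups` is a hypothetical grouping rule, because the exact quintile method behind `Tailr_BCDC_score` and `Bench_BCDC_score` is not documented here. `auroc` computes the area under the ROC curve of a score against `alive0_dead1` using the Mann-Whitney formula. It ignores follow-up time and is not necessarily the authors' ROC procedure. All helper names are illustrative.

```python
import bisect
import statistics

# The 13 "tailored" 0/1 indicator columns listed in the README.
TAILR_COLS = [
    "Hb_tailr", "RDW_tailr", "RBC_tailr", "MCV_tailr", "WBC_tailr",
    "Neu_tailr", "Lym_tailr", "Mon_tailr", "Eos_tailr", "Bas_tailr",
    "PLT_tailr", "PCT_tailr", "PDW_tailr",
]


def score_sum(row, cols=TAILR_COLS):
    """Count the unfavorable (coded 1) indicators in one patient row."""
    return sum(int(row[c]) for c in cols)


def quintile_groups(values):
    """Hypothetical quintile labels 1 (lowest) to 5 (highest).

    Cut points come from statistics.quantiles (default 'exclusive' method);
    a value equal to a cut point goes to the upper group. With many tied
    integer sums, some groups can be empty.
    """
    cuts = statistics.quantiles(values, n=5)
    return [bisect.bisect_right(cuts, v) + 1 for v in values]


def auroc(scores, dead):
    """AUROC of a score vs. a 0/1 outcome (Mann-Whitney form, ties = 1/2).

    Quadratic in the number of patients: fine for a check, not for speed.
    """
    pos = [s for s, d in zip(scores, dead) if d == 1]
    neg = [s for s, d in zip(scores, dead) if d == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage with made-up rows (not study data).
blank = {c: 0 for c in TAILR_COLS}
rows = [blank, dict(blank, Hb_tailr=1), dict(blank, Hb_tailr=1, Mon_tailr=1)]
sums = [score_sum(r) for r in rows]
print(sums)                              # [0, 1, 2]
print(quintile_groups(list(range(10))))  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(auroc(sums, [0, 0, 1]))            # 1.0
```

Integer sums from 0 to 13 produce many ties. As a result, any quintile rule must decide where tied values fall. That choice can make the regenerated groups differ from the published columns.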
Code/Software
The .csv data frame can be imported into virtually any spreadsheet or data-analysis software.
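As one example, the file can be read with Python's standard `csv` module. This minimal sketch parses a tiny inline sample that mimics a few README columns. The values are placeholders, not rows from the Dryad file. The sketch also checks that the two outcome encodings, `deadTRUE` and `alive0_dead1`, agree. It assumes that `deadTRUE` is stored as the literal strings `TRUE`/`FALSE`, which is typical of R exports but unverified here.

```python
import csv
import io

# Placeholder rows mimicking a few README columns; NOT data from the file.
SAMPLE = """Num,Age,Sex,ddFUp,deadTRUE,alive0_dead1,Hb,PLT
1,67,F,306,FALSE,0,13.1,240
2,81,M,45,TRUE,1,9.8,120
"""


def parse_logical(text):
    """Convert an R-style logical string (assumed TRUE/FALSE) to bool."""
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    raise ValueError(f"unexpected logical value: {text!r}")


def load_rows(handle):
    """Read the wide data frame and cast the columns used here."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "Num": int(rec["Num"]),
            "Age": float(rec["Age"]),
            "Sex": rec["Sex"],
            "ddFUp": float(rec["ddFUp"]),
            "dead": parse_logical(rec["deadTRUE"]),
            "alive0_dead1": int(rec["alive0_dead1"]),
            "Hb": float(rec["Hb"]),
            "PLT": float(rec["PLT"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))
# The two outcome columns should encode the same vital status.
assert all(r["dead"] == bool(r["alive0_dead1"]) for r in rows)
print(len(rows), "rows loaded")  # 2 rows loaded
```

To read the actual file, pass `open("DryadDB_bmjopen_2023_071937_R1_csv.csv", newline="")` to `load_rows`. The casts may need adjusting if the export contains `NA` values or a leading row-name column.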
Methods
Complete blood cell differential count (BCDC) was performed with the automated Sysmex XN-9000 analyzer on peripheral blood samples taken at baseline. Results were stored in the hospital laboratory electronic archives and retrieved by date (from 2020-01-01 to 2020-12-31) and by department (Pronto Soccorso, the Emergency Room). Data were handled in CSV format in RStudio. Survival was the reference outcome for exploratory model development and was assessed on June 30th, 2021, by querying the population registry office through the NHS territorial service. Laboratory data were converted from long to wide format. The resulting data frame was then joined with the survival data frame using the unique personal alphanumeric code that the Italian authorities assign to each citizen. These personal codes are sensitive information, even though they are encoded rather than overt. They were therefore deleted, and each patient was assigned a sequential coding number in the data frame. Duplicates were deleted.
Predictors were searched among the first automated BCDC assessment at presentation: hemoglobin (Hb), mean red cell volume (MCV), red cell distribution width (RDW), platelet distribution width (PDW), platelet hematocrit (plateletcrit, PCT), and absolute counts of red blood cells (RBC), white blood cells (WBC), neutrophils (Neu), lymphocytes (Lym), monocytes (Mon), eosinophils (Eos), basophils (Bas), and platelets (PLT). Missing data were excluded, as only patients with BCDC records were evaluated. Analysis was performed in RStudio and in jamovi, a free R-based software (The jamovi project (2021). jamovi. [Computer Software]. Retrieved from https://www.jamovi.org). The “Benchmark” reference model was set by discretizing BCDC continuous values on our laboratory reference intervals, established according to the C28-A3 guideline of the Clinical and Laboratory Standards Institute (CLSI).
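The preparation steps described above can be sketched in plain Python: long-to-wide reshaping, joining with the survival query on the personal code, replacing that code with a sequential `Num`, and removing duplicates. This is an illustration under assumptions, not the authors' R workflow. The record layout (`pid`, `date`, `test`, `value`) is hypothetical. "Duplicates" are interpreted here as repeat measurements of the same test, and only the earliest (baseline) value is kept.

```python
# Hypothetical long-format lab export: one row per (patient, test).
# Field names and values are illustrative placeholders, not study data.
LONG = [
    {"pid": "AAA111", "date": "2020-03-02", "test": "Hb", "value": 12.9},
    {"pid": "AAA111", "date": "2020-03-02", "test": "PLT", "value": 210.0},
    {"pid": "AAA111", "date": "2020-03-09", "test": "Hb", "value": 11.7},  # later repeat
    {"pid": "BBB222", "date": "2020-11-15", "test": "Hb", "value": 10.4},
]
# Hypothetical registry query result keyed by the same personal code.
SURVIVAL = {
    "AAA111": {"ddFUp": 485, "deadTRUE": False},
    "BBB222": {"ddFUp": 12, "deadTRUE": True},
}


def long_to_wide(records):
    """Pivot to one row per patient, keeping the earliest value per test."""
    wide = {}
    # ISO dates sort chronologically as strings; sorted() is stable.
    for rec in sorted(records, key=lambda r: r["date"]):
        row = wide.setdefault(rec["pid"], {})
        row.setdefault(rec["test"], rec["value"])  # later repeats are dropped
    return wide


def join_and_pseudonymize(wide, survival):
    """Inner-join lab and survival data, then replace the code by Num."""
    matched = [pid for pid in wide if pid in survival]  # first-visit order
    return [
        {"Num": num, **wide[pid], **survival[pid]}
        for num, pid in enumerate(matched, start=1)
    ]


dataset = join_and_pseudonymize(long_to_wide(LONG), SURVIVAL)
for row in dataset:
    print(row)
```

The personal code never appears in the output rows. Only the sequential `Num` links each patient's laboratory values to their survival record.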
The “Tailored” discretization was set as follows. The relationship between each continuous variable and the log relative hazard was plotted using penalized B-splines (psplines), a technique for fitting the nonlinear effect of a covariate in Cox models, to minimize the pitfalls associated with dichotomizing biological variables.
Variables were treated differently according to their distribution profile. Linear and sigmoid-shaped variables were dichotomized by the maximally selected rank statistic method (MSRS). U-shaped variables were discretized univariately by cutoff point determination using the optimal-equal hazard ratio method (OEHR) (Chen Y, Huang J, He X, et al. BMC Med Res Methodol. 2019 May 9;19(1):96. doi: 10.1186/s12874-019-0738-4).
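The MSRS step can be illustrated with a maximally selected log-rank statistic. For each candidate cutpoint of a continuous predictor, patients are split into two groups and the two-sample log-rank chi-square is computed. The cutpoint with the largest statistic is kept. The pure-Python sketch below is an illustration under assumptions, not the authors' R/jamovi code. The minimum group proportion (`min_prop=0.1`) is an assumed default. The multiplicity-adjusted p-value that dedicated implementations report is omitted. The quadratic loops favor readability over speed. The sketch does not cover the OEHR method for U-shaped variables.

```python
def logrank_chi2(times, events, group):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 = death, 0 = censored;
    group: 1 = above the cutpoint, 0 = at or below it.
    """
    o_minus_e = 0.0
    var = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        o_minus_e += d1 - d * n1 / n          # observed minus expected deaths
        if n > 1:                             # hypergeometric variance
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def max_selected_cutpoint(x, times, events, min_prop=0.1):
    """Return (cutpoint, chi2) for the cutpoint maximizing the log-rank chi2.

    Candidates are the distinct values of x that leave at least
    min_prop of patients in each group (an assumed default).
    """
    n = len(x)
    best_cut, best_stat = None, -1.0
    for c in sorted(set(x)):
        group = [1 if xi > c else 0 for xi in x]
        n_high = sum(group)
        if min(n_high, n - n_high) < min_prop * n:
            continue
        stat = logrank_chi2(times, events, group)
        if stat > best_stat:
            best_cut, best_stat = c, stat
    return best_cut, best_stat


# Toy data (not study data): patients with x > 5 die early, others are censored.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [100, 100, 100, 100, 100, 1, 2, 3, 4, 5]
events = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cut, stat = max_selected_cutpoint(x, times, events)
print(cut, round(stat, 2))  # 5 9.7
```

The selected cutpoint dichotomizes the variable into favorable (at or below) and unfavorable (above) groups. The direction should be flipped for variables where low values carry the higher hazard.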