Data from: Improving performance of hurdle models using rare-event weighted logistic regression: An application to maternal mortality data

Citation

Omondi, Evans; Okello, Sharon; Odhiambo, Collins (2022), Data from: Improving performance of hurdle models using rare-event weighted logistic regression: An application to maternal mortality data, Dryad, Dataset, https://doi.org/10.5061/dryad.zs7h44jdc

Abstract

In this paper, the performance of hurdle models on rare-events data is improved by modifying their binary component. The rare-event weighted logistic regression (REWLR) model is adopted in place of ordinary logistic regression to deal with the class imbalance caused by rare events. Poisson Hurdle REWLR and Negative Binomial Hurdle (NBH) REWLR are developed as two-part models that use the REWLR model to estimate the probability of a positive count and a zero-truncated Poisson or negative binomial (NB) count model to estimate the non-zero counts. The results are validated numerically and then discussed from both the mathematical and the maternal mortality perspectives. Numerical simulations are also presented to give a more complete representation of the model dynamics. The results suggest that NB Hurdle REWLR is the best-performing model for zero-inflated count data arising from rare events.
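
The two-part structure described in the abstract can be illustrated with a minimal Python sketch of a Poisson hurdle probability mass function. This is not the R code supplied with the dataset. The function names and the example values (`p_positive = 0.05`, `lam = 2.0`) are hypothetical, and the probability of a positive count is passed in as a plain number rather than estimated by a REWLR model.

```python
import math


def zero_truncated_poisson_pmf(k, lam):
    """P(Y = k | Y > 0) for a Poisson(lam) count; zero for k < 1."""
    if k < 1:
        return 0.0
    poisson_pmf = math.exp(-lam) * lam**k / math.factorial(k)
    # Renormalise by the probability of a non-zero Poisson count.
    return poisson_pmf / (1.0 - math.exp(-lam))


def hurdle_poisson_pmf(k, p_positive, lam):
    """Hurdle pmf: a binary part decides zero vs. positive,
    then a zero-truncated Poisson models the positive counts.

    p_positive is assumed to come from the binary component
    (REWLR in the paper); here it is simply supplied by the caller.
    """
    if k == 0:
        return 1.0 - p_positive
    return p_positive * zero_truncated_poisson_pmf(k, lam)


# Hypothetical values: a rare event (5% positive counts), mean parameter 2.
total = sum(hurdle_poisson_pmf(k, 0.05, 2.0) for k in range(60))
```

Summing the mass function over a wide enough range of counts should give a value very close to 1, which is a quick check that the zero and positive parts are combined consistently.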

Methods

The study uses publicly available secondary data. The maternal mortality data were retrieved from https://jphesportal.uonbi.ac.ke/dhis-web-commons/security/login.action, a District Health Information Software (DHIS2) portal that streamlines health data reporting.

Usage Notes

Microsoft Excel is required to open the dataset. R code is also provided.