The multiomic landscape of epidemiological factors contributing to preterm birth in low- and middle-income countries
Data files
May 19, 2023 version files 18.11 MB
Abstract
Preterm birth (PTB) is the leading cause of death in children under five, yet comprehensive studies are hindered by its multiple complex etiologies. Epidemiological associations between PTB and maternal characteristics have been previously described. This work employed multiomic profiling and multivariate modeling to investigate the biological signatures of these characteristics. Maternal covariates were collected during pregnancy from 13,841 pregnant women across five sites. Plasma samples from 231 participants were analyzed to generate proteomic, metabolomic, and lipidomic datasets. Machine learning models showed robust performance for the prediction of PTB (AUROC=0.70), time-to-delivery (r=0.65), maternal age (r=0.59), gravidity (r=0.56), and BMI (r=0.81). Time-to-delivery biological correlates included fetal-associated proteins (e.g., ALPP, AFP, PGF) and immune proteins (e.g., PD-L1, CCL28, LIFR). Maternal age negatively correlated with collagen COL9A1; gravidity with endothelial NOS and inflammatory chemokine CXCL13; and BMI with leptin and structural protein FABP4. These results provide an integrated view of epidemiological factors associated with PTB and identify biological signatures of clinical covariates impacting this disease.
Methods
The study population comprised pregnant women selected from five biorepository-supported cohorts in Matlab, Bangladesh; Lusaka, Zambia; Sylhet, Bangladesh; Karachi, Pakistan; and Pemba, Tanzania. The study was approved by the Stanford University Institutional Review Board, and each birth cohort supported by the Alliance for Maternal and Newborn Health Improvement (AMANHI) and the Global Alliance to Prevent Prematurity and Stillbirth (GAPPS) biorepositories independently sought and obtained ethical exemptions in its respective country. Written informed consent was obtained from each participant in the original cohorts and extends to the present study. No compensation or incentives were provided for participating in this study. We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline. This study analyzed plasma collected from May 2014 to August 2018.
Gestational age at the time of sampling was determined by ultrasonographic assessment. In all AMANHI and GAPPS cohorts, trained phlebotomists or nursing staff collected blood samples, which were centrifuged and aliquoted into serum, plasma, and buffy coat for storage at -80°C and future analyses. Collection and processing of all sample types followed harmonized operating procedures across all study cohorts. Lipidomic and metabolomic features were generated using untargeted liquid chromatography-mass spectrometry, while proteomic features were generated using a highly multiplexed immunoassay (Olink Proteomics Inc.).
Usage notes
Scripts are written in R, and data files are in .csv and .xlsx format. The .xlsx files can be opened with Google Sheets as a free alternative.
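For readers who would rather inspect the .csv files outside R, the following is a minimal Python sketch using only the standard library. The column names (`sample_id`, `gestational_age_weeks`, `bmi`) and the helpers `load_rows` and `numeric_column` are hypothetical and exist only for illustration; the actual headers in the deposited files may differ. The sketch assumes missing values appear as empty cells or as `NA`, which is the default when R writes a CSV.

```python
import csv
import io


def load_rows(handle):
    """Read an open CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def numeric_column(rows, name):
    """Return one column as floats, skipping blank or non-numeric cells."""
    values = []
    for row in rows:
        cell = (row.get(name) or "").strip()
        try:
            values.append(float(cell))
        except ValueError:
            continue  # e.g. an empty cell or the string "NA"
    return values


# Hypothetical stand-in for one of the .csv files (assumed column names).
example = io.StringIO(
    "sample_id,gestational_age_weeks,bmi\n"
    "S1,12.5,21.3\n"
    "S2,NA,27.8\n"
    "S3,30.0,\n"
)

rows = load_rows(example)
ga = numeric_column(rows, "gestational_age_weeks")
bmi = numeric_column(rows, "bmi")
```

To read a file from disk instead of the in-memory example, open it with `open(path, newline="")` and pass the resulting file object to `load_rows`.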