Pregnancy and postpartum dynamics revealed by millions of lab tests
Data files
Nov 24, 2024 version files (23.50 MB)
- pregnancy_clalit.zip (23.50 MB)
- README.md (6.08 KB)
Abstract
Pregnancy and delivery involve dynamic alterations in many physiological systems. However, the physiological dynamics during pregnancy and after delivery have not been systematically analyzed at high temporal resolution in a large human population. Here we present the dynamics of 76 lab tests based on a cross-sectional analysis of roughly 41 million measurements from over 300,000 pregnancies. We analyzed each test at weekly intervals from 20 weeks preconception to 80 weeks postpartum, providing detailed temporal profiles. About half of the tests take three months to a year to return to baseline during postpartum, highlighting the physiologic load of childbirth. The precision of the data revealed the effects of preconception supplements, overshoots after delivery, and intricate temporal responses to changes in blood volume and renal filtration rate. Pregnancy complications – gestational diabetes, pre-eclampsia, and postpartum hemorrhage – showed distinct dynamical changes. These results provide a comprehensive dynamic portrait of the systems physiology of pregnancy.
https://doi.org/10.5061/dryad.1c59zw44t
The dataset contains summary statistics on lab tests from >300K pregnancies in Israel between 2003 and 2020 among members of “Clalit Healthcare”, Israel’s largest HMO. It spans 110 different tests at weekly resolution.
Where BMI information was available, the measurements were grouped into three BMI ranges: 15-18.5, 18.5-25, and 25-30. These test results are also included in the ungrouped dataset.
In addition, the dataset contains the same information for 3 common complications: postpartum hemorrhage (PPH), gestational diabetes mellitus (GDM), and pre-eclampsia. The data for these pregnancies is provided at a 4-week resolution due to a lower number of measurements.
See Methods for more details on dataset curation.
Description of the data and file structure
Each individual test result was computed as a quantile score from a reference of a healthy, non-pregnant same-aged female population. We retained the participant’s age and BMI information if available (before pregnancy).
The queried results were binned into weekly intervals starting 60 weeks before delivery until 80 weeks postpartum.
For the BMI-grouped dataset, the data is further binned (see above).
For each such bin with properties (test_name, week[, bmi_group]), summary statistics were computed, including the mean, standard deviation, and the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.
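A minimal sketch of this binning step, using only the Python standard library. The input layout (tuples of test name, days from delivery, and value), the floor-based week assignment, the sample standard deviation, and the linear-interpolation percentile are assumptions made for illustration; the source does not specify them.

```python
import math
import statistics
from collections import defaultdict

PERCENTILES = (5, 10, 25, 50, 75, 90, 95)


def percentile(sorted_values, p):
    """Linear-interpolation percentile (0 <= p <= 100) of a sorted list."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    rank = (len(sorted_values) - 1) * p / 100
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def summarize(measurements, bin_width_days=7):
    """Group (test_name, days_from_delivery, value) tuples into weekly bins.

    Assumption: week w covers days [7*w, 7*w + 7), so week 0 starts at delivery.
    """
    bins = defaultdict(list)
    for test_name, days, value in measurements:
        week = math.floor(days / bin_width_days)
        bins[(test_name, week)].append(value)

    rows = {}
    for key, values in bins.items():
        values.sort()
        row = {
            "val_n": len(values),
            "val_mean": statistics.fmean(values),
            # Sample standard deviation is undefined for a single value.
            "val_sd": statistics.stdev(values) if len(values) > 1 else None,
        }
        for p in PERCENTILES:
            row[f"val_{p}"] = percentile(values, p)
        rows[key] = row
    return rows
```

Calling `summarize` on a list of such tuples returns one dictionary per (test_name, week) pair, with keys named after the `val_` columns described in this README.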
The dataset consists of CSV files in the following structure:
- pregnancy.1w/ # Results at 1w resolution, no BMI groups
  - 17_HYDROXY_PROGESTERONE.csv # Summary statistics on 17α-OHP
  - ACTH_ADRENOCORTICOTROPIC_HORMONE.csv # Summary statistics on ACTH
  - ⋮
- pregnancy.2w/ # Results at 2w resolution, no BMI groups
  - ⋮
- pregnancy.1w.bmi/ # Results at 1w resolution with BMI groups
  - ⋮
- pregnancy.gdm.4w/ # Pregnancies with a gestational diabetes diagnosis, 4-week resolution
  - ⋮
- pregnancy.postpartum_hemorrhage.4w/ # Pregnancies with a postpartum hemorrhage diagnosis, 4-week resolution
  - ⋮
- pregnancy.pre-eclampsia.4w/ # Pregnancies with a pre-eclampsia diagnosis, 4-week resolution
  - ⋮
- stats.csv # Meta-statistics on each lab test, see below
- complications.csv # Number of pregnancies per complication
- excluded_icd9_codes.csv # ICD9 codes of all excluded medical diagnoses ("chronic disease", as discussed in the Methods).
- Metadata.csv # Table of units and test names
- LabNorm.csv # Reference values from a healthy, non-pregnant female population.
Lab test files
Each CSV file in the directories with the “pregnancy” prefix has the following structure:
week | val_n | val_mean | val_sd | val_5 | val_10 | val_25 | val_50 | val_75 | val_90 | val_95 | qval_mean | qval_sd | qval_5 | qval_10 | qval_25 | qval_50 | qval_75 | qval_90 | qval_95 | … | bmi_n | … |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[-60,-56) | 4668 | 4.274635 | 0.356050 | 3.700000048 | 3.900000095 | 4.099999905 | 4.300000191 | 4.5 | 4.699999809 | 4.800000191 | 0.517748 | 0.300543 | 0.035977 | 0.081233 | 0.279392 | 0.554869 | 0.793191 | 0.924205 | 0.962102 | … | 1376 | … |
*First row from pregnancy.4w/Albumin.csv
Where:
- week: Week of the test relative to delivery (=0).
- val_n: Number of lab tests in the weekly interval.
- val_mean: The mean of all test values in the interval. Units are industry standard and can be found in Metadata.csv.
- val_sd: Standard deviation of the test values in the interval.
- val_(5···95): The (5···95)th percentiles of the test values in the interval, the 50th percentile being the median.
- qval columns: Same as the above for the quantile score of the test results in the interval.
- bmi_n: Number of test results that had a valid BMI measurement.
Other columns (not shown) are age and BMI columns with the same summary statistics.
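A short sketch of how one of these files might be read with the standard library. It assumes the files are comma-delimited, that the `week` label is a quoted half-open interval such as `[-60,-56)` as in the example row above, and that only a few of the columns are needed; the inline sample reproduces a subset of that row rather than reading a real file.

```python
import csv
import io
import re

# Matches labels such as "[-60,-56)": inclusive start, exclusive end.
INTERVAL = re.compile(r"^\[(-?\d+),(-?\d+)\)$")


def parse_week(label):
    """Turn a half-open interval label such as '[-60,-56)' into (start, end)."""
    match = INTERVAL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized week label: {label!r}")
    return int(match.group(1)), int(match.group(2))


# Subset of the columns from the example row; a real file has many more.
sample = io.StringIO(
    "week,val_n,val_mean,val_50,qval_50\n"
    '"[-60,-56)",4668,4.274635,4.300000191,0.554869\n'
)

rows = []
for record in csv.DictReader(sample):
    start, end = parse_week(record["week"])
    rows.append(
        {
            "start": start,  # weeks relative to delivery (delivery = 0)
            "end": end,
            "n": int(record["val_n"]),
            "median": float(record["val_50"]),
            "q_median": float(record["qval_50"]),
        }
    )
```

Files at other resolutions may label the `week` column differently, in which case `parse_week` would need to be adapted.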
Missing Values
Some files are missing rows where a weekly interval has a single measurement (or none).
Where 1<n≤10, only mean values and median age (per gestational week) are recorded and all other values are replaced with N/A to address privacy concerns.
Full rows where n>10 are not identifiable at the individual level, protecting participants’ privacy. The same policy applies to the BMI columns.
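When loading the files, suppressed cells need to be handled before any numeric work. The sketch below assumes suppressed cells appear as the literal string `N/A` (or as empty cells); the exact encoding in the CSV files should be checked. The helper names are hypothetical.

```python
# Assumed spellings of a suppressed or empty cell.
MISSING = {"", "N/A", "NA", "NaN", "nan"}


def to_float(cell):
    """Convert a CSV cell to float, mapping suppressed or empty cells to None."""
    text = cell.strip()
    return None if text in MISSING else float(text)


def row_kind(record):
    """Classify a row by which value columns survived the privacy policy."""
    spread = [to_float(record[c]) for c in ("val_sd", "val_5", "val_50", "val_95")]
    if all(v is not None for v in spread):
        return "full"        # n > 10: all statistics present
    if to_float(record["val_mean"]) is not None:
        return "mean_only"   # 1 < n <= 10: only the mean is usable
    return "empty"
```

Rows classified as `mean_only` can still contribute to a mean trajectory, but should be excluded from any analysis that relies on percentiles or standard deviations.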
Other files
The file stats.csv provides some metadata statistics about the dataset:
- gw_X column names are the Xth percentile of delivery time, in gestational weeks.
- Fraction of first pregnancies was calculated based on the records after 2010 only, since pregnancies before 2003 are not included and therefore we cannot know about earlier pregnancies.
- Preterm pregnancies are those with delivery at or before the 37th gestational week.
Metadata.csv maps test names to the file names, and includes other metadata such as physiological system and unit of measurement.
LabNorm.csv is stratified by age and lab test, each with a reference sampled at 13 points in the distribution. The reference is a healthy (see Methods for the definition of healthy), same-aged, and non-pregnant female population.
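One way a quantile score could be derived from a reference sampled at a handful of points is linear interpolation between the sampled quantiles. The sketch below illustrates that idea only; it is not the LabNorm package's method, and the function name and the way values outside the sampled range are clamped are assumptions.

```python
from bisect import bisect_right


def quantile_score(value, ref_levels, ref_values):
    """Interpolate the quantile of `value` within a sampled reference.

    ref_levels: increasing quantile levels in [0, 1]
    ref_values: reference test values at those levels, in non-decreasing order
    Values outside the sampled range are clamped to the outermost level.
    """
    if value <= ref_values[0]:
        return ref_levels[0]
    if value >= ref_values[-1]:
        return ref_levels[-1]
    # After bisect_right: ref_values[i - 1] <= value < ref_values[i]
    i = bisect_right(ref_values, value)
    x0, x1 = ref_values[i - 1], ref_values[i]
    q0, q1 = ref_levels[i - 1], ref_levels[i]
    return q0 + (q1 - q0) * (value - x0) / (x1 - x0)
```

With a made-up five-point reference (levels 0.05, 0.25, 0.5, 0.75, 0.95 at values 10, 20, 30, 40, 50), a value of 25 falls halfway between the 0.25 and 0.5 levels.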
Sharing/Access information
Dataset and Jupyter notebooks for analysis are available on GitHub:
Pregnancy and postpartum dynamics revealed by millions of lab tests
- A frozen repository is available on Zenodo.
Data was curated using the following sources:
- Clalit Healthcare
- LabNorm R package
Study Population
The study population consisted of individuals from the Clalit healthcare database, Israel's largest health maintenance organization (HMO). We considered all pregnancies of females aged 20 to 35 between 2003 and 2020. Information about pregnancies before 2003 is not available. We therefore estimated the fraction of first pregnancies only for the years 2010-2020, to reduce the influence of pregnancies before 2003, which we cannot account for. For more information, see “stats.csv”.
Data Collection
Medical records were pseudonymized by hashing personal identifiers and by randomizing dates: for each patient, a random number of weeks, uniformly sampled between 0 and 13, was added to all dates in the patient's diagnosis, laboratory, and medication records. Because the same shift is applied to all of a patient's dates, this randomization does not affect timing relative to delivery.
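The following sketch illustrates why a per-patient date shift leaves timing relative to delivery unchanged. It assumes the offset is a whole number of weeks; the function name, the example dates, and the fixed seed are illustrative only and are not taken from the study.

```python
import random
from datetime import date, timedelta


def shift_patient_dates(dates, rng, max_weeks=13):
    """Add one random whole-week offset (0..max_weeks) to every date of a patient."""
    offset = timedelta(weeks=rng.randint(0, max_weeks))
    return {name: d + offset for name, d in dates.items()}


rng = random.Random(0)  # fixed seed only to make the example repeatable
original = {"lab_test": date(2015, 3, 2), "delivery": date(2015, 6, 15)}
shifted = shift_patient_dates(original, rng)

# The gap between the test and the delivery is identical before and after.
gap_before = original["delivery"] - original["lab_test"]
gap_after = shifted["delivery"] - shifted["lab_test"]
```

Absolute calendar dates change by up to 13 weeks, while `gap_before` and `gap_after` are equal by construction.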
We examined the timeframe from 60 weeks before delivery to 80 weeks after delivery for all documented labors within our study population. Week 0 denotes the week of delivery. We identified deliveries by ICD9 code V27 and confirmed a childbirth record for the individual. We excluded preterm deliveries (≤37 gestational weeks, ICD9 code 644), stillbirths, and labors with more than one newborn. Nonetheless, 12% of deliveries were at ≤37 gestational weeks and were missing the 644 code.
To mitigate ascertainment bias of the test results, for each test, we removed data from individuals with chronic disease that affected the test if the onset of the disease was up to 6 months after the test. We also removed data from individuals who purchased drugs that affected the tests in the 6 months before the tests. Chronic diseases are defined as non-pediatric ICD9 codes with a Kaplan-Meier survival drop of >10% over 5 years and are assigned above a minimal average drop of 1/3 per year. Drugs that affect a test were defined as drugs with a significant effect on the test (false discovery rate < 0.01). This step allowed us to focus on a relatively healthy subset of the pregnant population, reducing the confounding effects associated with specific health conditions listed above or medication usage.
To exclude the potential effect of follow-up pregnancies in the 80 weeks following delivery, we excluded lab values from individuals with another delivery within 40 weeks following the measurement.
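This exclusion rule can be sketched as a simple date-window check. The function name, the treatment of the window boundary as inclusive, and the example dates are assumptions; the source only states the 40-week window.

```python
from datetime import date, timedelta


def keep_measurement(measured_on, delivery_dates, index_delivery, window_weeks=40):
    """Return False if another delivery falls within `window_weeks` after the test.

    `index_delivery` is the delivery the measurement is aligned to; it is
    skipped so that only subsequent pregnancies trigger the exclusion.
    """
    window_end = measured_on + timedelta(weeks=window_weeks)
    for delivered_on in delivery_dates:
        if delivered_on == index_delivery:
            continue
        if measured_on < delivered_on <= window_end:
            return False
    return True


deliveries = [date(2015, 6, 15), date(2016, 9, 1)]
# Early postpartum test: the next delivery is more than 40 weeks away.
early = keep_measurement(date(2015, 8, 1), deliveries, deliveries[0])
# Later test: the next delivery occurs within 40 weeks, so it is excluded.
late = keep_measurement(date(2016, 1, 10), deliveries, deliveries[0])
```

In this example `early` is kept and `late` is dropped, because the later test was likely taken during the following pregnancy.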
For each pregnancy, we gathered all available test values including standard blood count, kidney and liver function tests, blood coagulation tests, lipid panel, inflammation markers, and hormones. We then discretized test values into time points relative to the time of birth in weekly intervals for each test. In addition to test values, we also extracted data on patients including age (at measurement, mean, and interquartile range) and BMI (the most proximal BMI measurement in medical records outside pregnancy, mean and interquartile range, if available).
Privacy concerns
Retrospective test results were aggregated and only statistical information was kept. Our ethical agreement with Clalit does not require informed consent for the publication of this aggregated data. Weekly intervals with a single measurement per test were removed. For weekly intervals (per test) with 10 measurements or fewer, mean values were kept and other values (percentiles, standard deviation) were removed, ensuring that individual measurements cannot be inferred from the aggregated data.
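The small-count policy can be expressed as a function applied to each summary row. This is a sketch under assumptions: the function name is hypothetical, the row is a plain dictionary keyed by the `val_` column names, the count column is assumed to be retained, and the age and BMI columns are left out for brevity.

```python
# Columns assumed to survive in small intervals (1 < n <= 10).
KEPT_WHEN_SMALL = {"week", "val_n", "val_mean"}


def suppress(row):
    """Apply the small-count policy to one summary row (a dict)."""
    n = row["val_n"]
    if n <= 1:
        return None  # interval removed entirely
    if n <= 10:
        # Keep the mean; replace every other statistic with a placeholder.
        return {k: (v if k in KEPT_WHEN_SMALL else "N/A") for k, v in row.items()}
    return dict(row)  # n > 10: published unchanged
```

Applying `suppress` to every row and discarding the `None` results yields a table in which only intervals with more than 10 measurements expose percentiles and standard deviations.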