Data for: A modified Michaelis-Menten equation estimates growth from birth to 3 years in healthy babies in the US
Data files
Sep 21, 2023 version files (6.31 MB):
- README.md (2.36 KB)
- starr_source_height_data_jittered.txt (2.74 MB)
- starr_source_weight_data_jittered.txt (3.57 MB)
Jan 16, 2024 version files (6.31 MB):
- README.md (2.28 KB)
- starr_source_height_data_jittered.txt (2.74 MB)
- starr_source_weight_data_jittered.txt (3.57 MB)
Jan 22, 2024 version files (6.31 MB)
Abstract
Background: Standard pediatric growth curves cannot be used to impute missing height or weight measurements in individual children. The Michaelis-Menten equation, used for characterizing substrate-enzyme saturation curves, has been shown to model growth in many organisms including nonhuman vertebrates. We investigated whether this equation could be used to interpolate missing growth data in children in the first three years of life and compared this interpolation to several common interpolation methods and pediatric growth models.
Methods: We developed a modified Michaelis-Menten equation and compared expected to actual growth, first in a local birth cohort (N=97) and then in a large, outpatient, pediatric sample (N=14,695).
Results: The modified Michaelis-Menten equation showed excellent fit for both infant weight (median RMSE: boys: 0.22 kg [IQR: 0.19; 90% < 0.43]; girls: 0.20 kg [IQR: 0.17; 90% < 0.39]) and height (median RMSE: boys: 0.93 cm [IQR: 0.53; 90% < 1.0]; girls: 0.91 cm [IQR: 0.50; 90% < 1.0]). Growth data were modeled accurately with as few as four values from routine well-baby visits in year 1 and seven values in years 1-3; birth weight or length was essential for best fit. Interpolation with this equation had comparable (for weight) or lower (for height) mean RMSE compared to the best-performing alternative models.
Conclusions: A modified Michaelis-Menten equation accurately describes growth in healthy babies aged 0–36 months, allowing interpolation of missing weight and height values in individual longitudinal measurement series. The growth pattern in healthy babies in resource-rich environments mirrors an enzymatic saturation curve.
README: Data for: A modified Michaelis-Menten equation estimates growth from birth to 3 years in healthy babies in the US
https://doi.org/10.5061/dryad.4j0zpc8jf
Description of the data and file structure
Data for this study include, per baby: sex, age in days, and, over time, weight in kg and height in cm. Each baby had at least 5 visits. Our goal was to fit each baby’s data to a curve described by a modified Michaelis-Menten equation, allowing interpolation of missing weight or height values. Among the subset of infants who had 7 well-baby visits in the first year of life, and 12 visits over 3 years, we further explored how many, and which, data points were necessary for good fit. Finally, among babies with 5 time points in year 1, and 2 in both year 2 and year 3, we examined whether weight or height data early in life could predict growth in later months.
To meet anonymization guidelines, we are providing only STARR data, including sex, age and jittered weight and height (for STORK data, STARR race/ethnicity information and STARR exact data, please contact us directly). We used half of the RMSE sd as a range for the jitter (+/- 0.075 kg for weight and +/- 0.2 cm for height). Attached files include:
(1) STARR weight data (starr_source_weight_data_jittered.txt): anon_id, sex (male or female), the baby's age in days (AgeBabyDays), jittered weight in kg (jit_weightkg);
(2) STARR height data (starr_source_height_data_jittered.txt): anon_id, sex, AgeBabyDays, jittered height in cm (jit_heightcm).
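As a minimal sketch of working with this layout, the R snippet below parses a few made-up rows in the documented column order and counts visits per baby. The rows are invented for illustration, and the presence of a header row with exactly these column names is an assumption about the posted files.

```r
# Made-up rows in the documented weight-file layout (tab-delimited).
# Assumption: the real file has a header row with these column names.
example_text <- paste(
  "anon_id\tsex\tAgeBabyDays\tjit_weightkg",
  "A001\tfemale\t1\t3.31",
  "A001\tfemale\t32\t4.25",
  "A001\tfemale\t61\t5.02",
  "B002\tmale\t1\t3.58",
  "B002\tmale\t29\t4.61",
  sep = "\n"
)

weights <- read.delim(text = example_text, stringsAsFactors = FALSE)

# For the posted data, the equivalent call would read the file directly, e.g.
# weights <- read.delim("starr_source_weight_data_jittered.txt")

# Number of recorded visits per baby
visits_per_baby <- table(weights$anon_id)
print(visits_per_baby)
```

The height file can be read the same way, with jit_heightcm in place of jit_weightkg.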
Sharing/Access information
All data (on Dryad) and code (via Zenodo: https://doi.org/10.5281/zenodo.10537088) are included here.
Code/Software
The R code for fitting weight and/or height data with the modified Michaelis-Menten equation (MME) is in the MME_growth_fitting.RMD file. The tab-delimited, anonymized source data for weights and heights (both jittered) are posted and can be used with the R code, but the user will need to correct the input and output file paths used in the script. An HTML version of these files is also available for viewing the scripts without opening them in R.
The add_holdout script is a helper script for creating a holdout column used by the hbgd_holdout_tests, sitar_holdout_test, and locf_holdout_tests scripts. The simple_linear_model and MME_holdout_tests scripts have the same functionality built in.
R_sessionInfo.txt contains the R software version, as well as the versions of the packages included in the code.
See the methods section for the description of the starting parameters for the nls() function.
Methods
Sources of data: Information on infants was ascertained from two sources: the STORK birth cohort and the STARR research registry. (1) Detailed methods for the STORK birth cohort have been described previously. In brief, a multiethnic cohort of mothers and babies was followed from the second trimester of pregnancy to the babies’ third birthday. Healthy women aged 18–42 years with a single-fetus pregnancy were enrolled. Households were visited every four months until the baby’s third birthday (nine baby visits), with the weight of the baby at each visit recorded in pounds. Medical charts were abstracted for birth weight and length. (2) STARR (starr.stanford.edu) contains electronic medical record information from all pediatric and adult patients seen at Stanford Health Care (Stanford, CA). STARR staff provided anonymized information (weight, height and age in days for each visit through age three years; sex; race/ethnicity) for all babies during the period 03/2013–01/2022 followed from birth to at least 36 months of age with at least five well-baby care visits over the first year of life.
Inclusion of data for modeling: All observed weight and height values were evaluated in kilograms (kg) and centimeters (cm), respectively. Any values assessed beyond 1,125 days (roughly 36 months) and values for height and weight deemed implausible by at least two reviewers (e.g., significant losses in height, or marked outliers for weight and height) were excluded from the analysis. Additionally, weights assessed between birth and 19 days were excluded. At least five observations across the 36-month period were required: babies with fewer than five weight or height values after the previous criteria were excluded from analyses.
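The automated part of these inclusion rules could be sketched in R roughly as follows. The function name filter_weights and the toy data are hypothetical, the manual two-reviewer plausibility check is not modeled, and reading "between birth and 19 days" as days 2 to 19 (so that the birth measurement is kept) is an assumption.

```r
# Sketch of the automated inclusion rules; manual plausibility review is not modeled.
filter_weights <- function(df, max_age = 1125, min_obs = 5) {
  # Assumption: "between birth and 19 days" means days 2-19, so the
  # birth measurement (taken here as day 1) is retained.
  early <- df$AgeBabyDays > 1 & df$AgeBabyDays <= 19
  keep <- !is.na(df$jit_weightkg) & df$AgeBabyDays <= max_age & !early
  df <- df[keep, , drop = FALSE]
  # Count remaining observations per baby and require at least min_obs
  n_obs <- ave(rep(1, nrow(df)), df$anon_id, FUN = sum)
  df[n_obs >= min_obs, , drop = FALSE]
}

# Toy data: baby A keeps 5 usable weights, baby B has only 4
toy <- data.frame(
  anon_id = c(rep("A", 7), rep("B", 4)),
  AgeBabyDays = c(1, 10, 30, 60, 120, 200, 1200, 1, 30, 60, 120),
  jit_weightkg = c(3.3, 3.2, 4.3, 5.2, 6.4, 7.6, 14.1, 3.5, 4.4, 5.3, 6.5)
)
kept <- filter_weights(toy)
print(kept)
```

In this toy example baby A loses the day-10 and day-1200 values but still meets the five-observation minimum, while baby B is excluded.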
Model: We developed our weight model using values from STORK babies and then replicated it with values from the STARR babies. Height models were evaluated in STARR babies only because STORK data on height were scant. The Michaelis-Menten equation is: v = Vmax ([S]/(Km + [S])), where v is the rate of product formation, Vmax is the maximum rate of the system, [S] is the substrate concentration, and Km is a constant based upon the enzyme’s affinity for the particular substrate. For this study the equation became: P = a1 (Age/(b1 + Age)) + c1, where P was the predicted value of weight (kg) or height (cm), Age was the age of the infant in days, and c1 was an additional constant, absent from the original Michaelis-Menten equation, that accounted for the infant’s non-zero weight or length at birth. Each of the parameters a1, b1 and c1 was unique to each child and was estimated by nonlinear least squares. Weight data were fitted in the statistical language R by calling the function nls() as follows: fitted_model <- nls(weights ~ (c1 + (a1*ages)/(b1 + ages)), start = list(a1 = 5, b1 = 20, c1 = 2.5)), where weights and ages were vectors of each subject’s weight in kg and age in days. The default Gauss-Newton algorithm was used. The optimization objective is not convex in the parameters and can suffer from local optima and boundary conditions, so good starting values are essential. The starting parameter values (a1 = 5, b1 = 20, c1 = 2.5) were adjusted manually using the STORK dataset to minimize model failures, which tended to occur when the parameter values, particularly a1 and b1, increased without bound during the iterative optimization. The same starting values were used for the larger STARR dataset. For height modeling, the starting values were higher (a1 = 60, b1 = 530, c1 = 50) because of the different units involved (cm vs. kg).
Because this was a non-linear model, goodness of fit was assessed primarily via root mean squared error (RMSE) for both weight and height.
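The fitting and scoring steps described above can be illustrated with a short R sketch on one synthetic subject. The nls() call and weight starting values follow the description in this section; the helper names mme, rmse and fit_mme, the synthetic ages and weights, and the use of tryCatch() to turn a failed fit into NULL are assumptions made for illustration, not the authors' script.

```r
# Modified Michaelis-Menten curve: P = a1 * Age / (b1 + Age) + c1
mme <- function(age, a1, b1, c1) c1 + (a1 * age) / (b1 + age)

# Root mean squared error between observed and predicted values
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Fit one subject's weights; returns NULL if nls() raises an error
fit_mme <- function(ages, weights) {
  tryCatch(
    nls(weights ~ (c1 + (a1 * ages) / (b1 + ages)),
        start = list(a1 = 5, b1 = 20, c1 = 2.5)),
    error = function(e) NULL
  )
}

# Synthetic subject (illustrative values only, not from the dataset)
set.seed(1)
ages <- c(1, 30, 60, 120, 180, 270, 365, 450, 550, 730, 900, 1095)
weights <- mme(ages, a1 = 14, b1 = 450, c1 = 3.3) +
  rnorm(length(ages), sd = 0.1)

fit <- fit_mme(ages, weights)
if (!is.null(fit)) {
  print(coef(fit))                   # per-subject a1, b1, c1
  print(rmse(weights, fitted(fit)))  # goodness of fit in kg
}
```

For height data, the same sketch would apply with the height starting values (a1 = 60, b1 = 530, c1 = 50) and heights in cm. A NULL result corresponds to the kind of model failure mentioned above.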
Imputation tests: To test for the influence of specific time points on the models, we limited our analysis to STARR babies with all recommended well-baby visits (12 over three years). Each scheduled visit except day 1 occurred in a time window around the expected well-baby visit (Visit1: Day 1, Visit2: days 20–44, Visit3: 46–90, Visit4: 95–148, Visit5: 158–225, Visit6: 250–298, Visit7: 310–399, Visit8: 410–490, Visit9: 500–600, Visit10: 640–800, Visit11: 842–982, Visit12: 1024–1125). We considered two different sets: infants with all scheduled visits in the first year of life (seven total visits) and those with all scheduled visits over the full three-year timeframe (12 total visits). We fit these two sets to the model, identifying baseline RMSE. Then, every visit, and every combination of two to five visits were dropped, so that the RMSE or model failures for a combination of visits could be compared to baseline.
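A minimal R sketch of the bookkeeping behind these tests is shown below: it encodes the visit windows listed above, maps an age in days to a visit number, and enumerates every set of one to five visits to drop. The names visit_windows, assign_visit and drop_sets are hypothetical, and the refitting of the model for each dropped set is omitted.

```r
# Visit windows in days, as listed in the text
visit_windows <- data.frame(
  visit = 1:12,
  start = c(1, 20, 46, 95, 158, 250, 310, 410, 500, 640, 842, 1024),
  end   = c(1, 44, 90, 148, 225, 298, 399, 490, 600, 800, 982, 1125)
)

# Map each age (days) to its scheduled visit, or NA if outside all windows
assign_visit <- function(age_days) {
  vapply(age_days, function(a) {
    hit <- which(a >= visit_windows$start & a <= visit_windows$end)
    if (length(hit) == 1) visit_windows$visit[hit] else NA_integer_
  }, integer(1))
}

# All combinations of 1..max_drop visits to remove from n_visits visits
drop_sets <- function(n_visits, max_drop = 5) {
  unlist(
    lapply(seq_len(max_drop),
           function(k) combn(n_visits, k, simplify = FALSE)),
    recursive = FALSE
  )
}

print(assign_visit(c(1, 30, 92, 1100)))
print(length(drop_sets(7)))   # year-1 set (seven visits)
print(length(drop_sets(12)))  # three-year set (12 visits)
```

Each element of drop_sets() is a vector of visit numbers; in a full analysis the model would be refitted without those visits and its RMSE or failure compared with the baseline fit.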
Prediction: We sought to predict weight or height at 36 months (Y3) from growth measures assessed only up to 12 months (Y1) or to 24 months (Y1+Y2), utilizing the “last value” approach. In brief, the last observation for each child (here, growth measures at 36 months) is used to assess overall model fit, by focusing on how accurately the model can extrapolate the measure at this time point. We identified all STARR infants with at least five time points in Y1 and at least two time points in both Y2 and Y3, with the selection of these time points based on maximizing the number of later time points within the constraints of the well-baby visit schedule for Y2 and Y3. The per-subject set of time points (Y1-Y3) was fitted using the modified Michaelis-Menten equation and the mean squared error was calculated, acting as the “baseline” error. The model was then run on the subset of Y1 only and of Y1+Y2 only. To test predictive accuracy of these subsets, the RMSE was calculated using the actual weights or heights versus the predicted weights or heights of the three time series.
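The "last value" idea can be sketched in R as follows: fit the model on measurements up to a cutoff age, then compare the extrapolated value at the final time point with the observed one. The function name predict_last, the synthetic subject, and the day cutoffs of 365 and 730 for Y1 and Y1+Y2 are assumptions for illustration; a failed fit is returned as NA.

```r
# Modified Michaelis-Menten curve used to simulate one subject
mme <- function(age, a1, b1, c1) c1 + (a1 * age) / (b1 + age)

# Fit on data up to cutoff_days, then return predicted minus observed
# weight at the last time point (NA if the fit fails)
predict_last <- function(ages, weights, cutoff_days) {
  last <- which.max(ages)
  train <- data.frame(age = ages, w = weights)[ages <= cutoff_days, ]
  fit <- tryCatch(
    nls(w ~ c1 + (a1 * age) / (b1 + age), data = train,
        start = list(a1 = 5, b1 = 20, c1 = 2.5)),
    error = function(e) NULL
  )
  if (is.null(fit)) return(NA_real_)
  pred <- predict(fit, newdata = data.frame(age = ages[last]))
  unname(pred - weights[last])
}

# Synthetic subject: five points in Y1, two in Y2, two in Y3
set.seed(2)
ages <- c(1, 30, 60, 120, 270, 450, 600, 800, 1050)
weights <- mme(ages, a1 = 14, b1 = 450, c1 = 3.3) +
  rnorm(length(ages), sd = 0.1)

err_y1   <- predict_last(ages, weights, cutoff_days = 365)  # Y1 only
err_y1y2 <- predict_last(ages, weights, cutoff_days = 730)  # Y1 + Y2
print(c(Y1 = err_y1, Y1Y2 = err_y1y2))
```

Across many subjects, these per-subject errors could be combined into an RMSE for each subset and compared with the baseline fit on all time points.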
Comparison with other models: We examined how well the modified Michaelis-Menten equation performed interpolation in STARR babies compared to ten other commonly used interpolation methods and pediatric growth models including: (1) the ‘last observation carried forward’ model; (2) the linear model; (3) the robust linear model (RLM method, base R MASS package); (4) the Laird and Ware linear model (LWMOD method); (5) the generalized additive model (GAM method); (6) locally estimated scatterplot smoothing (LOESS method, base R stats package); (7) the smooth spline model (smooth.spline method, base R stats package); (8) the multilevel spline model (Wand method); (9) the SITAR (superimposition by translation and rotation) model and (10) fast covariance estimation (FACE method).
Model fit used the holdout approach: a single data point (other than birth weight or birth length) was randomly removed from each subject, the model was fitted to the remaining data, and the RMSE was calculated for the removed data point.
The hbgd package was used to fit all models except the ‘last observation carried forward’ model, the linear model and the SITAR model. For the ‘last observation carried forward’ model, the holdout data point was interpolated by the last observation by converting the random holdout value to NA and then using the function na.locf() from the zoo R package. For the simple linear model, the holdout-filtered data were used to determine the slope and intercept via R’s lm() function, which were then used to calculate the holdout value. For the SITAR model, each subject was fitted by calling the sitar() function with df=2 to minimize failures, and the RMSE of the random holdout point was subsequently calculated with the predict() function. For this analysis, set.seed(1234) was used to initialize the pseudorandom generator.
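For a single subject, the holdout procedure for the two simplest comparators might look like the following base-R sketch. It is not the authors' script: the toy weights are invented, and the last-observation step indexes the previous value directly as a stand-in for zoo's na.locf(), which assumes the measurements are sorted by age.

```r
set.seed(1234)

# Toy subject, sorted by age (illustrative values only)
ages    <- c(1, 30, 60, 120, 180, 270, 365)
weights <- c(3.3, 4.2, 5.1, 6.3, 7.2, 8.4, 9.5)

# Randomly hold out one point, never the birth measurement (index 1)
holdout <- sample(2:length(ages), 1)

# Last observation carried forward: use the preceding measurement
locf_pred <- weights[holdout - 1]

# Simple linear model fitted to the remaining points
train <- data.frame(age = ages[-holdout], w = weights[-holdout])
lin_fit <- lm(w ~ age, data = train)
lm_pred <- unname(predict(lin_fit, newdata = data.frame(age = ages[holdout])))

# Per-subject holdout errors; squaring and averaging these over all
# subjects, then taking the square root, gives each method's RMSE
errors <- c(locf = locf_pred - weights[holdout],
            linear = lm_pred - weights[holdout])
print(errors)
```

The same held-out index would be reused for every model so that the methods are compared on identical points.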
Usage notes
Example R code for fitting weight and/or height data with the MME equation is shown in the MME_growth_fitting.RMD file.
This file was written to fit the supplied STARR dataset. However, it can be adapted to alternative data. The HTML version of this file is available as well, in case viewing the script without opening it in R is desired. R_sessionInfo.txt contains the R software version, as well as the versions of the packages included in the code. See the methods section of the manuscript for the description of the starting parameters for the nls() function.