Early life proteomic and microbiome features signal obesity risk across 26 years of follow-up
Data files
May 27, 2026 version files (3.62 MB total):
- abis_obesity_pub.csv (3.60 MB)
- README.md (23.83 KB)
Abstract
Early-life drivers of obesity are incompletely understood. Childhood obesity is rising globally, yet few studies have examined the microbiome and proteome in early childhood in relation to this outcome, and most are cross-sectional by design. Early-life factors in the ABIS birth cohort (n = 16,683), namely psychosocial stressors, smoking, infections, and diet in the first year of life, were associated with obesity up to age 26 (mean follow-up 25.3 years, range 23.7-26.5 years). Biological markers including the metabolome (n = 290) and proteome (n = 358) at birth and the gut microbiome at age one (n = 1,743) were assessed. Significant differences were found in infants with future obesity, including elevated ANGPTL4, FST, and HGF (independently of maternal weight) and reduced isocaproic acid, tryptophan, and oleic acid, with prenatal mediation. Akkermansia, asaccharolytic bacteria (Phascolarctobacterium and Senegalimassiliensia), and equol-producers (Adlercreutzia and Slackia) were depleted at age one. Machine learning models selecting the 40 most predictive features showed long-term prediction from birth proteomics and bacterial taxa at age one (AUC = 0.83 ± 0.05, n = 1,877) and with additional metrics, e.g., parental and child BMI in the first eight years (AUC = 0.89 ± 0.02, n = 1,877), suggesting durable biological encoding. The processed data (biological and metadata) used in the published study include 16S rRNA relative abundances (bac_ variables) and normalized Olink protein NPX values (NPX_ variables), as well as responses from the ABIS questionnaires. No identifiable information is included. The outcome variable is coded as 0/1 in meta_ObesitasDiag_2024. Collectively, our findings suggest clinically relevant biomarkers pointing to early-life regulation of bile acid metabolism, lipid storage vs. oxidation, and immune–metabolic signaling, and paths to prospectively prevent childhood and adult-onset obesity, even across a 26-year predictive gap.
Dataset DOI: 10.5061/dryad.2v6wwq042
Description of the data and file structure
Bacterial data were derived from 16S rRNA gene sequencing of stool samples collected from diapers at approximately one year of age. These data are presented as relative abundances and are labeled with the prefix bac_ followed by the bacterial species name, as assigned using the SILVA database (see manuscript for details).
Proteomic data were generated from cord blood samples using the Olink Explore 384 platform, as described in the manuscript. These values are reported as normalized protein expression (NPX) and are labeled with the prefix NPX_ followed by the protein name.
All other variables are prefixed with meta_ and are derived from parental questionnaires administered at birth and at one year of age.
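As a minimal sketch of how the three prefixes can be used to separate the variable groups, the example below sorts column names into bac_, NPX_, and meta_ groups using only the Python standard library. The in-memory sample and the column names `bac_Akkermansia` and `NPX_ANGPTL4` are illustrative assumptions; the exact column spellings in abis_obesity_pub.csv may differ.

```python
import csv
import io

# Prefixes described in this README.
PREFIXES = ("bac_", "NPX_", "meta_")


def group_columns_by_prefix(header):
    """Return a dict mapping each prefix (or 'other') to its column names."""
    groups = {prefix: [] for prefix in PREFIXES}
    groups["other"] = []  # e.g. the masked ID column
    for name in header:
        for prefix in PREFIXES:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
        else:
            groups["other"].append(name)
    return groups


# Tiny hypothetical stand-in for abis_obesity_pub.csv (values are made up).
sample_csv = io.StringIO(
    "ObesityID,bac_Akkermansia,NPX_ANGPTL4,meta_fr_12,meta_ObesitasDiag_2024\n"
    "1,0.012,1.85,40,0\n"
)
header = next(csv.reader(sample_csv))
groups = group_columns_by_prefix(header)
for prefix, names in groups.items():
    print(prefix, names)
```

To apply this to the real file, the `io.StringIO` object could be replaced with `open("abis_obesity_pub.csv", newline="")`.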
This dataset contains human subjects data that have been de-identified. In accordance with Dryad’s guidelines and best practices for human subjects research, no more than three indirect identifiers are included—and because the cohort consists of minors, this effectively limits the dataset to two indirect identifiers per individual. Variable selection was performed to remain within these constraints, and only the minimum necessary information is included to reduce re-identification risk and protect ABIS participants.
The obesity outcome, defined using ICD-10 codes as described in the manuscript, is provided in the final column and indicates whether the child had a corresponding diagnosis recorded in the national register by 2024.
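A quick check of the outcome distribution might look like the sketch below, which tallies the 0/1 values in meta_ObesitasDiag_2024 and counts blank cells separately. The sample rows are invented for illustration, and whether the real file stores the codes as `1` or `1.0` is an assumption the parsing is written to tolerate.

```python
import csv
import io
from collections import Counter

OUTCOME = "meta_ObesitasDiag_2024"  # outcome column named in this README


def count_outcome(rows):
    """Tally outcome codes; blank cells are counted under 'missing'."""
    counts = Counter()
    for row in rows:
        cell = row[OUTCOME].strip()
        if not cell:
            counts["missing"] += 1
        else:
            # Accept both "1" and "1.0" style encodings.
            counts[int(float(cell))] += 1
    return counts


# Hypothetical rows standing in for the real file.
sample_csv = io.StringIO(
    "ObesityID,meta_ObesitasDiag_2024\n"
    "1,0\n"
    "2,1\n"
    "3,0\n"
    "4,\n"
)
counts = count_outcome(csv.DictReader(sample_csv))
print(dict(counts))
```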
Files and variables
File: abis_obesity_pub.csv
Description:
| Variable Name | Variable Label | Code |
| --- | --- | --- |
| ObesityID | Masked ID of the ABIS subject | |
| bac_[Species] | Relative abundance of bacterium at ~ 1 year of age (16S) | |
| NPX_[Protein] | NPX of protein in cord blood sample | |
| meta_fr_10 | Did the mother have any infectious diseases during pregnancy? | 1= yes, 2= no, 3= do not know |
| meta_fr_12 | Week of delivery | |
| meta_fr_13 | Mode of delivery | 1= normal, 2= cesarean, 3= other problem |
| meta_fr_14 | Birth weight | |
| meta_fr_14a | Birth length | |
| meta_fr_18 | Did you (the mother) smoke during your pregnancy? | 0= no, 1= yes |
| meta_fr_18a | If yes, how many cigarettes per day? | 1= 1-5, 2= 6-10, 3= 11-15, 4= 16-20, 5= 21-40, 6= more than 40 cigarettes/day |
| meta_educationma | Education of the mother - baseline | 1= low, 2= medium, 3= high |
| meta_educationpa | Education of the father - baseline | 1= low, 2= medium, 3= high |
| meta_fr_29 | Severe life event during pregnancy | 1= yes, 2= no |
| meta_b_3a | Infections during the first month | 1= yes, 2= no, 3= do not know |
| meta_b_4a | Cold (upper airway infection) during the first month | 1= yes, 2= no, 3= do not know |
| meta_b_4b | Otitis during the first month | 1= yes, 2= no, 3= do not know |
| meta_b_4d | Stomach flu during the first month | 1= yes, 2= no, 3= do not know |
| meta_b_5 | Only breastfeeding | corresponds to month, with 9= 9 months or older |
| meta_b_6 | Started with infant formula | corresponds to month, with 9= 9 months or older |
| meta_b_8 | When did the child have cow's milk for the first time? | corresponds to month, with 9= 9 months or older |
| meta_b_9 | When did the child have food containing gluten for the first time? | corresponds to month, with 9= 9 months or older |
| meta_b_10 | Breastfeeding ended | corresponds to month, with 9= 9 months or older |
| meta_b_30 | Has anyone smoked in the home? (in the first year) | 1= yes, 2= no |
| meta_b_30a | Number of cigarettes | 1= sporadically, 2= 1-5, 3= 6-10, 4= 11-15, 5= 16-20, 6= 21-40, 7= >40 cigarettes/day |
| meta_b_42a | Common cold during the first year | 1= never, 2= 1-2 times, 3= 3-5 times, 4= 6 times or more |
| meta_b_42b | Stomach flu during the first year | 1= never, 2= 1-2 times, 3= 3-5 times, 4= 6 times or more |
| meta_b_42c | Infection requiring penicillin during the first year | 1= never, 2= 1-2 times, 3= 3-5 times, 4= 6 times or more |
| meta_b_42d | Influenza during the first year | 1= never, 2= 1-2 times, 3= 3-5 times, 4= 6 times or more |
| meta_b_42f | Eczema during the first year | 1= never, 2= 1-2 times, 3= 3-5 times, 4= 6 times or more |
| meta_b_44a | Difficult life event in the first year | 1= yes, 2= no, 3= do not know |
| meta_b_45 | Has the child had vitamin A- and D drops? | 1= yes, 2= no, 3= do not know |
| meta_b_53 | How often does the child eat fruits or berries? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_54 | How often does the child eat vegetables? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_55 | How often does the child eat potatoes or root vegetables? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_56 | How often does the child eat fish from a lake? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_57 | How often does the child eat fish from the Baltic sea? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_58 | How often does the child eat fish from other seas? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_59 | How often does the child eat eggs? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_60 | How often does the child eat meat from game (elk, roe deer)? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_61 | How often does the child eat beef? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_62 | How often does the child eat pork and sausage? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_63 | How often does the child eat mushrooms (except cultivated mushrooms)? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_64 | How often does the child eat heavy cream or creme fraiche? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_65 | How often does the child eat chocolate? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_66 | How often does the child eat other sweets? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_67 | How often does the child eat chips or cheesecakes? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_68 | How often does the child eat buns, cookies, cake? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_69 | How often does the child eat fried potatoes or french fries? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_72 | At what age (month) did your child have gruel for the first time? | corresponds to month, with 9= 9 months or older |
| meta_b_73 | How often does your child eat gruel nowadays? | 1= daily, 2= 3-5 times/week, 3= 1-2 times/week, 4= seldom |
| meta_b_76 | At what age (month) did your child have porridge for the first time? | corresponds to month, with 9= 9 months or older |
| meta_b_94 | How often does your child have ready-made drinks nowadays? | 1= never, 2= 1-3 times/month, 3= 1-3 times/week, 4= 4-6 times/week, 5= 7-10 times/week, 6= 10-14 times/week, 7= 15 times/week or more |
| meta_b_96 | At what age (month) did your child have milk for the first time? | corresponds to month, with 9= 9 months or older |
| meta_b_97 | How often does your child drink milk nowadays? | 1= never, 2= 1-3 times/week, 3= 4-6 times/week, 4= 1 time/day, 5= 2 times/day, 6= 3 times/day, 7= 4 times/day or more |
| meta_b_99 | How often does your child have yoghurt / sour milk nowadays? | 1= never, 2= 1-3 times/week, 3= 4-6 times/week, 4= 1 time/day, 5= 2 times/day, 6= 3 times/day, 7= 4 times/day or more |
| meta_sle_b | SLE Parent at Birth | 0= no, 1= yes |
| meta_sleC_1 | SLE Child at 1 year | 0= no, 1= yes |
| meta_bmiM_1 | BMI Mother at 1 year | |
| meta_bmiM_2 | BMI Mother at 2 years | |
| meta_bmiM_5 | BMI Mother at 5 years | |
| meta_bmiM_8 | BMI Mother at 8 years | |
| meta_bmiF_1 | BMI Father at 1 year | |
| meta_bmiF_2 | BMI Father at 2 years | |
| meta_bmiF_5 | BMI Father at 5 years | |
| meta_bmiF_8 | BMI Father at 8 years | |
| meta_bmiC_1 | BMI Child at 1 year | |
| meta_bmiC_2 | BMI Child at 2 years | |
| meta_bmiC_5 | BMI Child at 5 years | |
| meta_bmiC_8 | BMI Child at 8 years | |
| meta_Vulnerabilityindex4grade | Vulnerability index - 4 grade | |
| meta_parity | Parity | 0= first parity, 1= previous parity |
| meta_ROK0 | Smoking 3 months prior to pregnancy | 1= do not smoke, 2= smoke 1-9 cigarettes/day, 3= smoke 10 or more/day |
| meta_ROK2 | Smoking during pregnancy weeks 30-32 | 1= do not smoke, 2= smoke 1-9 cigarettes/day, 3= smoke 10 or more/day |
| meta_SECAVSL | The birth ended with a caesarean section | 0= no, 1= yes |
| meta_SGAc | Size for gestational age (normal = 0; small or large = 1) | 0= normal, 1= small or large |
| meta_Gestational_Diabetes | Gestational diabetes (mother) | missing= no, 1= yes |
| meta_BMI_PrePregnancy | BMI of mother pre-pregnancy | |
| meta_BMI_End_of_Pregnancy | BMI of mother at end of the pregnancy | |
| meta_Overweight_Obesity_PrePreg | Weight classification of the mother pre-pregnancy (overweight) | |
| meta_Obesity_PrePreg | Weight classification of the mother pre-pregnancy (obesity) | |
| meta_Obese_End_Pregnancy | Weight classification of the mother at end of the pregnancy (obesity) | |
| meta_Childs_sex | Sex of the child | 0= male, 1= female |
| meta_ObesitasDiag_2024 | Obesity diagnosis in the child (ICD-10) by 2024 | 0= no, 1= yes |
Note: blank or empty cells are used to represent missing or absent data.
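The coding conventions above can be applied when reading the file, as in the following sketch. It decodes meta_fr_13 with the labels from the table, treats a blank meta_Gestational_Diabetes cell as "no" (per the "missing= no" code), and returns `None` for any other blank cell. The sample rows and the helper names `to_number` and `decode_row` are hypothetical and are not part of the published pipeline.

```python
import csv
import io

# Code labels copied from the variable table for meta_fr_13.
DELIVERY_MODE = {1: "normal", 2: "cesarean", 3: "other problem"}


def to_number(cell):
    """Convert a CSV cell to float, or None if the cell is blank."""
    cell = cell.strip()
    return float(cell) if cell else None


def decode_row(row):
    """Decode a few coded variables from one CSV row (a dict of strings)."""
    delivery = to_number(row["meta_fr_13"])
    gdm = to_number(row["meta_Gestational_Diabetes"])
    return {
        # Blank delivery mode stays missing (None).
        "delivery_mode": None if delivery is None else DELIVERY_MODE.get(int(delivery)),
        # Blank means "no" for this variable, so only code 1 maps to True.
        "gestational_diabetes": gdm == 1,
    }


# Hypothetical rows standing in for the real file.
sample_csv = io.StringIO(
    "meta_fr_13,meta_Gestational_Diabetes\n"
    "2,\n"
    ",1\n"
)
decoded = [decode_row(row) for row in csv.DictReader(sample_csv)]
for record in decoded:
    print(record)
```

Keeping blank cells as `None` rather than 0 avoids confusing missing answers with a genuine zero in the numeric variables.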
Code/software
The data are contained in a .csv file so no special software is needed. A corresponding Jupyter notebook has been posted to GitHub for the ML pipelines used in the paper, here: https://github.com/aahren/obesity-ML-ABIS.
Human subjects data
The ABIS study includes participant consent for data sharing and publication of de-identified data. For deposition in Dryad, all data underwent a formal de-identification process, including removal of direct identifiers and appropriate handling of indirect identifiers, in accordance with established data protection and privacy standards to minimize the risk of re-identification. As required by Dryad, no more than three indirect identifiers, which could lead to re-identification if combined with other available data, have been included. Data have been aggregated and variables combined, as appropriate. Masked IDs have been assigned for this specific obesity study, and only the information necessary for replication of analyses presented in the manuscript has been included.
