Data for: Gut microbial composition and diversity vary by CREBRF genotype among Samoan infants
Data files
Jun 02, 2025 version files 19.54 MB
- README.md (12.76 KB)
- Supplementary_Figure_1.pdf (5.11 MB)
- Supplementary_Figure_2.pdf (6.65 MB)
- Supplementary_Figure_3.pdf (6.79 MB)
- Supplementary_Tables_pdf.zip (932.59 KB)
- Supplementary_Tables.zip (42.37 KB)
Abstract
Over 40% of Samoans have at least one copy of the minor A allele at rs373863828 in CREB3 regulatory factor (CREBRF), which is associated with increased BMI but decreased odds of type 2 diabetes mellitus. The mechanisms underlying this paradoxical effect remain unknown. We hypothesized that gut microbiota may play a role and examined associations between CREBRF genotype and gut microbial diversity and composition among Samoan infants. Fecal samples were collected from Samoan infants aged 0 (n=23), 4 (n=20), and 21 (n=27) months. Microbiota community structure was analyzed using 16S rRNA bacterial gene sequencing. Both cross-sectional and longitudinal analyses revealed no associations between CREBRF genotype and overall microbiome composition or diversity at 0 or 4 months. Cross-sectional analysis at 21 months revealed a significant association between genotype and unweighted UniFrac distances (F(1,24)=1.855, R²=0.072, p=0.015). Longitudinal differential abundance analysis also revealed several differentially abundant taxa at 21 months. Notably, the AG genotype was associated with lower relative abundance of Escherichia-Shigella (β=-6.741, SE=2.243, p=0.004, q=0.042). Significant genotype differences in gut microbiome composition and diversity at 21 months suggest that gut microbiota may be involved in relationships between CREBRF genotype and metabolic health. No genotype differences were observed at 0 or 4 months, suggesting that environmental and/or maternal variables have a greater influence on the gut microbiome in early infancy, and genotype effects emerge later. Further research should examine whether genotype differences in gut microbiota are associated with functional differences in metabolic or immune signaling pathways or energy extraction.
Dataset DOI: 10.5061/dryad.15dv41p7p
Description of the data
Fecal samples were collected from Samoan infants aged 0 (n=23), 4 (n=20), and 21 (n=27) months. Microbiota community structure was analyzed using 16S rRNA bacterial gene sequencing.
Supplementary Figures
Supplementary Figure 1
File: Supplementary_Figure_1.pdf
Description: Associations between covariates and alpha diversity metrics at age 0, 4, and 21 months. Student’s t-test results are shown for categorical variables. Pearson’s correlation coefficients are shown for continuous variables.
a. Mode of birth
i. Shannon’s entropy
ii. Faith’s phylogenetic diversity
b. Breastfeeding status (substituted with breastfeeding status at age 6 months for the 21-month timepoint)
i. Shannon’s entropy
ii. Faith’s phylogenetic diversity
c. Maternal BMI
i. Shannon’s entropy
ii. Faith’s phylogenetic diversity
d. Antibiotic use within 2 weeks prior to sample collection
i. Shannon’s entropy
ii. Faith’s phylogenetic diversity
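For readers less familiar with the alpha diversity metrics named above, the following is a minimal sketch of how Shannon's entropy can be computed from one sample's per-taxon read counts. The function name and the choice of log base 2 are assumptions for illustration; the README does not state which base or software settings were used. Faith's phylogenetic diversity is not sketched because it also requires a phylogenetic tree.

```python
import math


def shannon_entropy(counts, base=2.0):
    """Shannon entropy of one sample's per-taxon read counts.

    `base` is an assumption; the dataset README does not specify it.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    # Taxa with zero reads contribute nothing to the sum.
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


# Hypothetical sample with four equally abundant taxa.
print(shannon_entropy([10, 10, 10, 10]))
```

A perfectly even community of k taxa gives the maximum value, log(k) in the chosen base, while a sample dominated by a single taxon approaches zero.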
Supplementary Figure 2
File: Supplementary_Figure_2.pdf
Description: Associations between covariates and nonmetric multidimensional scaling ordination of variation in bacterial community structure at age 0, 4, and 21 months.
a. Mode of birth
i. Unweighted UniFrac distances
ii. Weighted UniFrac distances
b. Breastfeeding status (substituted with breastfeeding status at age 6 months for the 21-month timepoint)
i. Unweighted UniFrac distances
ii. Weighted UniFrac distances
c. Maternal BMI
i. Unweighted UniFrac distances
ii. Weighted UniFrac distances
d. Antibiotic use within 2 weeks prior to sample collection
i. Unweighted UniFrac distances
ii. Weighted UniFrac distances
Supplementary Figure 3
File: Supplementary_Figure_3.pdf
Description: Comparison of relative abundance of bacterial taxa at 0, 4, and 21 months by CREBRF genotype. For clarity, only taxa with a relative abundance >1% are included in each figure.
a. 0 months
i. Phylum level
ii. Family level
iii. Genus level
b. 4 months
i. Phylum level
ii. Family level
iii. Genus level
c. 21 months
i. Phylum level
ii. Family level
iii. Genus level
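The >1% display cutoff used in Supplementary Figure 3 can be illustrated with a short sketch. It converts raw counts to relative abundances and keeps taxa above the threshold. The function name and the example counts are hypothetical, and whether the authors applied the cutoff per sample or to group means is not stated in this README.

```python
def abundant_taxa(counts_by_taxon, threshold=0.01):
    """Return relative abundances for taxa strictly above `threshold`."""
    total = sum(counts_by_taxon.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    relative = {taxon: n / total for taxon, n in counts_by_taxon.items()}
    return {taxon: r for taxon, r in relative.items() if r > threshold}


# Hypothetical counts, not taken from the dataset.
example = {"Bifidobacterium": 600, "Bacteroides": 395, "Rare genus": 5}
print(abundant_taxa(example))
```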
Supplementary Tables
Supplementary Table 1
File: Supplementary_Table_1a.csv
File: Supplementary_Table_1b.csv
File: Supplementary_Table_1c.csv
File: Supplementary_Table_1.pdf
Description: Results of age-stratified differential abundance analysis by infant age conducted in ANCOMBC2. Analyses were conducted on unrarefied data from samples with more than 3000 reads. Differences in relative abundance of bacterial taxa between (a) 0 vs 4 months, (b) 0 vs 21 months, and (c) 4 vs 21 months were investigated. Features with q-value < 0.25 that passed the sensitivity analysis in ANCOMBC2 are shown. LFC = log-fold change. SD = standard deviation. SE = standard error.
Supplementary Table 2
File: Supplementary_Table_2a.csv
File: Supplementary_Table_2b.csv
File: Supplementary_Table_2c.csv
File: Supplementary_Table_2.pdf
Description: Results of age-stratified differential abundance analysis by infant age conducted in MaAsLin2. Analyses were conducted on data rarefied to 3000 reads. Differences in relative abundance of bacterial taxa between (a) 0 vs 4 months, (b) 0 vs 21 months, and (c) 4 vs 21 months were investigated. Features with q-value < 0.25 in MaAsLin2 are shown. SD = standard deviation. SE = standard error.
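Rarefying to 3000 reads means subsampling each sample, without replacement, down to the same depth. The sketch below shows the idea using only the Python standard library. The function name, the seed, and the choice to drop samples below the target depth are assumptions for illustration, not a description of the authors' pipeline.

```python
import random
from collections import Counter


def rarefy(counts_by_taxon, depth=3000, seed=0):
    """Subsample reads without replacement to `depth`.

    Returns None when the sample has fewer than `depth` reads
    (assumed here to mean the sample is excluded).
    """
    # Expand counts into one entry per read, then draw `depth` of them.
    pool = [taxon for taxon, n in counts_by_taxon.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical sample with 5000 reads.
sample = {"Taxon A": 4000, "Taxon B": 900, "Taxon C": 100}
print(rarefy(sample))
```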
Supplementary Table 3
File: Supplementary_Table_3a.csv
File: Supplementary_Table_3b.csv
File: Supplementary_Table_3c.csv
File: Supplementary_Table_3.pdf
Description: Results of longitudinal differential abundance analysis by infant age conducted in MaAsLin2. Analyses were conducted on data rarefied to 3000 reads. Differences in relative abundance of bacterial taxa between (a) 0 vs 4 months, (b) 0 vs 21 months, and (c) 4 vs 21 months are shown. Features with q-value < 0.25 are shown. SD = standard deviation. SE = standard error.
Supplementary Table 4
File: Supplementary_Table_4a.csv
File: Supplementary_Table_4b.csv
File: Supplementary_Table_4c.csv
File: Supplementary_Table_4d.csv
File: Supplementary_Table_4.pdf
Description: Results of age-stratified differential abundance analysis by covariates conducted in ANCOMBC2. Analyses were conducted on unrarefied data from samples with more than 3000 reads. Differences in relative abundance of bacterial taxa by (a) antibiotic use within the 2 weeks prior to sample collection, (b) mode of delivery, (c) exclusive breastfeeding status, and (d) maternal BMI were investigated. Features with q-value < 0.25 that passed the sensitivity analysis for pseudo-count addition in ANCOMBC2 are shown. LFC = log-fold change. SD = standard deviation. SE = standard error.
Supplementary Table 5
File: Supplementary_Table_5a.csv
File: Supplementary_Table_5b.csv
File: Supplementary_Table_5c.csv
File: Supplementary_Table_5d.csv
File: Supplementary_Table_5.pdf
Description: Results of age-stratified differential abundance analysis by covariates conducted in MaAsLin2. Analyses were conducted on data rarefied to 3000 reads. Associations between relative abundance of bacterial taxa and (a) antibiotic use within the 2 weeks prior to sample collection, (b) mode of delivery, (c) exclusive breastfeeding status, and (d) maternal BMI were investigated. Features with q-value < 0.25 in MaAsLin2 are shown. SD = standard deviation. SE = standard error.
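The q-values reported in these tables are p-values adjusted for multiple testing. As an illustration only, the sketch below applies the Benjamini-Hochberg procedure, which is the default correction in MaAsLin2; whether the default was used for this dataset is an assumption, and the p-values shown are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four taxa; keep those with q < 0.25.
p = [0.01, 0.04, 0.03, 0.20]
q = bh_qvalues(p)
print([(pi, round(qi, 4)) for pi, qi in zip(p, q) if qi < 0.25])
```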
Supplementary Table 6
File: Supplementary_Table_6a.csv
File: Supplementary_Table_6b.csv
File: Supplementary_Table_6c.csv
File: Supplementary_Table_6d.csv
File: Supplementary_Table_6.pdf
Description: Results of longitudinal differential abundance analysis by covariates conducted using linear mixed-effects models in MaAsLin2. Analyses were conducted on data rarefied to 3000 reads. Differences in relative abundance of bacterial taxa by (a) antibiotic use within the 2 weeks prior to sample collection, (b) mode of delivery, (c) feeding mode, and (d) maternal BMI were investigated. All models included the covariate of interest, age (factor variable), and their interaction as fixed effects and a random intercept for each infant. When the covariate alone was significant, age is listed as 0mo and relative abundances at 0 months are listed. When the interaction term was significant, the age of interaction and relative abundances at that age are listed. Features with q-value < 0.25 are shown. SD = standard deviation. SE = standard error.
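The model structure described above can be written out as a worked equation. This is a sketch under stated assumptions: the symbols are introduced here for illustration, 0 months is taken as the reference age level (consistent with main effects being reported at 0mo), and any normalization or transformation applied to the abundances is not specified in this README.

```latex
% y_{ij}: (possibly transformed) relative abundance of one taxon
%         for infant i at visit j
% x_{ij}: covariate of interest
% A^{(4)}_{ij}, A^{(21)}_{ij}: indicators for the 4- and 21-month visits
% u_i: random intercept for infant i
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij}
        + \beta_2 A^{(4)}_{ij} + \beta_3 A^{(21)}_{ij}
        + \beta_4 x_{ij} A^{(4)}_{ij} + \beta_5 x_{ij} A^{(21)}_{ij}
        + u_i + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Under this parameterization, \beta_1 is the covariate effect at 0 months, while \beta_4 and \beta_5 describe how that effect differs at 4 and 21 months.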
Supplementary Table 7
File: Supplementary_Table_7.csv
File: Supplementary_Table_7.pdf
Description: Alpha-diversity multivariable regression results. Model 1 includes CREBRF genotype (ref = GG genotype) and all covariates. At 0 months, the covariates were mode of birth (Cesarean versus vaginal [ref]), breastfeeding status (exclusively breastfed versus mixed- or formula-fed [ref]), and maternal BMI (continuous). At 4 months, the covariates were mode of birth (Cesarean versus vaginal [ref]), breastfeeding status (exclusively breastfed versus mixed- or formula-fed [ref]), maternal BMI (continuous), and antibiotic use within the 2 weeks prior to sample collection (yes versus no [ref]). At 21 months, the covariates were mode of birth (Cesarean versus vaginal [ref]), exclusive breastfeeding status at age 6 months (exclusively breastfed versus mixed- or formula-fed [ref]), maternal BMI (continuous), and antibiotic use within the 2 weeks prior to sample collection (yes versus no [ref]). Model 2 includes only CREBRF genotype. Model 3 includes only covariates. The β (beta) coefficient, standard error (SE), and p-value are provided for each predictor variable.
Supplementary Table 8
File: Supplementary_Table_8.csv
File: Supplementary_Table_8.pdf
Description: Beta-diversity PERMANOVA test results. Model 1 includes CREBRF genotype (ref = GG genotype) and all covariates. At 0 months, the covariates were mode of birth (Cesarean versus vaginal [ref]), breastfeeding status (exclusively breastfed versus mixed- or formula-fed [ref]), and maternal BMI (continuous). At 4 months, the covariates were mode of birth (Cesarean versus vaginal [ref]), breastfeeding status (exclusively breastfed versus mixed- or formula-fed [ref]), maternal BMI (continuous), and antibiotic use within the 2 weeks prior to sample collection (yes versus no [ref]). At 21 months, the covariates were mode of birth (Cesarean versus vaginal [ref]), exclusive breastfeeding status at age 6 months (exclusively breastfed versus mixed- or formula-fed [ref]), maternal BMI (continuous), and antibiotic use within the 2 weeks prior to sample collection (yes versus no [ref]). Model 2 includes only CREBRF genotype. Model 3 includes only covariates. The sum of squares, R², and p-value are provided for each predictor variable.
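To help interpret the sum of squares and R-squared columns, the sketch below shows how R-squared and the pseudo-F statistic relate to the sums of squares in a PERMANOVA table. The function name and all numbers are hypothetical. It assumes a single-term model when `ss_total` is not supplied, so it corresponds most closely to Model 2; p-values come from permutations and are not computed here.

```python
def permanova_summary(ss_term, ss_residual, df_term, df_residual, ss_total=None):
    """R-squared and pseudo-F for one term of a PERMANOVA table.

    If `ss_total` is omitted, a single-term model is assumed
    (total = term + residual).
    """
    if ss_total is None:
        ss_total = ss_term + ss_residual
    r2 = ss_term / ss_total
    pseudo_f = (ss_term / df_term) / (ss_residual / df_residual)
    return r2, pseudo_f


# Hypothetical sums of squares, not taken from Supplementary Table 8.
r2, f = permanova_summary(ss_term=0.5, ss_residual=6.0, df_term=1, df_residual=24)
print(round(r2, 3), round(f, 3))
```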
Supplementary Table 9
File: Supplementary_Table_9.csv
File: Supplementary_Table_9.pdf
Description: Results of age-stratified differential abundance analysis by CREBRF genotype conducted using ANCOMBC2. Analyses were conducted on unrarefied data from samples with more than 3000 reads. Features with q-value < 0.25 that passed the sensitivity analysis for pseudo-count addition in ANCOMBC2 are shown. SD = standard deviation. LFC = log-fold change. SE = standard error.
Supplementary Table 10
File: Supplementary_Table_10.csv
File: Supplementary_Table_10.pdf
Description: Results of longitudinal differential abundance analysis by CREBRF genotype conducted in MaAsLin2. Analyses were conducted on data rarefied to 3000 reads. All models included genotype, age (factor variable), and their interaction as fixed effects and a random intercept for each infant. When genotype alone was significant, age is listed as 0mo and relative abundances at 0 months are listed. When the interaction term was significant, the age of interaction and relative abundances at that age are listed. Features with q-value < 0.25 are shown. SD = standard deviation. SE = standard error.
Supplementary Table 11
File: Supplementary_Table_11.csv
File: Supplementary_Table_11.pdf
Description: Significant associations between relative abundance of bacterial taxa and body size indices identified by ANCOMBC2 on unrarefied data. Associations between relative abundance of bacterial taxa and (a) height-for-age z-score, (b) weight-for-length z-score [substituted with weight-for-height z-score at 21 months], and total-body-less-head% body fat were investigated. Features with q-value < 0.25 that passed the sensitivity analysis for pseudo-count addition in ANCOMBC2 are shown. There were no features associated with total-body-less-head% body fat with q-value < 0.25. LFC = log-fold change. SE = standard error.
Supplementary Table 12
File: Supplementary_Table_12.csv
File: Supplementary_Table_12.pdf
Description: Significant associations between relative abundance of bacterial taxa and body size indices identified by age-stratified analyses in MaAsLin2 on data rarefied to 3000 reads. Associations between relative abundance of bacterial taxa and (a) height-for-age z-score, weight-for-length z-score [substituted with weight-for-height z-score at 21 months], and (b) total-body-less-head% body fat were investigated. Features with q-value < 0.25 are shown. There were no features associated with weight-for-length z-score with q-value < 0.25. SE = standard error.
Supplementary Table 13
File: Supplementary_Table_13.csv
File: Supplementary_Table_13.pdf
Description: Significant associations between relative abundance of bacterial taxa and body size indices identified by longitudinal analyses in MaAsLin2. Associations between relative abundance of bacterial taxa and (a) height-for-age z-score, (b) weight-for-length z-score [substituted with weight-for-height z-score at 21 months], and total-body-less-head% body fat were investigated. Features with q-value < 0.25 are shown. There were no features associated with total-body-less-head% body fat with q-value < 0.25. SE = standard error.
