Data from: Cryptic diversity of cellulose-degrading gut bacteria in industrialized humans
Data files
Feb 12, 2024 version files, 7.41 MB
- Data_S1_Figure_1B_tree.nwk
- Data_S2_Figure_1C_data.xlsx
- Data_S3_Figure_2A.xlsx
- Data_S4_Figure_2Ci.xlsx
- Data_S5_Figure_2Cii.xlsx
- Data_S6_Figure_3A_host_tree.nwk
- Data_S7_Figure_3A_tree.nwk
- Data_S8_Figure_3B_tree.nwk
- Data_S9_Figure_4C.xlsx
- Data_S10_Figure_5A_.txt
- Data_S11_Figure_5B.xlsx
- Data_S12_Figure_5C.txt
- Data_S13_Figure_5D.txt
- Data_S14_Figure_5E_middle_panel.txt
- Data_S15_Figure_5E_right_panel.txt
- Data_S16_Figure_5E_verticality_values.xlsx
- Data_S17_Supplementary_Figure_S1.xlsx
- Data_S18_Supplementary_Figure_S2A_.xlsx
- Data_S19_Supplementary_Figure_S2B.xlsx
- Data_S20_Supplementary_Figure_S2C.xlsx
- Data_S21_Supplementary_Figure_S3.xlsx
- Data_S22_Supplementary_Figure_S4.xlsx
- Data_S23_Supplementary_Figure_S5.xlsx
- Data_S24_Supplementary_Figure_S6.xlsx
- Data_S25_Supplementary_Figure_S7.xlsx
- Data_S26_Supplementary_Figure_S8.xlsx
- Data_S27_Supplementary_Figure_S9.xlsx
- Data_S28_Supplementary_Figure_S10_tree.nwk
- Data_S29_Supplementary_Figure_S11.xlsx
- Data_S30_Supplementary_Figure_S12.xlsx
- Data_S31_Supplementary_Figure_S13.xlsx
- Data_S32_197_phylogenetic_trees.pdf
- README.md
Abstract
Humans, like all mammals, depend on the gut microbiome for digestion of cellulose, the main component of plant fiber, but evidence for cellulose fermentation in the human gut is scarce. We have identified ruminococcal species in the gut microbiota of human populations that assemble functional multi-enzymatic cellulosome systems capable of degrading plant cell wall polysaccharides. One of these species, which is strongly associated with humans, likely originated in the ruminant gut and was subsequently transferred to the human gut, potentially during domestication, where it underwent diversification and diet-related adaptation through the acquisition of genes from other gut microbes. Collectively, these species are abundant and widespread among ancient humans, hunter-gatherers, and rural populations, but are extremely rare in populations from industrialized societies, suggesting potential disappearance in response to the westernized lifestyle.
README: Cryptic diversity of cellulose-degrading gut bacteria in industrialized humans
https://doi.org/10.5061/dryad.z08kprrkj
Source data files for all figures and supplementary material.
Description of the data and file structure
Data S1 Figure 1B tree.nwk is an unrooted phylogenetic tree, computed with the maximum likelihood method, of 62 selected genomes and MAGs, using the sequence of the ScaC scaffoldin as a phylotyping marker. The file contains the tree, in Newick format, used to draw Figure 1B.
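As a quick sanity check on the .nwk files, the leaf labels can be listed with a few lines of Python. This is a minimal sketch, not part of the dataset: the helper name `newick_leaf_names` is hypothetical, and it assumes unquoted labels without spaces or parentheses. For full parsing (branch lengths, support values, rooting), a dedicated library such as Biopython's `Bio.Phylo.read(path, "newick")` is the safer choice.

```python
import re


def newick_leaf_names(newick: str) -> list[str]:
    """Return the leaf labels of a Newick string (simplified, hypothetical helper).

    Assumes unquoted labels without spaces. A leaf label is the token that
    directly follows "(" or ","; internal-node labels (which follow ")"),
    such as bootstrap values, and branch lengths after ":" are skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)90:0.3,C:0.4);"
    print(newick_leaf_names(tree))  # ['A', 'B', 'C']
```

If the labels in Data S1 are unquoted, the number of leaves returned should match the 62 genomes and MAGs described above.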
Data S2 Figure 1C data.xlsx: genomic dissimilarity, computed as Mash distance, among the newly identified ruminococcal cellulosomal species, given as pairwise comparisons with each other, with the ruminal species R. flavefaciens and with the human species R. champanellensis. Each column contains Mash distances.
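Mash estimates the Jaccard index J between the k-mer sets of two genomes from MinHash sketches and converts it to a distance D = -(1/k) ln(2J / (1 + J)), which approximates 1 - ANI for closely related genomes. The sketch below shows only that conversion; it assumes Mash's default k-mer size of 21 (the k used for Data S2 is not stated in this README), and the function name is hypothetical.

```python
import math


def mash_distance(jaccard: float, k: int = 21) -> float:
    """Convert a Jaccard estimate J into a Mash distance.

    D = -(1/k) * ln(2J / (1 + J)), written here as ln((1 + J) / (2J)) / k.
    k = 21 (Mash's default k-mer size) is an assumption. J = 0 (no shared
    k-mers) is mapped to the maximum distance 1.0, and larger values are
    capped at 1.0 (a choice made for this sketch).
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("Jaccard index must be in [0, 1]")
    if jaccard == 0.0:
        return 1.0
    return min(1.0, math.log((1 + jaccard) / (2 * jaccard)) / k)


if __name__ == "__main__":
    for j in (1.0, 0.5, 0.1):
        print(j, round(mash_distance(j), 4))
```

Identical k-mer sets (J = 1) give a distance of 0, and the distance grows as the shared fraction of k-mers falls.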
Data S3 Figure 2A.xlsx: observed collective prevalence of the MAGs of fiber-degrading strains in various human, ape and non-human primate (NHP) cohorts. Values in the columns represent the number of positive individuals; percentages are also given for each host category.
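Prevalence is the share of examined individuals that are positive, expressed here as a percentage per host category. A minimal sketch of that arithmetic; the function name, category names and counts below are made-up illustrations, not values from the dataset.

```python
def prevalence_percent(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map {category: (positive, total)} to {category: percent positive}.

    Categories with no examined individuals are skipped to avoid division by zero.
    """
    return {
        category: 100.0 * positive / total
        for category, (positive, total) in counts.items()
        if total > 0
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    example = {"rural": (3, 12), "industrialized": (0, 50)}
    print(prevalence_percent(example))  # {'rural': 25.0, 'industrialized': 0.0}
```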
Data S4 Figure 2Ci.xlsx: distribution of each human cellulosomal strain (R. champanellensis, R. hominiciens, R. ruminiciens and R. primaciens) across the sample cohorts. For each genome (column A), sample (column B) and host (column C), the average fold coverage and the number of strains are given.
Data S5 Figure 2Cii.xlsx: distribution of the human cellulosomal strains among the human- and NHP-positive samples. The 1st sheet gives, for each genome and host, the number of samples (column C) and the percentage of positive samples (column D); the 2nd sheet gives the significant IndVal (indicator value) statistics for each genome group.
Data S6 Figure 3A host tree.nwk contains, in Newick format, the phylogenetic tree of the mammalian host species shown in Figure 3A.
Data S7 Figure 3A tree.nwk contains, in Newick format, the core-protein phylogenetic tree in Figure 3A that illustrates the co-speciation hypothesis.
Data S8 Figure 3B tree.nwk contains, in Newick format, the phylogenetic tree in Figure 3B, built from 197 concatenated core proteins.
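A tree built from concatenated proteins is usually inferred from a supermatrix: the per-protein alignments are joined end to end for each genome, with gaps filling in proteins a genome lacks. The README does not say how the 197 alignments were combined, so the following is only a generic sketch of that supermatrix step; the function name and the gap-padding choice are assumptions.

```python
def concatenate_alignments(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-protein alignments into one supermatrix row per taxon.

    Each dict maps taxon -> aligned sequence, and all sequences in one
    alignment must have the same length. Taxa absent from an alignment
    are padded with gap characters ("-") of that alignment's length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("an alignment is empty or its sequences differ in length")
        (length,) = lengths
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


if __name__ == "__main__":
    alns = [{"A": "MK-", "B": "MKL"}, {"A": "GG", "C": "GA"}]
    print(concatenate_alignments(alns))
    # {'A': 'MK-GG', 'B': 'MKL--', 'C': '---GA'}
```

Padding missing proteins with gaps keeps every row the same length, which tree-inference programs require of a single alignment.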
Data S9 Figure 4C.xlsx: comparative cellulolytic activity of ruminococcal GH5 orthologs of either human (R. primaciens) or rumen (R. flavefaciens FD-1) origin. Enzyme samples were examined using microcrystalline cellulose (Avicel) as the substrate at 37°C. The concentration of reducing sugars is given for the two enzymes at the different time points.
Data S10 Figure 5A.txt: principal component analysis (PCA) of the overall predicted ORFs of the MAGs. Presence/absence (0 or 1) is given for each MAG and gene cluster (gene clusters in column 1).
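A PCA of a presence/absence table treats each MAG as an observation and each gene cluster as a 0/1 variable. The sketch below assumes NumPy is installed and that gene clusters are rows and MAGs are columns (as the description of column 1 suggests). It centers each cluster and takes the singular value decomposition; this is a generic PCA, not necessarily the exact procedure or software behind Figure 5A, and the function name is hypothetical.

```python
import numpy as np


def pca_scores(presence: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PCA scores of MAGs from a gene-cluster x MAG presence/absence matrix.

    Returns an array of shape (n_MAGs, n_components): the coordinates of each
    MAG on the leading principal components.
    """
    X = presence.T.astype(float)  # rows = MAGs, columns = gene clusters (copy)
    X -= X.mean(axis=0)           # center each gene cluster
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


if __name__ == "__main__":
    # Toy matrix: 4 gene clusters (rows) x 4 MAGs (columns).
    toy = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]])
    print(pca_scores(toy).round(3))
```

In the toy example, the first two MAGs share their gene content and land on one side of PC1, while the other two land on the opposite side. The sign of a principal component is arbitrary.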
Data S11 Figure 5B.xlsx: column A gives verticality values for common genes and column B verticality values for specific genes. The rank distribution of verticality values for core proteins across the three host types, compared with host-specific proteins, indicates that specific genes are likely transferred by horizontal gene transfer within a given type of host.
Data S12 Figure 5C.txt: analysis of the fibrolytic system [indicating glycoside hydrolase (GH) families] of the MAGs, according to their hosts. Each column gives the number of GH genes for a specific MAG.
Data S13 Figure 5D.txt: expression of the fibrolytic system, as examined by transcriptomic analysis of three fecal samples from each of the three hosts (macaque, human and sheep rumen). Transcript expression of fibrolytic genes is given for the 3 hosts, 3 individuals each.
Data S14 Figure 5E middle panel.txt: number of copies of specific genes for each specific MAG. Shown are the GH families that significantly distinguish the strains associated with the three gut ecosystems, as determined by the Kruskal-Wallis test (p < 0.05 after FDR correction).
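The test described for Data S14 can be sketched in plain Python: a Kruskal-Wallis H statistic per GH family across the three host groups, followed by an FDR adjustment of the resulting p-values. The README does not name the FDR method, so Benjamini-Hochberg is an assumption here. With three groups the chi-square reference distribution has 2 degrees of freedom, whose survival function is exp(-H/2). In practice, `scipy.stats.kruskal` and `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` provide tested implementations; the helper names below are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups: list[list[float]]) -> tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its p-value for exactly three groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("need three non-empty groups (the p-value uses df = 2)")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if tie_term == 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    h /= tie_term
    return h, math.exp(-h / 2)  # chi-square survival function with 2 df


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, monotone, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical GH copy numbers per MAG in three host groups.
    h, p = kruskal_wallis_three_groups([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(round(h, 3), round(p, 4))  # 7.2 0.0273
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Each GH family would get one Kruskal-Wallis p-value, and the whole list of p-values would then be adjusted together before applying the 0.05 cut-off.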
Data S15 Figure 5E right panel.txt: transcript expression of specific genes for the 3 hosts, 3 individuals each, showing the statistically significant differences in GH expression (metatranscripts in FPKM) between the three types of hosts.
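FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes a gene's fragment count by gene length and sequencing depth, so that expression values are comparable across genes and samples. The standard formula is FPKM = fragments × 10^9 / (gene length in bp × total mapped fragments). The sketch below only illustrates that formula; the function name is hypothetical and the counts in the example are invented, not taken from the metatranscriptomes.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Equivalent to fragments / (gene_length_bp / 1e3) / (total_mapped_fragments / 1e6).
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Invented numbers: 500 fragments on a 1 kb gene in a 10 M-fragment library.
    print(fpkm(500, 1_000, 10_000_000))  # 50.0
```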
Data S16 Figure 5E verticality values.xlsx: verticality values for each of the statistically significant GH families; values are given for the indicated genes.
Data S17 Supplementary Figure S1.xlsx: prevalence of the fibrolytic strains in 1989 gut metagenomic samples. In sheet 1, presence is given as 1 in column C for a specific genome (column B), run (column A), host group (column D) and host animal (column E). Average fold coverage is given in column E, lifestyle in G, country in H, number of strains in the sample in I, and host category in J. Sheet 2 contains the raw data; sheet 3 is the same as sheet 2 but filtered at 20% coverage in the covered percent column (column T).
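Sheet 3 of Data S17 keeps only detections whose covered percent reaches the 20% threshold. Below is a minimal sketch of that filtering and of the per-genome count of positive runs that follows from it. It assumes the sheet has been exported to rows with the hypothetical keys `run`, `genome` and `covered_percent` (the real workbook uses column letters), and treating the cut-off as inclusive is also an assumption.

```python
from collections import defaultdict


def positive_runs_per_genome(rows, min_covered_percent: float = 20.0) -> dict[str, int]:
    """Count distinct runs per genome whose covered percent passes the threshold.

    `rows` is an iterable of dicts with the hypothetical keys "run", "genome"
    and "covered_percent"; values may be strings, as from csv.DictReader.
    """
    positives: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        # Inclusive cut-off (assumed; the README only says "filtered at 20%").
        if float(row["covered_percent"]) >= min_covered_percent:
            positives[row["genome"]].add(row["run"])
    return {genome: len(runs) for genome, runs in positives.items()}


if __name__ == "__main__":
    demo = [
        {"run": "r1", "genome": "g1", "covered_percent": "25.0"},
        {"run": "r2", "genome": "g1", "covered_percent": "10.0"},
        {"run": "r3", "genome": "g2", "covered_percent": "20.0"},
    ]
    print(positive_runs_per_genome(demo))  # {'g1': 1, 'g2': 1}
```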
Data S18 Supplementary Figure S2A.xlsx: prevalence and abundance of the fibrolytic strains in various human and NHP gut samples. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given for various read subsampling depths (5, 10, 20, 40 and 60 M reads). The maximal numbers of individuals in the cohorts are given for each host category at 5 M read depth. In sheet 1, prevalences are given in column D for a specific genome (column A), read subsampling depth (column B) and host category (column C). Sheet 2 gives chi-square test statistics, including p-value, significance and adjusted p-value, between the specific host categories and genomes. Sheet 4 gives additional metadata for the samples with coverage above 20%, including the original study (column A).
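Reporting prevalence at fixed read depths means each sample is first reduced to the same number of reads. A sample with fewer reads than the target presumably cannot contribute at that depth, which would explain why the cohort sizes are given at the smallest depth, 5 M. The README does not say which tool performed the subsampling. The sketch below uses reservoir sampling (Algorithm R) to draw a uniform random subset of reads without replacement. The function name is hypothetical, and a fixed seed is used for reproducibility.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def subsample_reads(reads: Iterable[T], depth: int, seed: int = 0) -> Optional[list[T]]:
    """Uniformly sample `depth` reads without replacement (Algorithm R).

    Returns None when the input holds fewer than `depth` reads, i.e. the
    sample cannot be analysed at that depth.
    """
    rng = random.Random(seed)
    reservoir: list[T] = []
    for i, read in enumerate(reads):
        if i < depth:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible slots
            if j < depth:
                reservoir[j] = read
    return reservoir if len(reservoir) == depth else None


if __name__ == "__main__":
    sample = subsample_reads((f"read{i}" for i in range(1_000)), depth=10)
    print(len(sample), subsample_reads(range(5), depth=10))  # 10 None
```

Reservoir sampling reads the input once and holds only `depth` reads in memory, which suits large FASTQ streams.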
Data S19 Supplementary Figure S2B.xlsx: prevalence of the cellulosomal strains for each host category at 10 M read depth. In sheet 1, prevalences are given in column D for a specific genome (column A), read subsampling depth (column B) and host category (column C). Additional information includes country of origin (column E), host (column F), lifestyle (column G), host category (column H), depth (column I), host species (column J), host group (column K) and additional host category (column L). Sheet 2 gives chi-square test statistics, including p-value, significance and adjusted p-value, between the specific host categories and genomes.
Data S20 Supplementary Figure S2C.xlsx: abundance of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) for each category at 20 M read depth. In sheet 1, abundances are given in column D for a specific genome (column A), read subsampling depth (column B) and sample (column C). Sheet 2 gives chi-square test statistics, including p-value, significance and adjusted p-value, between the specific host categories and genomes.
Data S21 Supplementary Figure S3.xlsx: prevalence of the fibrolytic strains in NHP gut samples. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given for various read subsampling depths (5, 10, 20, 40 and 60 M reads). In sheet 1, prevalences are given in column D for a specific genome (column A), read subsampling depth (column B) and sample host (column C). Sheet 2 gives chi-square test statistics, including p-value, significance and adjusted p-value, between the specific hosts and genomes.
Data S22 Supplementary Figure S4.xlsx: prevalence of the fibrolytic strains in industrialized countries. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given at 10 M read depth. In sheet 1, prevalences are given in column D for a specific genome (column A), read subsampling depth (column B) and sample location (column C). Sheet 2 gives chi-square test statistics, including p-value, significance and adjusted p-value, between the specific locations and genomes.
Data S23 Supplementary Figure S5.xlsx: prevalence of the fibrolytic strains in rural societies. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given at 10 M read depth. In sheet 1, prevalences are given in column D for a specific genome (column A), read subsampling depth (column B) and sample location (column C). Sheet 2 gives chi-square test statistics, including p-value, significance and adjusted p-value, between the specific locations and genomes.
Data S24 Supplementary Figure S6.xlsx: abundance of the MAGs in their ecosystem. The abundance of each MAG in each sample is given.
Data S25 Supplementary Figure S7.xlsx: prevalence of the fibrolytic strains in captive and wild NHPs. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given at 10 M read depth in various animals (apes: chimpanzees, gorillas and orangutans; other NHPs: macaques, tamarins, baboons, mandrills, capuchins, colobus monkeys, guerezas and geladas; ruminants: cows, sheep, camels, yaks and deer). In sheet 1, the number of positive (column E) and negative (column F) samples out of the total number of samples examined (column D) is given for each host group (column A) and genome (column B); lifestyle appears in column C. Sheet 2 gives chi-square test statistics, including p-value, significance and adjusted p-value, for the specific host groups and genomes.
Data S26 Supplementary Figure S8.xlsx: prevalence of the fibrolytic strains in omnivore and folivore NHPs. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given at 10 M read depth in various NHPs (excluding apes, which are all omnivores). In sheet 1, the prevalence (column D) of a specific genome (column A) is given; column B is the read depth and column C the diet of the animal. Sheet 2 gives the statistics and p-values of Pearson's chi-squared test with Yates' continuity correction for the 3 genomes examined.
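For a 2 × 2 table, for example positive versus negative samples in omnivore versus folivore NHPs, Pearson's chi-squared test with Yates' continuity correction subtracts 0.5 from each |observed - expected| before squaring. The p-value comes from a chi-square distribution with 1 degree of freedom, whose survival function is erfc(sqrt(χ²/2)). The test name matches the header printed by R's `chisq.test` for 2 × 2 tables, but the README does not state which software was used. The plain-Python sketch below caps the correction at |O - E| so that it never overshoots. The table layout and function name are assumptions, and the counts in the example are invented.

```python
import math


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square with Yates' continuity correction for [[a, b], [c, d]].

    Returns (statistic, p-value) using 1 degree of freedom. Every row and
    column total must be positive, otherwise an expected count is zero.
    """
    table = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            stat += (diff - min(0.5, diff)) ** 2 / expected  # Yates correction
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    return stat, p_value


if __name__ == "__main__":
    # Invented counts: rows = diet groups, columns = (positive, negative) samples.
    stat, p = chi2_yates_2x2(10, 20, 30, 40)
    print(round(stat, 4), round(p, 3))
```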
Data S27 Supplementary Figure S9.xlsx: identification of core proteins. Presence/absence (0 or 1) is given for each MAG and gene cluster (gene clusters in column 1).
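With a presence/absence matrix, core proteins are commonly defined as the gene clusters present in every MAG; a relaxed "soft core" uses a lower fraction such as 95%. The README does not give the threshold used for Data S27. The default of 100% below is an assumption, as are the function name and the dict layout (one 0/1 row per gene cluster).

```python
def core_clusters(presence: dict[str, list[int]], min_fraction: float = 1.0) -> list[str]:
    """Gene clusters present (value 1) in at least `min_fraction` of the MAGs.

    `presence` maps a gene-cluster ID to its 0/1 row across MAGs, in the same
    MAG order for every cluster. The 100% default threshold is an assumption.
    """
    return [
        cluster
        for cluster, row in presence.items()
        if row and sum(row) / len(row) >= min_fraction
    ]


if __name__ == "__main__":
    matrix = {"c1": [1, 1, 1], "c2": [1, 0, 1], "c3": [0, 0, 0]}
    print(core_clusters(matrix))                    # ['c1']
    print(core_clusters(matrix, min_fraction=0.6))  # ['c1', 'c2']
```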
Data S28 Supplementary Figure S10 tree.nwk: multilocus sequence analysis of the 30 MAGs. The file contains, in Newick format, the phylogenetic tree shown in Figure S10.
Data S29 Supplementary Figure S11.xlsx: comparative cellulolytic activity of ruminococcal GH5 orthologs of either human (R. primaciens) or rumen (R. flavefaciens FD-1) origin. Enzyme samples were examined at various concentrations using amorphous cellulose as the substrate at 37°C for 1 h of incubation. The concentration of reducing sugars is given for the two enzymes at the different enzyme concentrations.
Data S30 Supplementary Figure S12.xlsx: transcriptomic analysis of R. flavefaciens, R. hominiciens and R. primaciens strains. Sheet 1 gives the total gene expression (as a percentage) of the specific MAGs in the three fecal samples from each of the three hosts (macaque, human and sheep rumen), i.e. overall transcripts for the 3 hosts, 3 individuals each. Sheets 2, 3 and 4 give the transcript abundance (in FPKM) of each of the indicated cellulosomal genes in three fecal samples from the sheep, human and macaque hosts, respectively: sheet 2 for the 3 sheep individuals, sheet 3 for the 3 human individuals and sheet 4 for the 3 macaque individuals.
Data S31 Supplementary Figure S13.xlsx: fibrolytic cellulosomal core enzymes of the 30 MAGs with respect to their sample of origin (human-, rumen- and NHP-assembled MAGs). Each column gives the number of GH genes for a specific MAG.
Data S32 197 phylogenetic trees.pdf: compilation of the 197 phylogenetic trees used for the evolutionary analysis.