An evolution-based framework for describing human gut bacteria

Doran, Benjamin1; Chen, Robert2; Giba, Hannah1; Behera, Vivek1; Barat, Bidisha1; Sundararajan, Anitha1; Lin, Huaiying1; Sidebottom, Ashley1; Pamer, Eric1; Raman, Arjun1

Research facility: Duchossois Family Institute, University of Chicago

Published Dec 25, 2023 on Dryad. https://doi.org/10.5061/dryad.cjsxksnd0

Data files

Dec 25, 2023 version files 1.07 GB

Data_X1.xlsx (92.52 MB)
README.md (1.59 KB)
Table_S3.csv (975.18 MB)

Abstract

The human gut microbiome contains many bacterial strains of the same species (‘strain-level variants’). Describing strains in a biologically meaningful manner, rather than as purely taxonomic objects, is an important but challenging goal because of the complexity of strain-level variation. Here, we measured patterns of co-evolution across >7,000 strains spanning the bacterial tree-of-life. Using these patterns as a prior for studying hundreds of gut commensal strains that we isolated, sequenced, and metabolically profiled revealed widespread structure beneath the phylogenetic level of species. Defining strains by their co-evolutionary signatures enabled predicting their metabolic phenotypes and engineering consortia from strain genome content alone. Our findings demonstrate a biologically relevant organization to strain-level variation and motivate a new schema for describing bacterial strains based on their evolutionary history.

Materials and Methods (further details and metadata are provided in the associated article's supplementary material)

Creating a bank of commensal human gut microbiome strains

Fecal samples were obtained from 28 human donors aged 18 to 63 years (median age 35). Donors were selected if they had no antibiotic use in the past year and no known history of diabetes, colitis, autoimmune disease, cancer, pneumonia, dysentery, or cellulitis at the time of consent. Protocols for fecal sample collection were approved by Memorial Sloan Kettering (MSK) and the University of Chicago. Fresh fecal samples were immediately reduced in an anaerobic chamber upon collection, then diluted and cultured on various growth media. Agar media varied and included the following: Columbia Blood Agar, Brain Heart Infusion + Yeast, Brain Heart Infusion + Mucin, Brain Heart Infusion + Yeast + Acetate or N-Acetylglucosamine, Reinforced Clostridial Agar, Peptone Yeast Glucose, Yeast Casitone Fatty Acids, and Defined Media M5. Colonies were picked and grown until sufficiently turbid, and 20% glycerol/PBS stocks were prepared and stored at -80 °C.

Colonies were selected for whole-genome sequencing based on pyrosequencing of the 16S rRNA gene region, which provides a rough genus-level designation. For each donor, only colonies whose 16S sequences shared less than 99% identity, as determined by CD-HIT (v4.8.1) clustering, were selected for whole-genome sequencing (1).

Bacterial genomic DNA was extracted using the QIAamp DNA Mini Kit (QIAGEN) according to the manufacturer's instructions. The purified DNA was quantified with a Qubit 2.0 fluorometer, and 1,000 ng of each sample was prepared for sequencing using the QIAseq FX DNA Library Kit (QIAGEN) with a targeted fragment size of 550 bp. Sequencing was performed on the MiSeq or NextSeq platform (Illumina) with a paired-end (PE) kit, in pools designed to provide 1-3 million PE reads per sample at a read length of 250 or 150 bp.
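The per-donor 16S dereplication step above can be illustrated with a minimal sketch of greedy incremental clustering, the general strategy CD-HIT uses: sequences are visited longest-first, and each one either joins an existing representative it matches at or above the identity threshold or founds a new cluster. This is not CD-HIT itself. It skips CD-HIT's k-mer short-word filter and uses a plain, slow Needleman-Wunsch alignment with assumed scoring (match +1, mismatch -1, linear gap -1). Identity is computed as identical aligned positions divided by the length of the shorter sequence. The function names `global_identity`, `greedy_cluster`, and `select_for_wgs` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Identity of one optimal Needleman-Wunsch global alignment of a and b,
    as identical aligned positions divided by the length of the shorter sequence."""
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j] (linear gap penalty).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting identical aligned positions.
    i, j, same = n, m, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return same / min(n, m)


def greedy_cluster(seqs: dict, threshold: float = 0.99) -> list:
    """Longest-first greedy clustering; returns the IDs of cluster representatives."""
    reps = []
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        # A sequence becomes a new representative only if it is below the
        # identity threshold against every existing representative.
        if all(global_identity(seqs[sid], seqs[r]) < threshold for r in reps):
            reps.append(sid)
    return reps


def select_for_wgs(colonies: list, threshold: float = 0.99) -> dict:
    """colonies: (donor, colony_id, 16S sequence) tuples; dereplicate within each donor."""
    by_donor = defaultdict(dict)
    for donor, cid, seq in colonies:
        by_donor[donor][cid] = seq
    return {donor: greedy_cluster(seqs, threshold) for donor, seqs in by_donor.items()}


colonies = [
    ("D1", "c1", "ACGTACGTACGTACGTACGT"),
    ("D1", "c2", "ACGTACGTACGTACGTACGT"),  # identical to c1, so it joins c1's cluster
    ("D1", "c3", "TTTTGGGGCCCCAAAATTTT"),
    ("D2", "c4", "ACGTACGTACGTACGTACGT"),  # different donor, clustered separately
]
print(select_for_wgs(colonies))  # {'D1': ['c1', 'c3'], 'D2': ['c4']}
```

Clustering within each donor, rather than across all donors, reflects the text: near-identical strains from different donors are each kept. The quadratic alignment makes this impractical for thousands of full-length 16S reads, which is why tools like CD-HIT filter candidate pairs first.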
Adapters were trimmed with Trimmomatic using the following parameters: leading and trailing bases with a quality score below 3 were removed, and reads were scanned with a 4-base sliding window and cut where the average quality fell below 15 (default parameters of Trimmomatic). Any read shorter than 50 bp after trimming and quality control was discarded. The remaining high-quality reads were assembled into contigs using SPAdes (v3.14.0) (2).

Taxonomic classification of the assembled contigs was performed with three methods: (a) Kraken2 (v2.1.1); (b) BLASTn (v2.10.1+), in which the full- or partial-length 16S rRNA gene extracted from each isolate's assembled contigs was queried against NCBI's RNA RefSeq database (3, 4), and the top five hits for each query were manually curated to determine the isolate's identity, with identity and coverage cutoffs both set at 95%; and (c) GTDB-Tk (v1.5.1) (5). The final taxonomy was determined by consensus of the three methods. Any colony that did not match its initial pyrosequencing taxonomy, or for which the methods lacked consensus, was excluded from the commensal strain bank.
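The read-trimming step above can be sketched in a few lines. This sketch assumes the run corresponded to Trimmomatic's `LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50` steps, applied in that order. In Trimmomatic, `LEADING` and `TRAILING` remove end bases while their quality is below the given value, not a fixed number of bases. The sliding-window step here is simplified: it cuts at the start of the first failing window, whereas Trimmomatic's own implementation may retain some passing bases inside that window. Adapter clipping is not modeled. `phred33` and `trim_read` are hypothetical helper names.

```python
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_read(quals: List[int], leading: int = 3, trailing: int = 3,
              window: int = 4, min_avg: float = 15.0,
              min_len: int = 50) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the retained slice quals[start:end], or None if dropped."""
    start, end = 0, len(quals)
    # LEADING: remove bases from the 5' end while their quality is below `leading`.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove bases from the 3' end while their quality is below `trailing`.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (simplified): scan 5'->3' and cut at the start of the first
    # window whose mean quality is below `min_avg`.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # MINLEN: drop reads that are too short after trimming.
    if end - start < min_len:
        return None
    return start, end


good = [2] + [30] * 57 + [2, 2]   # 60 bases with low-quality ends
print(trim_read(good))            # (1, 58)
tail = [30] * 40 + [10] * 20      # quality collapses after base 40
print(trim_read(tail))            # None: only 40 bases would survive
print(phred33("I#"))              # [40, 2]
```

Returning slice bounds rather than a new string keeps the sequence and quality strings of a FASTQ record trimmed consistently. The caller applies `seq[start:end]` and `qual[start:end]` together.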

Annotating each strain in the commensal strain bank by its orthologous gene group (OGG) content

For individual isolates, genome assemblies were annotated using Prokka (v1.12), producing a FASTA file of all coding regions in the assembled genome translated into amino-acid sequences (10). This FASTA file was then input to eggNOG-mapper (v2.0.1b) to annotate each protein sequence against the eggNOG database (v5.0) of orthologous gene groups (OGGs) at the level of Bacteria (‘@2’) (11, 12). Isolates were then aligned on this common set of OGG features in a matrix in which each row corresponds to an isolate, each column corresponds to an OGG, and each entry holds the number of protein sequences that matched that OGG. This OGG alignment of 669 isolates forms the commensal strain bank (CSB) OGG matrix (Fig. 1C). The matrix spans 11,248 OGGs in total, of which 5,449 have non-zero variance across isolates. Isolates were also annotated with taxonomic designations from 16S-BLASTn, GTDB, and the final NCBI taxonomy at each level from phylum through species.
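The isolate-by-OGG matrix described above, and the count of OGGs with non-zero variance, can be sketched as follows. This sketch assumes each isolate's per-protein OGG assignments have already been extracted from the eggNOG-mapper output. That parsing step is not shown, since the output layout depends on the eggNOG-mapper version. The OGG IDs (`OGG_1`, ...) and isolate names are placeholders, not real eggNOG identifiers, and `build_ogg_matrix` and `variable_oggs` are hypothetical names.

```python
from collections import Counter
from statistics import pvariance
from typing import Dict, List, Tuple


def build_ogg_matrix(annotations: Dict[str, List[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build an isolate x OGG count matrix.

    `annotations` maps each isolate to the OGG IDs assigned to its proteins
    (one entry per protein), so a repeated ID means multiple matching proteins.
    """
    isolates = sorted(annotations)
    counts = {iso: Counter(annotations[iso]) for iso in isolates}
    # The union of every OGG seen in any isolate defines the shared column set.
    oggs = sorted(set().union(*counts.values()))
    # Counter returns 0 for OGGs an isolate lacks, which fills absent entries.
    matrix = [[counts[iso][ogg] for ogg in oggs] for iso in isolates]
    return isolates, oggs, matrix


def variable_oggs(oggs: List[str], matrix: List[List[int]]) -> List[str]:
    """Return the OGG columns whose counts are not identical across all isolates."""
    return [ogg for j, ogg in enumerate(oggs)
            if pvariance([row[j] for row in matrix]) > 0]


annotations = {
    "isolate_A": ["OGG_1", "OGG_1", "OGG_2"],  # two proteins matched OGG_1
    "isolate_B": ["OGG_1", "OGG_2", "OGG_3"],
}
isolates, oggs, matrix = build_ogg_matrix(annotations)
print(matrix)                       # [[2, 1, 0], [1, 1, 1]]
print(variable_oggs(oggs, matrix))  # ['OGG_1', 'OGG_3']
```

Population variance is used here only as a constancy test. Any variance estimator is zero exactly when a column is identical across isolates, so the choice does not change which OGGs are kept.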
