Data from: Quantifying (non)parallelism of microbial community change using multivariate vector analysis
Data files
Jan 01, 2023 version files: 40.89 MB
Abstract
Parallel evolution of phenotypic traits is regarded as strong evidence for natural selection and has been studied extensively in a variety of taxa. However, we have limited knowledge of whether parallel evolution of host organisms is accompanied by parallel changes of their associated microbial communities (i.e., microbiotas), which are crucial for their hosts’ ecology and evolution. Determining the extent of microbiota parallelism in nature can improve our ability to identify the factors that are associated with (putatively adaptive) shifts in microbial communities. While it has been emphasized that (non)parallel evolution is better considered as a quantitative continuum rather than a binary phenomenon, quantitative approaches have rarely been used to study microbiota parallelism. We advocate using multivariate vector analysis (i.e., phenotypic change vector analysis) to quantify direction and magnitude of microbiota changes and discuss the applicability of this approach for studying parallelism. We exemplify its use by reanalyzing gut microbiota data from multiple fish species that exhibit parallel shifts in trophic ecology. This approach provides an analytical framework for quantitative comparisons across host lineages, thereby providing the potential to advance our capacity to predict microbiota changes. Hence, we encourage the development and application of quantitative measures, such as multivariate vector analysis, to better understand the role of microbiota dynamics during their hosts’ adaptive evolution, particularly in settings of parallel evolution.
Methods
We obtained 16S rRNA gene sequencing data from six published studies on teleost fishes that feature replicate niche shifts. For some of the study systems, we analyzed only a subset of the original dataset. Details about the populations/species analyzed from each study can be found in Table S1 of the paper. All sequencing data were downloaded from the NCBI Sequence Read Archive (SRA). For further information on the sequencing platforms, sample sizes and accession numbers for these published datasets, see Tables S1 and S2 of the paper. Data were converted from SRA to FASTQ format using the fastq-dump tool of the SRA Toolkit v2.9.6-1 (https://github.com/ncbi/sra-tools).
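The SRA-to-FASTQ conversion step could be scripted roughly as follows. This is a minimal sketch, not the authors' procedure: the function names, the output directory and the accession are placeholders, and only the `--outdir` option of fastq-dump is used because the record does not state which options were applied.

```python
import shlex
import subprocess
from pathlib import Path


def build_fastq_dump_command(run: str, out_dir: str) -> list[str]:
    """Return the argument list for converting one SRA run to FASTQ.

    `run` may be an accession or a path to a downloaded .sra file.
    """
    # Only --outdir is set here; any further options the authors may have
    # used are not documented in this record and are deliberately left out.
    return ["fastq-dump", "--outdir", out_dir, run]


def convert_runs(runs: list[str], out_dir: str, dry_run: bool = True) -> list[str]:
    """Convert each run; with dry_run=True only return the shell commands."""
    commands = []
    for run in runs:
        cmd = build_fastq_dump_command(run, out_dir)
        commands.append(shlex.join(cmd))
        if not dry_run:
            Path(out_dir).mkdir(parents=True, exist_ok=True)
            # Requires the SRA Toolkit to be installed and on PATH.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Placeholder accession, not one of the runs listed in Table S2.
    for line in convert_runs(["SRR0000000"], "fastq"):
        print(line)
```

With `dry_run=True` the function only returns the command lines, which makes it easy to check them against the accession list before running the actual conversion.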
We analyzed only forward reads, truncated to dataset-specific lengths that depended on read length and sequence quality. Reads were imported into the open-source bioinformatics pipeline Quantitative Insights Into Microbial Ecology (QIIME2; Bolyen et al. 2019) to analyze gut microbial communities. We performed sequence quality control with the QIIME2 plugin DADA2 (Callahan et al. 2016). A bacterial phylogenetic tree was produced with FastTree 2.1.3 (Price et al. 2010). We calculated different phylogenetic (weighted and unweighted UniFrac) and non-phylogenetic (Bray-Curtis dissimilarity) metrics for bacterial community composition (Lozupone et al. 2011). Comparing these metrics allowed us to assess how robust our vector analyses were to the choice of metric. We performed principal coordinate analyses (PCoA) on the distance matrices and used the PCoA scores as input for the vector analyses. MetaCyc pathway abundances, Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologs and Enzyme Commission numbers were predicted with the PICRUSt2 plugin in QIIME2 (Douglas et al. 2020; Kanehisa et al. 2012). As recommended by the developers, we used a maximum nearest-sequenced taxon index (NSTI) cutoff of 2 to exclude unreliable predictions based on poorly characterized bacterial taxa. Rarefaction depths for the analyses of composition and inferred function of the gut microbiota were chosen separately for each dataset, depending on its sequencing depth.
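To illustrate the non-phylogenetic metric named above, the following sketch computes Bray-Curtis dissimilarity between samples from per-ASV counts. It is a plain-Python illustration of the standard formula, not the QIIME2 implementation used for the paper; the sample names and counts are invented, and in practice the counts would be rarefied first.

```python
def bray_curtis(counts_a: list[float], counts_b: list[float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum a_i + sum b_i)."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must cover the same set of ASVs")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("dissimilarity is undefined for two empty samples")
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b)) / total


def pairwise_bray_curtis(table: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Return the dissimilarity for every ordered pair of samples."""
    ids = list(table)
    return {(i, j): bray_curtis(table[i], table[j]) for i in ids for j in ids}


# Invented counts for three hypothetical samples across three ASVs.
example_table = {
    "fish_1": [10, 0, 5],
    "fish_2": [5, 5, 5],
    "fish_3": [0, 15, 0],
}
distances = pairwise_bray_curtis(example_table)

if __name__ == "__main__":
    for pair, value in sorted(distances.items()):
        print(pair, round(value, 3))
```

The resulting matrix is symmetric with zeros on the diagonal. A matrix of this kind is what a PCoA takes as input, whereas UniFrac would additionally require the phylogenetic tree.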
Usage notes
This dataset includes R scripts to perform all analyses presented in the main text of the paper and the Supplementary Material, as well as data files with information on all samples (sample id, ecotype, habitat) and PCoA scores. These data files form the basis for all analyses. Furthermore, we provide FASTA files with ASV IDs and sequences for all study systems.
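As a rough illustration of how sample tables with PCoA scores can feed a vector analysis, the sketch below computes ecotype centroids per host lineage, the change vector between them, and the angle and length difference between two such vectors. It is written in Python for illustration and does not reproduce the provided R scripts. The column names, the `lineage` column, the labels and all values are assumptions, not the layout of the actual data files, and no significance testing is included.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical table: column names, labels and scores are invented.
EXAMPLE = """sample_id,lineage,ecotype,PC1,PC2
a1,lakeA,benthic,0.0,0.0
a2,lakeA,benthic,0.2,0.0
a3,lakeA,limnetic,1.0,1.0
a4,lakeA,limnetic,1.2,1.0
b1,lakeB,benthic,0.0,0.1
b2,lakeB,benthic,0.0,-0.1
b3,lakeB,limnetic,2.0,0.0
b4,lakeB,limnetic,2.0,0.0
"""


def centroids(rows, axes):
    """Mean PCoA score per (lineage, ecotype) group along the given axes."""
    sums = defaultdict(lambda: [0.0] * len(axes))
    counts = defaultdict(int)
    for row in rows:
        key = (row["lineage"], row["ecotype"])
        for i, axis in enumerate(axes):
            sums[key][i] += float(row[axis])
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


def change_vector(cents, lineage, start, end):
    """Vector from the start-ecotype centroid to the end-ecotype centroid."""
    return [e - s for s, e in zip(cents[(lineage, start)], cents[(lineage, end)])]


def length(vec):
    """Euclidean length, i.e. the magnitude of the microbiota change."""
    return math.sqrt(sum(x * x for x in vec))


def angle_degrees(u, v):
    """Angle between two change vectors; small angles suggest parallel change."""
    cos = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


rows = list(csv.DictReader(io.StringIO(EXAMPLE)))
cents = centroids(rows, ["PC1", "PC2"])
vec_a = change_vector(cents, "lakeA", "benthic", "limnetic")
vec_b = change_vector(cents, "lakeB", "benthic", "limnetic")
angle = angle_degrees(vec_a, vec_b)
length_difference = abs(length(vec_a) - length(vec_b))

if __name__ == "__main__":
    print("angle (degrees):", round(angle, 2))
    print("difference in length:", round(length_difference, 3))
```

In this toy example the two lineages shift in directions roughly 45 degrees apart and by different magnitudes, which places them between fully parallel and orthogonal change on the quantitative continuum described in the abstract.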