Data from: Microbial dysbiosis precedes signs of sea star wasting disease in wild populations of Pycnopodia helianthoides
Data files
Feb 29, 2024 version files 14.94 MB
-
0502asv-table-sswd.csv
-
ANCOMBC_ASV_All.csv
-
ANCOMBC_Lev7_All.csv
-
EC_predicted.tsv.gz
-
KO_predicted.tsv.gz
-
level-7.csv
-
marker_predicted_and_nsti.tsv.gz
-
middle-lvl-otus.csv
-
README.md
-
SRA_16s_Pycno_metadata.xlsx
-
table-lev7.qzv
-
table.qzv
-
taxonomy.qzv
Abstract
Sea star wasting (SSW) disease, a massive and ongoing epidemic with unknown cause(s), has led to the rapid death and decimation of sea star populations with cascading ecological consequences. Changes in microbial community structure have been previously associated with SSW, however, it remains unknown if SSW-associated dysbiosis is a mechanism or artifact of disease progression, particularly in wild populations. Here, we compare the microbiomes of the sunflower sea star, Pycnopodia helianthoides, before (Naïve) and during (Exposed and Wasting) the initial outbreak in Southeast Alaska to identify changes and interactions in the microbial communities associated with sea star health and disease exposure. We found an increase in microbial diversity (both alpha and beta diversity) preceding signs of disease and an increase in abundance of facultative and obligate anaerobes (most notably Vibrio) in both Exposed (apparently healthy) and Wasting animals. Complementing these changes in microbial composition was the initial gain of metabolic functions upon disease exposure, and loss of function with signs of wasting. Using Bayesian network clustering, we found evidence of dysbiosis in the form of co-colonization of taxa appearing in large numbers among Exposed and Wasting individuals, in addition to the loss of communities associated with Naïve sea stars. These changes in community structure suggest a shared set of colonizing microbes that may be important in the initial stages of SSW. Together, these results provide several complementary perspectives in support of an early dysbiotic event preceding visible signs of SSW.
README: Microbial dysbiosis precedes signs of Sea Star Wasting Disease in wild populations of Pycnopodia helianthoides
This dataset contains organized datafiles use in our analysis of the microbiome of wild sunflow sea stars (Pycnopodia helianthoides) in an area of southern Alaska as a wasting disease impacted the area for the first time. We include our raw ASV in a dataframe with predicted taxanomic assignment, outputs from Qiime2 16s RNA pipeline for microbial diversity, differential abundance tables, Picrust2 functional analysis results, and bipartite clustering results.
Data and file structure
Metadata File:
-
SRA_16s_Pycno_metadata
is a file containing metadata on each sample. naming convention for column "site-animal-health" is HH = Naive, SH = Exposed, and SS = Wasting
Qiime2 Files:
-
table.qzv
is a qiime2 view file containing ASV and abundance metrics -
table-lev7.qzv
is a Qiime2 view file containing the Species-level features and abundances across samples -
0502asv-table-sswd
contains ASV_Feature identifiers with abundances across samples. -
taxonomy.qzv
is a Qiime2 view file containing the taxonomic assignment for each ASV mapped tot he Greengenes database. -
level-7.csv
is a data table containing each species-level "or as close as predicted" identified taxa with counts for each sample. HH = Naive, SH = Exposed, SS = Wasting.
ACNOM-BC files:
-
ANCOMBC_ASV_All.CSV
andANCOMBC_Lev7_All.CSV
are the combined output files from ANCOMBC analysis comparing differential abundance betweenNaive(HH) vs Exposed(SH)
andExposed(SH) vs Wasting(SS)
. Lev7_ALL contains identified taxonomic groups down to species level, whereas ASV_ALL has the raw amplicon sequence variants.- the prefix HHtoSH represents changes in the Exposed group relative to Naive
- the prefix SHtoSS represents changes in the Wasting group relative to Exposed
- NA's in the table signify taxa that were filtered out due to low number counts/presence across samples. For example, Taxa included in comparisons within HHtoSS may not be included in the comparison between SHtoSS due to poor representation across SS samples.
- Variables in data table as described by ANCOM-BC vignette below: https://rdrr.io/bioc/ANCOMBC/man/ancombc.html
-
beta
, coefficients obtained from the ANCOM-BC log-linear (natural log) model. -
se
, standard errors (SEs) ofbeta
. -
W
, test statistics.W = beta/se
. -
p_val
,p-values. P-values are obtained from two-sided Z-test using the test statisticW
. -
q_val
, adjusted p-values. Adjusted p-values are obtained by applyingp_adj_method
top_val
. -
diff_abn
, a logical TRUE/FALSE if the taxon hasq_val
less thanalpha
.
-
PICRUST-2 Files:
-
KO_predicted.tsv.gz
, andmarker_predicted_and_nsti.tsv.gz
are the raw outputs from our picrist2 analysis used for downstream analyzing of predicted metabolic pathways between site-animal-health groups.
Bipartite Analysis Files:
-
middle-lvl-otus.csv
contains the output from bipartite clustering of Taxa at level-7 identification and sample data to distinguish patterns of co-occurrence between groups.
Sharing/Access information
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: BioProject, PRJNA931596 on NCBI.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2023.1130912/full#supplementary-material
Code/Software
Code used to analize these files are available on Github at https://github.com/PespeniLab/P.helianthoides-SSW-16s
Methods
Sample collection
Samples were collected in the summer of 2016 off the coast of Southeast Alaska (Fig. S1). With the aid of SCUBA, a biopsy from a single ray was collected underwater from each sampled sea star, and isolated until in a wet lab aboard a research vessel (R/V Kestrel, Alaska Department of Fish and Game). Epidermal biopsy samples were collected from sea stars at both impacted and naïve sites at depths ranging from 7 to 18 meters. Seven total sites were sampled (2 Naïve and 5 impacted) with a total of 18 Wasting, 20 Exposed, and 47 Naïve individuals sampled (total N=85). Nonlethal biopsy punches (3.5mm diameter biopsy punch, Robbins Instruments, Chatham, NJ) were taken from the body wall of each individual and preserved in RNAlater (ThermoFisher Scientific, Waltham, MA) in 2ml tubes. Only epidermal body wall tissue was sampled, even when sampling wasting individuals. For individuals displaying SSW symptoms, wasting epidermal tissue at the edge of the lesion was sampled. All biopsy tissue samples were shipped to Vermont on dry ice and stored at −80 °C until processing.
RNA extraction and cDNA reverse transcription.
RNA was extracted from each biopsy using a modified TRIzol protocol (TRIzol reagent ThermoFisher Scientific, Waltham, MA). After lysing tissue in 250ul TRIzol, it was homogenized for 20 minutes with a plastic pestle with 750ul additional TRIzol using a Vortex Genie2 (Scientific Industries, Bohemia, NY). To extract RNA, 200 ul chloroform (ThermoFisher Scientific, Waltham, MA) was added, inverted 15 times, incubated for 3 minutes at RT, and centrifuged at 4 °C for 15 minutes at 12,000×g. The RNA-containing supernatant was transferred to a new tube and the previous step was repeated. Adding 500 ul isopropanol (ThermoFisher Scientific, Waltham, MA) and 1 ul 5 mg/ml glycogen (Invitrogen, Carlsbad, CA), incubation for 10 minutes at room temperature, and centrifugation for 5minutes at 7500×g at 4 °C precipitated the RNA from the supernatant. After drying for 10 minutes at RT the RNA pellet was dissolved in 50 ul nuclease-free water. A NanoDrop 2000 Spectrophotometer (ThermoFisher Scientific, Waltham, MA) and Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA) were used to measure the quality and quantity of the RNA extractions. To check for DNA contamination, we performed negative amplification PCR (16S PCR amplification parameters below). Random hexamer primers reverse transcribed cDNA with SuperScript IV (Invitrogen, Carlsbad, CA).
16S PCR amplification and sequencing.
To amplify the V3 and V4 region of the 16S bacterial gene, we used the primers: forward 5′TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG and reverse 5′GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC (31). 25 ul PCR reactions (1X MiFi Mix (Bioline, Toronto, Canada), 200nM each primer, and 2ul cDNA) were run with the following conditions: 95 °C for 3 minutes followed by 25 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds, and 72 °C for 30 seconds, with a final extension at 72 °C for 5 minutes. To clean the PCR products, AMPure XP beads (Beckman Coulter, Brea, CA) and MiSeq indexing adapters were added. The indexed PCR products were cleaned again with AMPure XP beads, following Illumina 16S metagenomic sequencing library preparation protocol. To validate band size, the cleaned, indexed PCR products were run on a 2% agarose gel. 16S rRNA Library sequencing was performed at the Cornell Biotechnology Resource Center (Ithaca, NY) using 2 × 300 base pair overlapping paired-end reads on an Illumina MiSeq platform.
Sequence data processing, taxonomic assignment, and diversity metrics
Sequences were demultiplexed and barcode sequences were removed by the core facility. QIIME2 (v.2-2021.8) was used for data cleaning and analysis (32). Paired end reads were imported into QIIME2 and read quality was assessed using QIIME2’s ‘qiime_demux_summarize’ function. Read quality was determined by Phred score (>30) then subsequently denoised and trimmed using DADA2 (33), which removes errors and noise from data sequenced by Illumina, and creates information about the removed data. DADA2 parameters were set at trimming forward reads at 16bp and truncating at 289bp and reverse reads were trimmed at 0bp and truncated at 257bp. Diversity metrics were run using the ‘qiime_diversity_core-metrics-phylogenetic’ function with an Amplicon Sequence Variant (ASV) sampling depth of 13,547 to ensure maximal sample depth without omitting any samples. ASV richness (alpha diversity) of Naïve, Exposed, and Wasting asteroids was calculated using the Shannon diversity index using the qiime2 plugin and the between-group diversity (beta diversity) was calculated using the weighted_unifrac_distance_matrix to take into account the relative abundance of ASVs shared between samples (34). One sample (HH02_18) was responsible for all identified outliers in our beta-diversity analysis between Naïve samples, and was removed. Variation among Naïve individuals was 22.6% outlier-inclusive, and 19.9% with outliers removed (Fig 1D). Diversity plots were made using Qiime2 outputs visualized in GraphPad Prism version 9.3.0 for windows.
Taxonomic Classification
Taxonomy was assigned to each ASV by using the q2-feature-classifier (Bokulich et al. 2018) then trained and mapped to known bacterial taxa classified by the Greengenes database (35). Greenegene reference sequences were trimmed to match our data with a minimum length of 100bp and a maximum of 500bp. ASVs mapped to the same taxonomic classification were collapsed into Observable Taxonomic Units (OTUs) at the lowest level of identification provided by Greengenes database using the “qiime_taxa_collapse”. It should be noted that not all taxa were identifiable to the species level. However, ASV’s assigned to the same class, family, or genus could still be identified as distinct OTUs based on phylogenetic mapping, even if Greengenes could not confidently assign a specific species identification. For example, there are two separate classifications assigned to the genus Vibrio with no further classification at the species level, yet retained distinct abundances tracked separately through the analyses. Respiratory profiles (e.g., aerobic, facultative anaerobic, and obligate anaerobic) of microbes of interest were inferred from literature, as cited, describing the genus and/or family.
Differential Abundance
To test for differential abundance of microbial communities associated with exposure and onset of SSW, we used Analysis of Composition of Microbiomes with Bias Correction (ANCOM-BC) (36) in R version 4.2.1 (37). ANCOM-BC differential abundance analysis uses a series of pairwise comparisons to estimate the average abundance of a given microbial species through linear regression while correcting the bias induced by differences among samples. This method provides false discovery rate (FDR) corrected p-values for each taxon and confidence intervals for differentially abundant taxa. Differentially abundant taxa were defined based on FDR < 0.05 and taxa that were not present in at least 10% of the compared samples were dropped from the analysis. The W score represents the number of times the null-hypothesis (the average abundance of a given species in a group is equal to that in the other group) was rejected for a given species. Beta value represents the effect size as a log-linear (natural log) value relative between compared groups. Fold change was calculated by taking ebeta. Violin plots were constructed using log-10 transformed values from the raw abundance of each taxa using GraphPad Prism version 9.3.0 for windows.
Bipartite Clustering Analysis
To identify taxonomic communities and other high-level patterns of abundance, we used Bayesian stochastic blockmodeling, a principled approach to network clustering which finds statistically significant partitions in a network. First, a bipartite network was constructed by assigning sea star samples and ASVsOTUs to separate node groups. Samples and taxa were then connected with an edge if the taxa was present, with an edge weight of log(aij )+1, where aij > 0 is the abundance of taxa j in sample i. OTU abundance was aggregated to the species level, and taxa with total abundance less than 100 across all samples were removed. Samples and ASVs were assigned to hierarchical groups following the hierarchical stochastic block model (hSBM), with the maximum a posteriori estimate found using a specialized Markov chain Monte Carlo algorithm (38). We used the ”degree-corrected” variant of the model (39), since it had a smaller minimum description length than the simpler non-corrected model (Fig S4). Analyses were performed using graph-tool version 2.44 (40).
Functional profiling
Phylogenetic Investigation of Communities by Reconstruction of Unobserved States version 2 (PICRUSt2) was used to predict the functional content of the microbial communities (Douglas, et al. 2020). PICRUSt2 uses extended ancestral-state reconstruction of unknown microbes and a library of reference genomes to predict which gene families (by KEGG orthology (KO) and EC: enzyme functions) are present. PICRUSt2 also uses these data to infer abundances of metabolic pathways using a map of gene families to pathways from the online metacyc database (https://metacyc.org/). Predicted KO abundance was predicted by Picrust2 from the corresponding metagenomes from the 16s rRNA marker gene and KEGG functional hierarchies at level 2 were mapped using the KEGGREST v1.36 (41) package in R version 4.2.1 (37) to infer biological functional enrichment between the sample groups. Because KO mapping to KEGG orthologs provides predictions for all possible pathways (including eukaryotes), predicted eukaryotic categories were filtered from the analysis. The PICRUSt2 output was visualized with Morpheus (https://software.broadinstitute.org/morpheus). Predicted metacyc pathways (Table S3) and KEGG pathway functional categories differentially abundant between (1) Naïve and Exposed, and (2) Exposed and Wasting individuals were identified by t-test with Benjamini Hochburg False Discovery Rate correction (Padj < 0.01 and Padj < 0.05 respectively).