A manager’s guide to using eDNA metabarcoding in marine ecosystems
Data files
Sep 28, 2022 version files 269.29 MB
-
README_updated_20220928.md
-
updated_data.zip
Abstract
Environmental DNA (eDNA) metabarcoding is a powerful tool that can enhance marine ecosystem and biodiversity monitoring programs. Here we outline five important steps managers and researchers should consider when developing an eDNA monitoring program: 1) select genes and primers to target focal taxa; 2) assemble or develop comprehensive barcode reference databases; 3) apply rigorous site-occupancy-based decontamination pipelines; 4) conduct pilot studies to define the spatial and temporal variance of eDNA; and 5) archive samples, extracts, and raw sequence data. We demonstrate the importance of each of these considerations using a case study of eDNA metabarcoding in the Ports of Los Angeles and Long Beach. eDNA metabarcoding approaches detected 94.1% (16/17) of the species observed in paired trawl surveys while identifying an additional 55 native fishes, providing more comprehensive biodiversity inventories. Rigorous benchmarking of eDNA metabarcoding results improved ecological interpretation and confidence in species detections while providing archived genetic resources for future analyses. Well-designed and validated eDNA metabarcoding approaches are ideally suited for biomonitoring applications that rely on the detection of species, including mapping invasive species fronts and endangered species habitats as well as tracking range shifts in response to climate change. Incorporating these considerations will enhance the utility and efficacy of eDNA metabarcoding for routine biomonitoring applications.
Methods
Materials & Methods
The sequence of procedures followed in the eDNA sample collection, sequencing, and analysis is summarized in Figure 3. A full detailed description of the methods is provided in Appendix 1.
Field sampling: environmental DNA. To compare the efficacy of eDNA and trawling for monitoring fish communities in the Ports of Los Angeles and Long Beach, we conducted paired eDNA and trawl surveys at seven stations on 20–21 August 2018, each of which is a standard trawl station in the conventional periodic biodiversity surveys conducted by the Ports (Tables 1, S1, S2, Figures 1 & 2C; full methods are detailed in Appendix 1). We collected three replicate 1 L seawater samples at each of four locations within each of the seven sampling stations, along an approximately 150 m transect aligned with the trawl track, using the R/V Yellowfin as a sampling platform.
Field Sampling: Midwater Trawls. Within 15–90 minutes of the eDNA seawater sampling, we trawled the station for 5 minutes with a 7.6 m semi-balloon otter trawl with 2.5-cm side mesh and 1.3-cm mesh cod-end deployed from the R/V Early Bird II (Figure 2D), following standard protocols of the Ports Biosurvey program, as detailed in the 2018 Biosurvey report (Wood 2021). All fish were identified to species level and standard length was recorded. See Table S2 for details of trawl locations and depths.
Environmental DNA extractions, amplification, and sequencing. We extracted eDNA from the Sterivex filters within one week of collection using the Qiagen DNeasy Blood and Tissue Kit (Spens et al., 2017). We amplified eDNA with four primer sets (described below) modified with Illumina Nextera XT adapters (Illumina, San Diego, CA) in triplicate PCR replicates. We prepared sequencing libraries following the methods of Curd et al. (2019), which involved a second PCR indexing step to tag libraries (see Appendix 1). Resulting indexed libraries were bead cleaned to remove fragments <200 bp, quantified with the Qubit BR assay (Thermofisher, Waltham, MA, USA), and then sequenced on an Illumina MiSeq (Illumina Inc., San Diego, CA, USA) with a V3 kit to obtain paired-end 2 × 300 bp reads at the Technology Center for Genomics & Bioinformatics (UCLA, CA, USA) (see Appendix 1 for further details).
Bioinformatic analysis and taxonomic assignment. We used the Anacapa Toolkit (Curd et al., 2019) for amplicon sequence variant (ASV) parsing, quality control, and taxonomic assignment (Gold et al., 2021a) (see Appendix 1 for a full description). Importantly, we employed stricter sequence alignment parameters within the Anacapa classifier for the CO1 and 16S loci (95% identity and query coverage) than for the 12S loci (80% identity and query coverage), given the lack of complete reference databases for the CO1 and 16S loci (Curd et al., 2019). Separate reference databases were generated for the 12S, CO1, and 16S loci from all sequence data publicly available in GenBank as of October 2019, using CRUX with default parameters (Curd et al., 2019). A more comprehensive 12S reference database was created by supplementing the above database with barcodes from 252 native fish species, as detailed in Gold et al. (2021a).
Impact of reference databases. To highlight the importance of complete reference databases, we assigned taxonomy to the MiFish Teleost bony fish sequences twice: once using the less complete 12S reference database and once using the more comprehensive 12S reference database described above (Gold et al., 2021a). For all other comparisons and analyses using the MiFish Teleost bony fish data, we used the more comprehensive 12S fish reference database. For this study, we did not add additional local or regional DNA barcodes to the 16S and CO1 reference databases and therefore caution that they are comparatively sparse for native invertebrate species (see Discussion).
Impact of primer selection. To assess the value of including additional markers in eDNA metabarcoding analyses, we compared taxonomic assignments made by each primer set and the incremental value of adding barcoding targets. Specifically, we used two fish primer sets: the MiFish Universal Teleost 12S primer set, which targets bony fishes, and the MiFish Universal Elasmobranch 12S primer set (Miya et al., 2015), which targets sharks and rays. We also included two universal primer sets: the Leray metazoan CO1 primer set (313 bp) (Meyer et al., 2021) and the Kelly metazoan 16S primer set (114–160 bp) (Kelly et al., 2016), both designed to broadly amplify marine metazoans, particularly invertebrates.
Decontamination pipeline. To explore the importance of decontamination pipelines designed to reduce false positives, we examined the combined fish eDNA data (results from both 12S MiFish primer sets) with and without site occupancy modeling. Decontamination, including site occupancy modeling, was conducted following the methods of Kelly et al. (2018) (see Appendix 1).
Archiving of samples, extracts, and raw sequence data. DNA extracts were archived in the CALeDNA freezer collection at UCLA. Raw sequence data were archived on NCBI SRA (link provided upon acceptance).
Comparison of eDNA metabarcoding and trawl surveys. We compared the fish species identified at each site by conventional trawl survey methods with the combined fish eDNA data, visualizing species detections across methods through Venn diagrams and heat maps.
Spatial structuring of eDNA. We investigated the variability of eDNA signatures across all locations and stations (Oksanen et al., 2020). We then estimated sampling completeness (the fraction of total diversity observed with our sampling effort) within a given location, within a given station, and within the port as a whole (Hsieh, Ma & Chao, 2016). We did not examine the spatial variation of trawls given the limited sample size (n = 7) and lack of within-station replication.
S1. APPENDIX I: FULL METHODS
Contamination precautions
False positives may result from contamination during eDNA collection as well as during DNA extraction and amplification. To prevent contamination in both field sampling and lab work, we followed the precautions outlined in Goldberg et al. (2016). Specifically, personnel wore nitrile gloves at all times during field and lab sample processing. Prior to eDNA sample collection, we sterilized all containers, supplies, and work surfaces with a 10% bleach solution followed by a 30 min ultraviolet light (UV) treatment. We conducted all eDNA filter extractions and PCR preparations in an AirClean 600 PCR Workstation (Creedmoor, NC, USA) located in a clean room dedicated to DNA extractions and PCR preparations at the University of California, Los Angeles (UCLA). Before and after use, we cleaned the AirClean 600 PCR Workstation and pipettes with 10–30% bleach followed by a 30 min UV treatment. We used filtered pipette tips for all pre- and post-PCR procedures. These and other consumables were decontaminated by a 30 min UV treatment. In addition, we included negative controls at all sample processing steps, including field blanks, extraction blanks, and PCR blanks; we also used two positive controls to test for index hopping.
eDNA sample collection
We sampled water for eDNA prior to trawling to ensure that trawling did not influence eDNA signals. To standardize the collection of seawater, we used three 5 L Niskin bottles (General Oceanics, Miami, FL, USA), with an anchor set to capture water from approximately 2 m above the seafloor. Upon arrival at each location, sampling bottles were flushed continuously with surface water for a minimum of five minutes before deployment. Seawater from each sampling bottle was transferred to a sterile 1 L Kangaroo Enteral Feeding Gravity Bag (Covidien PLC, Dublin, Ireland). The first 0.5 L of each sample was used to rinse the bag and then discarded; the bag was then filled with 1 L of sample water, which was gravity filtered through a 0.22 μm Sterivex filter (Millipore Corp., Burlington, MA, USA) (Miya et al., 2015; Port et al., 2016; Curd et al., 2019). Nalgene tubing and connectors were sterilized prior to assembly in the AirClean PCR hood. Filtration was carried out in the ship cabin, shielded from sunlight, for 40 minutes or until 1 L of sample was filtered, whichever came first (Table S1). Previous testing had demonstrated that after 40 minutes a filter was clogged and longer filtration was fruitless (data not shown). Filter cartridges were then placed into individual sterile Whirl-Pak bags (Nasco, Fort Atkinson, WI, USA) and placed in a cooler with dry ice (Curd et al., 2019). The resulting samples were kept on ice for up to 6 hours before being transferred to a -20 °C freezer. As a negative control, we processed a field blank at each station consisting of 1 L of UltraPure DNAse/RNAse-Free Distilled Water (Invitrogen, Carlsbad, CA), following standard methods (Goldberg et al., 2016; Kelly, Gallego & Jacobs-Palmer, 2018).
Environmental DNA extractions, amplification and sequencing
At UCLA, all eDNA filter extractions and PCR preparations were conducted in an AirClean 600 PCR Workstation (Creedmoor, NC, USA) in a clean room dedicated to DNA extractions. The AirClean 600 PCR Workstation and pipettes were decontaminated before and after use with 10% bleach, 70% ethanol, and DNA Away, followed by a 20 min ultraviolet light (UV) treatment. Filtered pipette tips were used for all protocols. Negative controls (extraction blanks and PCR blanks) were included at every step of sample processing.
Sterivex filters were extracted with the Qiagen DNeasy Blood and Tissue Kit (Spens et al., 2017) protocol. Proteinase K and ATL buffer were added to the filter cartridge and incubated overnight at 56° C. DNA was amplified with primer sets carrying Illumina Nextera adapter modifications for use with Nextera XT indexes (Product Numbers FC-131-2001 through -2004; 5200 Illumina Way, San Diego, CA 92122). The primer sets presented here were: (1) the MiFish Universal Teleost 12S primer set (176 bp) (Miya et al., 2015) and (2) the MiFish Universal Elasmobranch 12S primer set (186 bp) (Miya et al., 2015). Samples were also amplified with (3) metazoan CO1 primers (313 bp) (Meyer et al., 2021) and (4) metazoan 16S primers (~115 bp) (Kelly et al., 2016) (data not shown). PCR amplifications, run in triplicate for each primer pair, contained 12.5 μL QIAGEN Multiplex Taq PCR 2x Master Mix, 6.5 µL dH2O, 2.5 µL of each primer (2 µmol/L), and 1 μL DNA template (25 µL total reaction volume). PCR thermocycling employed a touchdown profile with an initial denaturation at 95° C for 15 min to activate the DNA polymerase, followed by 13 cycles of denaturation at 94° C for 30 sec, annealing for 30 sec starting at 69.5° C (decreased by 1.5° C every cycle until 50° C was reached), and extension at 72° C for 1 min. An additional 35 cycles were then carried out at an annealing temperature of 50° C using the same denaturation and extension steps, followed by a final extension at 72° C for 10 min. All PCRs included a negative control in which molecular grade water replaced the DNA template. All PCR products were run on 2% agarose gels to confirm amplification success and correct product size.
To prepare PCR products for sequencing, we pooled triplicate PCR reactions using 5 µL from each reaction. Pooled samples were cleaned using Serapure magnetic beads (Faircloth & Glenn, 2014), and cleaned PCR product concentrations were quantified using the high-sensitivity Quant-iT™ dsDNA Assay Kit (Thermofisher Scientific, Waltham, MA, USA) on a Victor3™ plate reader (Perkin Elmer, Waltham, MA, USA). Sample DNA libraries were prepared using Nextera Index Kits (Illumina, San Diego, CA, USA) and KAPA HiFi HotStart Ready Mix (Kapa Biosystems, Wilmington, MA, USA). This second indexing PCR was performed in a 25 μL reaction containing 12.5 μL of Kapa HiFi HotStart Ready Mix, 0.625 μL of primer i7, 0.625 μL of primer i5, and 10 ng of template DNA, with the following thermocycling parameters: denaturation at 95° C for 5 min; 5 cycles of denaturation at 98° C for 20 sec, annealing at 56° C for 30 sec, and extension at 72° C for 3 min; followed by a final extension at 72° C for 5 min. All indexed PCR products were run on 2% agarose gels to confirm successful amplification and correct product size. Resulting libraries were bead cleaned and quantified as described above. Finally, we pooled indexed libraries by barcode in equimolar concentrations and sequenced each of the four libraries on a MiSeq at the Technology Center for Genomics & Bioinformatics (University of California, Los Angeles, CA, USA) using Reagent Kit V3, with 20% PhiX added to all sequencing runs.
Bioinformatic analysis and taxonomic determination
Taxonomic assignment of eDNA metabarcoding reads requires a comprehensive sequence reference library. To generate these libraries for our marker sets, we used CRUX (Constructing Reference libraries Using eXisting tools) (Curd et al., 2019) to query NCBI GenBank (October 2019) for all available sequences overlapping each of our primer regions. A single library was constructed for both MiFish 12S primer sets given the similarity of the loci, one library was constructed for the CO1 metazoan primer set, and one library was constructed for the 16S metazoan primer set, all using the default CRUX parameters (Curd et al., 2019) (Dryad links provided upon acceptance).
We used the Anacapa Toolkit (Curd et al., 2019) for amplicon sequence variant parsing, taxonomic assignment, and quality control. The quality control step of the Anacapa Toolkit trims extraneous adapter sequences used to identify each unique sample, removes low quality reads, and sorts reads by metabarcode primer sequence. A key advantage of the Anacapa Toolkit is that it can simultaneously process raw fastq reads for samples with single or multiple metabarcode targets generated on Illumina MiSeq machines. It is also not required that all samples contain reads for each metabarcode, thus allowing users to combine multiple projects or targets on the same sequencing run while only running the pipeline once.
The amplicon sequence variant (ASV) parsing step uses DADA2 (Callahan et al., 2016) to dereplicate our metabarcodes. Unlike OTUs, which cluster sequences using an arbitrary sequence similarity (e.g. 97%, which corresponds to a gross average genetic sequence difference across all of life), ASVs are unique sequence reads determined using Bayesian probabilities of known sequencing error. These unique sequences can differ by as little as two base pairs, providing improved taxonomic resolution and an increase in observed diversity (Callahan et al., 2016; Amir et al., 2017).
Next, the Anacapa Toolkit taxonomy module assigns taxonomy to ASVs using Bowtie 2 (Langmead & Salzberg, 2012) and a Bowtie 2-specific Bayesian Least Common Ancestor (BLCA) algorithm. All ASVs are first globally aligned against the CRUX database using Bowtie 2. Any ASV that fails to align is then aligned locally. The best hits (the top 100 Bowtie 2 returns) are then processed with the BLCA script to assign taxonomy. The Bowtie 2 BLCA algorithm was adapted from https://github.com/qunfengdong/BLCA. BLCA uses pairwise sequence alignment to calculate sequence similarity between query sequences (a given ASV sequence found in the environment) and reference library hits. Taxonomy is assigned based on the lowest common ancestor of the multiple reference library hits for each query sequence. The reliability of each taxonomic assignment is then evaluated through bootstrap confidence scores (Gao et al., 2017). Scores are based on Bayesian posterior probabilities, which quantify the similarity of reference barcode sequences to the query sequence: the higher the similarity between a database barcode sequence and the query, the greater its contribution to the taxonomic assignment. Ultimately, this method provides a strong probabilistic basis for evaluating taxonomic assignments with bootstrap confidence scores.
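The lowest-common-ancestor step at the core of this approach can be sketched in a few lines. The Python below is an illustrative reduction only, with hypothetical taxonomy paths; the actual BLCA implementation additionally weights hits by alignment similarity and bootstraps the assignment rather than treating all hits equally:

```python
def lowest_common_ancestor(paths):
    """Return the longest taxonomy prefix shared by all reference hits."""
    lca = []
    for ranks in zip(*paths):          # walk rank-by-rank across all hits
        if len(set(ranks)) == 1:       # every hit agrees at this rank
            lca.append(ranks[0])
        else:
            break                      # disagreement: stop at the last shared rank
    return lca

# Hypothetical top hits for one ASV (phylum .. species):
hits = [
    ["Chordata", "Actinopteri", "Perciformes", "Serranidae",
     "Paralabrax", "Paralabrax clathratus"],
    ["Chordata", "Actinopteri", "Perciformes", "Serranidae",
     "Paralabrax", "Paralabrax nebulifer"],
]
# The hits agree only down to genus, so the ASV is assigned to Paralabrax.
```

When the hits disagree at the species rank, as here, the assignment falls back to genus, which is why incomplete reference databases push assignments toward coarser ranks.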
For the two fish primer sets, taxonomic assignment was conducted following benchmarking by Gold et al. (2021a), using a taxonomic cutoff score of 60 and a minimum alignment of 80%. Two different combinations of reference databases were used for taxonomic assignment. We first assigned taxonomy using only the CRUX October 2019 reference database, which lacked region-specific fish barcodes, hereafter referred to as the limited database. We then assigned taxonomy using a combination of the regional and global reference databases, which contained 252 additional California Current species described in Gold et al. (2021a), hereafter referred to as the comprehensive database. For the comprehensive database assignments, taxonomy was first assigned using the curated regional database of California Current Large Marine Ecosystem fishes to identify native taxa. We then re-assigned taxonomy using the global CRUX-generated database to identify non-native and non-fish species. Taxonomic assignments of ASVs were synonymized between the two methods by prioritizing higher resolution assignments (i.e., species level over genus level).
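The synonymization rule can be sketched as below. This is a simplified Python illustration (the study's pipeline is in R), with hypothetical ASV names and truncated taxonomy paths; it keeps whichever assignment resolves to a deeper rank, and the tie-break in favor of the regional database is an assumption for the sketch:

```python
def merge_assignments(regional, global_db):
    """Keep the deeper (higher-resolution) taxonomy path for each ASV,
    preferring the regional assignment on ties (sketch assumption)."""
    merged = {}
    for asv in set(regional) | set(global_db):
        a = regional.get(asv, [])
        b = global_db.get(asv, [])
        merged[asv] = a if len(a) >= len(b) else b
    return merged

# Hypothetical assignments, truncated to family/genus/species for brevity:
regional = {"ASV_1": ["Serranidae", "Paralabrax", "Paralabrax clathratus"],
            "ASV_2": ["Serranidae", "Paralabrax"]}
global_db = {"ASV_2": ["Serranidae", "Paralabrax", "Paralabrax nebulifer"],
             "ASV_3": ["Engraulidae"]}  # non-regional taxon, global hit only
```

Here ASV_1 keeps its species-level regional assignment, ASV_2 is upgraded to the species-level global assignment, and ASV_3 is carried over from the global database alone.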
Compared to the 12S MiFish primer sets, the two universal primer sets (CO1 and 16S) have substantially less complete reference databases and lack comprehensive marker validation and benchmarking, reducing the accuracy of taxonomic assignments (Edgar, 2018; Curd et al., 2019). Thus, to minimize both misclassification and overclassification of CO1 and 16S sequences, we used a conservative taxonomic cutoff score of 80 and a minimum sequence alignment of 95%.
Bioinformatic decontamination
After processing the raw sequence reads through the Anacapa Toolkit, the resulting species community tables were transferred into R (R Core Team, 2016) for decontamination and downstream analysis using the phyloseq (version 1.32.0) (McMurdie & Holmes, 2013) and ranacapa (version 1.0) (Kandlikar et al., 2018) R packages. The raw species community table must be decontaminated to eliminate potential field contamination, lab contamination, and sequence index hopping (Goldberg et al., 2016; Costello et al., 2018). Field and lab contamination can arise from inadequate sterile procedures, careless laboratory work, and reagents, particularly enzymes, which are generated from living organisms (Goldberg et al., 2016). Sequence index hopping occurs when the DNA index tag used to label a unique sample chemically swaps with the index tag of another sample, leading to cross-contamination of species between samples. To address these sources of contamination, we followed rigorous sterile procedures and sequenced our libraries on the MiSeq platform, which exhibits less index hopping than Illumina sequencers that use patterned flow cells.
In addition to these precautionary steps, we implemented a decontamination procedure to eliminate remaining sources of contamination (Kelly, Gallego & Jacobs-Palmer, 2018 [supplemental methods]; Kelly, Shelton & Gallego, 2019 [supplemental methods]; Gallego et al., 2020 [supplemental methods]). One approach to decontamination is to subtract reads based on the prevalence of ASVs in negative controls (McKnight et al., 2019). However, index hopping or cross-contamination can also place abundant ASVs from true samples into negative controls (Costello et al., 2018), which risks removing the most abundant and prevalent ASVs if strict subtraction is practiced. Therefore, we applied the site occupancy modeling approach of Kelly et al. (2018) to retain only ASVs that occurred in high prevalence across locations and stations. We did not subtract reads based on negative controls, given the occurrence in them of ASVs with the highest observed read counts. Following Kelly et al. (2018), we also excluded samples with low read depth from analyses.
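The intuition behind occupancy-based filtering can be conveyed with a simple prevalence rule. The toy Python filter below is far simpler than the Bayesian site occupancy model of Kelly et al. (2018), which estimates detection probabilities rather than applying a fixed threshold; the ASV names and the 0.5 cutoff are illustrative assumptions only:

```python
def occupancy_filter(detections, min_fraction=0.5):
    """detections maps each ASV to presence/absence across replicate samples.

    Retain ASVs detected in at least min_fraction of replicates; sporadic
    detections (likely contamination or index hopping) are discarded.
    """
    return {asv for asv, reps in detections.items()
            if sum(reps) / len(reps) >= min_fraction}

# Hypothetical detection patterns across four replicates at one station:
detections = {
    "ASV_widespread": [True, True, True, False],    # kept (prevalence 0.75)
    "ASV_sporadic":   [True, False, False, False],  # dropped (prevalence 0.25)
}
```

The key difference from read subtraction is that this rule keys on repeatability of detection rather than read abundance, so a highly abundant but sporadically detected ASV is still removed.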
Normalizing eDNA reads across samples
Relating the number of sequence reads in a sample to the number of individuals of a taxon is extremely challenging. Not only do different organisms shed DNA at different rates, but the process of PCR amplification is known to have sequence-specific biases (Kelly, Shelton & Gallego, 2019; McLaren, Willis & Callahan, 2019; Silverman et al., 2021). However, it is possible to address some of these known biases in ways that make it plausible to explore eDNA's ability to yield insights into relative abundance. The approach we adopt assumes that PCR biases originate from template–primer interactions that remain constant across eDNA samples, an assumption that aligns well with recent empirical and theoretical work (Shelton et al., 2016; Kelly, Shelton & Gallego, 2019; McLaren, Willis & Callahan, 2019; Silverman et al., 2021). Under this assumption, appropriate normalizations of the read counts in each sample can yield information about relative taxon abundance.
We therefore transformed our data into an "eDNA index" following Kelly et al. (2019). The eDNA index transformation first normalizes the reads for a particular sequence by the total number of reads in each sample, then scales those proportions to the largest observed proportion for that sequence across all samples. This yields a sequence-specific (species-specific) index between 0 and 1, where 1 marks the sample with the highest proportion of reads for a given species and 0 marks samples in which that species was not detected.
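The transformation reduces to two array operations. Below is a minimal NumPy sketch with made-up toy counts (the study performed this step in R); it assumes every species has at least one nonzero sample, so the column maxima are positive:

```python
import numpy as np

# Toy read-count matrix: rows = samples, columns = species (ASVs).
reads = np.array([
    [100,  0, 50],
    [200, 10,  0],
    [ 50, 40, 10],
], dtype=float)

# Step 1: within-sample proportions (each row sums to 1).
proportions = reads / reads.sum(axis=1, keepdims=True)

# Step 2: scale each species (column) by its largest observed proportion,
# so every column has a maximum of exactly 1.
edna_index = proportions / proportions.max(axis=0, keepdims=True)
```

Because the scaling is per species, the index supports comparisons of a single species across samples but not comparisons of different species within a sample.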
Spatial Analyses
We conducted species rarefaction analyses at the level of the Port, station, and location to compare sample coverage estimates (for a given set of sampling units, the fraction of total species discovered) using the iNEXT package (Hsieh, Ma & Chao, 2016). We then calculated pairwise similarity between eDNA samples using Bray-Curtis similarity on eDNA index scores with the vegan package. We next employed PERMANOVA and betadisper analyses in vegan to apportion variation in pairwise sample similarity among site, station, collection date, and sample volume filtered, and visualized the results with a principal coordinates analysis (PCoA) ordination using the phyloseq and vegan packages.
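For reference, the Bray-Curtis measure on two abundance vectors reduces to the formula below. This plain-Python sketch computes the dissimilarity that vegan's distance functions return; the corresponding similarity is 1 minus this value:

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

# Identical communities -> 0; completely disjoint communities -> 1.
```

Because the eDNA index is already scaled per species, feeding index scores (rather than raw reads) into this formula keeps highly sequenced species from dominating the pairwise comparisons.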
Usage notes
Anacapa Toolkit: https://github.com/limey-bean/Anacapa
Code for Analyses in R: https://github.com/zjgold/Port-of-LA-LB-Project