Transitioning from environmental genetics to genomics using mitogenome reference databases
Cite this dataset
Dziedzic, Emily (2022). Transitioning from environmental genetics to genomics using mitogenome reference databases [Dataset]. Dryad. https://doi.org/10.5061/dryad.63xsj3v4n
Species detection using eDNA is revolutionizing the global capacity to monitor biodiversity. However, the lack of regional, vouchered, genomic sequence information—especially sequence information that includes intraspecific variation—creates a bottleneck for management agencies wanting to harness the complete power of eDNA to monitor taxa and implement eDNA analyses. eDNA studies depend upon regional databases of complete mitogenomic sequence information to evaluate the effectiveness of such data to differentiate, identify and detect taxa. We created the Oregon Biodiversity Genome Project working group to utilize recent advances in sequencing technology to create a database of complete, near error-free mitogenomic sequences for all of Oregon's resident freshwater fishes. So far, we have successfully assembled the complete mitogenomes of 313 specimens of freshwater fish representing 7 families, 55 genera, and 129 (88%) of the 146 resident species and lineages. Our comparative analyses of these sequences illustrate that the short (~150 bp) mitochondrial “barcode” regions typically used for eDNA assays are not consistently diagnostic for species-level identification and that no single region is best for metabarcoding Oregon’s fishes. However, often-overlooked intergenic regions of the mitogenome such as the D-loop have the potential to reliably diagnose and differentiate species. This project provides a blueprint for other researchers to follow as they build regional databases. It also illustrates the taxonomic value and limits of complete mitogenomic sequences, and how current eDNA assays and the “PCR-free” environmental genomics methods of the future can best leverage this information.
Voucher Specimen and Tissue Collection
This effort was motivated by the Oregon Biodiversity Genome Project (OBGP; www.obgp.org), a multi-institution collaboration between scientists and wildlife managers at Oregon State University, the Oregon Department of Fish and Wildlife (ODFW), and the United States Forest Service. The primary objective of the OBGP is to develop a regional genetic reference database to facilitate statewide eDNA monitoring programs for Oregon’s resident freshwater fishes. The specific goals of the OBGP (Fig 2a) are to: (1) use sterile laboratory methods to collect 10 georeferenced full-bodied vouchers of each freshwater fish species from dispersed watersheds in Oregon; (2) archive and link voucher specimens, tissues, and metadata for taxonomic verification and revision; (3) sequence full mitogenomes from multiple specimens per species; and (4) make all curated data publicly available via a client-server database accessed via a web browser.
The study area encompassed the State of Oregon, the region of interest for our eDNA monitoring program. We collected fishes in Oregon and expanded to a few sites in northern California and Washington State (Fig 2b). We examined historical location records in existing collections such as the Oregon State Ichthyology Collection and conferred with local biologists to identify resident fishes and occupied locations. For cases where we knew or suspected that deeply divergent evolutionary lineages existed within the present concept of a species, we aimed to include representatives of all lineages. Biologists from ODFW ultimately identified 146 native and nonnative freshwater fish species and lineages that currently reside in Oregon and strategized collections to span watersheds throughout the state (Appendix S1). Each sampling kit (Appendix S2 Box S1) contained a 500-mL Nalgene bottle filled with 10% formalin, a 2.0 mL cryotube filled with 95% EtOH, a sterile scalpel, scissors and tweezers, a bleach wipe, latex gloves, a detailed sampling protocol to ensure consistent tissue sampling and data collection (Appendix S2 Box S2), and a field notes sheet (Appendix S2 Box S3) for metadata collection. Collectors anaesthetized and euthanized all fish specimens prior to tissue collection by immersion in an aqueous solution of tricaine mesylate (MS-222). For collections in 2017, we worked with partners (Appendix S3 collecting_entity) who followed accepted procedures under Oregon State University and USFS IACUC protocols, but an IACUC was not required by all partner institutions. Specimen collection by ODFW was conducted under the agency’s statutory management authority, and in 2018, 2019, and 2020 ODFW collected specimens for ESA-listed species under National Oceanic and Atmospheric Administration Permit numbers 21780, 22639, and 23527, respectively. Fish under USFWS jurisdiction (i.e., fish that are neither marine nor anadromous) were covered under ODFW’s ESA Section 6 Cooperative Agreement with USFWS. Details regarding partner collection permits and authority are listed in Appendix S3. We instructed all partners to collect a minimum of ~0.5 cm³ of tissue from each specimen, which was then placed in 95% EtOH for DNA extraction and sequencing. Euthanized fish were placed in 10% formalin to ensure preservation of diagnostic features. When we failed to collect species or redundant examples of species, we augmented in-field collection with tissue samples loaned or gifted from North American ichthyology collections (OS14271, OS18056, OS18057, OS19982, OS19351, OS18993, OS20085, OS20084, OS20081, OS20080, OS20094, OS20088, OS20108, OS14271, OS22282, UW155929, UW158361, UAM:Fish:10376:401245, UAM:Fish:10464:374966, UAM:Fish:10464:374967). The goal of collecting 10 individuals per species was amended to collect three individuals and add specimens only if intraspecific genetic variation was detected in downstream mitogenome identity analyses (see below).
Taxonomic Verification, Accession, and Cataloging
ODFW biologists and partners identified specimens provisionally in the field. Oregon State Ichthyology Collection taxonomists then verified and refined those identifications by morphological examination and reference to published keys (Markle and Tomelleri 2016, Wydoski and Whitney 2003) prior to cataloging the specimens. The Oregon State Ichthyology Collection has arranged to accession all vouchers and tissues, with full-bodied voucher specimens being transferred from formalin to isopropyl alcohol for permanent storage. Tissues were stored in 2.0 mL cryotubes at -70°C in 95% EtOH. Accessioning and cataloging were ongoing at the time of writing.
After generating sequence data (see below), we verified morphological identifications with distance-based cluster analyses in Geneious 10.2.6 using default settings (global alignment with free end gaps, cost matrix of 65% similarity, Tamura-Nei genetic distance model, Neighbor-Joining (NJ) tree build method, gap open penalty of 12, gap extension penalty of 3). We used the NAD2 gene for Catostomidae, Centrarchidae, Cottidae, Cyprinidae, Ictaluridae, and Salmonidae NJ trees. Because the lamprey species (Petromyzontidae) in Oregon possess very similar mitogenomes, we concatenated the NAD4, NAD5, and NAD6 genes to increase the length of sequence examined in the search for genetic clusters. In cases of incongruence between morphological identification and genetic clustering, we revisited the anatomical identifications of the vouchers, investigated the possibility of swapped or contaminated molecular samples, and corrected identifications as needed.
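The gene concatenation and pairwise comparison described above can be illustrated with a minimal Python sketch. It assumes that each gene has already been aligned across specimens, and it uses the uncorrected p-distance (proportion of differing sites) as a simple stand-in for the Tamura-Nei model applied in Geneious. The function names and the toy sequences are hypothetical and are not part of the dataset or of any Geneious interface.

```python
from itertools import combinations


def concatenate(genes_by_specimen, gene_order):
    """Join each specimen's aligned gene sequences in a fixed gene order."""
    return {
        specimen: "".join(genes[gene] for gene in gene_order)
        for specimen, genes in genes_by_specimen.items()
    }


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: differing sites / compared sites.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    This is NOT the Tamura-Nei distance; it is a simpler illustrative measure.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def pairwise_distances(sequences):
    """Return {(name_1, name_2): p-distance} for every pair of specimens."""
    return {
        (m, n): p_distance(sequences[m], sequences[n])
        for m, n in combinations(sorted(sequences), 2)
    }


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, far shorter than real genes).
    toy = {
        "specimen_a": {"NAD4": "ACGT", "NAD5": "AAAA", "NAD6": "CC-C"},
        "specimen_b": {"NAD4": "ACGA", "NAD5": "AAAA", "NAD6": "CCTC"},
    }
    joined = concatenate(toy, ["NAD4", "NAD5", "NAD6"])
    for pair, dist in pairwise_distances(joined).items():
        print(pair, round(dist, 4))
```

A distance table of this kind could be passed to any neighbor-joining implementation. Concatenating genes increases the number of compared sites, which is the reason given above for combining NAD4, NAD5, and NAD6 in the lampreys.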
DNA Extraction and Sequencing
We subsampled tissues into ~1.0 mm³ volumes and extracted DNA from these subsamples using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) spin-column protocol for animal tissues. To further optimize the lysing process, we crushed tissues in-tube with a micropestle after incubation. We used the Invitrogen dsDNA Broad-Range Assay Kit and a Qubit fluorometer (Invitrogen, Carlsbad, CA) to measure DNA concentrations and yield. For each extracted specimen, 100 µL of extract containing 100-2000 ng/µL of DNA was transferred to a 0.65 mL Bioruptor microtube and sonicated (30 s on, 90 s off; 6 cycles) to ~300 bp in length on a Bioruptor Pico sonication system (Diagenode, Denville, NJ) following the manufacturer's protocol. We prepared libraries for next-generation sequencing for the first two sequencing runs according to the manufacturer's instructions using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA) (Appendix S3 library_prep). Oregon State University’s Center for Quantitative Life Sciences performed library preparation for the final two runs using the plexWell 96 Kit (SeqWell, Beverly, MA) (Appendix S3 library_prep). Paired-end (2 x 150 bp) sequencing was performed on all samples at multiplexing levels between 50 and 71 samples/lane (Appendix S3 spl) using an Illumina HiSeq 3000 at the Center for Quantitative Life Sciences.
To capture geographic genetic variation of each resident species across its range within Oregon, we sequenced the first collected representative of each species and then sequenced additional specimens from separate watersheds, maximizing the geographic distance among them. We stored gzipped fastq sequencing files on 2 x 1TB enterprise NL-SAS hard drives, and performed mitogenome assemblies on 4 x 2.30 GHz 16-core processors using 512GB ECC RAM. Mitochondrial genomes were assembled de novo from raw paired reads using the SPAdes assembler (versions 3.12.0-3.15.3) (Bankevich et al. 2012) or GetOrganelle 1.6.2 or 1.7.5 (Jin et al. 2020). Three mitogenomes were recovered by performing reference-guided filtering with BLAT (Kent 2002) using the complete mitogenome sequences of identical or closely related species prior to SPAdes assembly. We resolved one mitogenome by first mapping reads in Geneious 10.2.6 to the noncircular mitochondrial contig produced from SPAdes de novo assembly and then iteratively mapping reads to de novo assemblies subsequently produced in Geneious. When de novo mitogenome assemblies did not form a single contig with an overlapping splice point, we performed assembly polishing using BWA (Li and Durbin 2009) followed by Pilon (Walker et al. 2014), or polca.sh from MaSuRCA 4.0.5 (Zimin et al. 2013). We also used Sealer from ABySS 2.3.1 (Paulino et al. 2015) for gap-closing on one sequence. We calculated quality values (QV) of mitogenome contigs with Merqury (Rhie et al. 2020) and mapped reads to assembled mitogenomes to evaluate coverage uniformity using Tablet 1.21.02.08 (Milne et al. 2013). We then polished and reassembled mitogenomes exhibiting coverage anomalies in an attempt to resolve assembly errors.
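As an illustration of what a "single contig with an overlapping splice point" means in practice, the following Python sketch tests whether the end of a linear contig repeats its beginning and, if so, trims the redundant copy. It is a simplified, exact-match check written for this description, not the procedure implemented in SPAdes or GetOrganelle. The function names, the `min_overlap` parameter, and the toy sequence are hypothetical.

```python
def terminal_overlap(contig, min_overlap=10):
    """Length of the longest suffix that exactly repeats the prefix.

    Only overlaps of at least `min_overlap` bases and at most half the
    contig length are considered. Returns 0 if none is found.
    """
    seq = contig.upper()
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(contig, min_overlap=10):
    """Trim the duplicated end of a contig whose ends overlap.

    Returns the trimmed sequence (one copy of the circle), or None when
    no terminal overlap is found and the contig should be treated as
    incomplete.
    """
    k = terminal_overlap(contig, min_overlap)
    if k == 0:
        return None
    return contig[:-k]


if __name__ == "__main__":
    # Toy "circular genome" (hypothetical) assembled with its first
    # 12 bases repeated at the end, as an assembler might report it.
    genome = "ACGTTGCAAGGCTTAACCGGTATCGATCGGA"
    contig = genome + genome[:12]
    print(terminal_overlap(contig))       # overlap length
    print(circularize(contig) == genome)  # trimmed contig equals genome
    print(circularize(genome))            # no overlap found
```

Real reads contain sequencing errors, so an exact-match test like this one would miss overlaps with even a single mismatch. Contigs that fail such a check are the ones that, in the workflow above, went on to polishing, gap-closing, or reassembly.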
We annotated all mitochondrial sequences using a combination of the MITOS2 web server (Al Arab et al. 2017, Donath et al. 2019) and Geneious, transferring annotations from identical or closely related species. Details on the pipelines used for individual sequences can be found in the Supplemental Information (Appendix S3).