Creating, curating, and evaluating a mitogenomic reference database to improve regional species identification using environmental DNA
Data files
Jun 13, 2023 version files (9.12 MB total):
- 20220918_appendix_s7.zip (879.75 KB)
- 20220927_appendix_s5.xlsx (6.25 MB)
- 20230204_appendix_s3.xlsx (1.72 MB)
- 20230207_appendix_s6.xlsx (31.74 KB)
- 20230613_appendix_s1.xlsx (235.97 KB)
- README.md (6.08 KB)
Abstract
Species detection using environmental DNA (eDNA) is revolutionizing global capacity to monitor biodiversity. However, the lack of regional, vouchered genomic sequence information, especially information that captures intraspecific variation, creates a bottleneck for management agencies that want to harness the full power of eDNA to monitor taxa and implement eDNA analyses. eDNA studies depend on regional databases of mitogenomic sequences to evaluate how effectively such data detect and identify taxa. We established the Oregon Biodiversity Genome Project to build a database of complete, nearly error-free mitogenomic sequences for all of Oregon's fishes. We have assembled the complete mitogenomes of 313 specimens of freshwater, anadromous, and estuarine fishes representing 24 families, 55 genera, and 129 species and lineages. Comparative analyses of these sequences show that many regions of the mitogenome are taxonomically informative, that the short (~150 bp) mitochondrial "barcode" regions typically used in eDNA assays do not consistently diagnose species, and that one or more complete mitochondrial genes are preferable for identifying Oregon's fishes. This project provides a blueprint for other researchers building regional databases, illustrates the taxonomic value and limits of complete mitogenomic sequences, and suggests how current eDNA assays and future environmental genomics methods can best leverage this information.
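What it means for a region to "diagnose" species can be made concrete with a small check. The sketch below assumes a deliberately simple definition: a region is diagnostic if no exact sequence is shared between two species, so an exact match points to a single species. Real analyses usually also account for sequencing error, partial matches, and distance thresholds. The function name `is_diagnostic` and the toy sequences are hypothetical and are not taken from this dataset.

```python
def is_diagnostic(region_seqs):
    """Return True if no sequence in the region is shared between species.

    region_seqs maps a species name to the list of sequences (one per
    specimen) observed for a single gene or barcode region. Under the
    simple definition assumed here, the region is diagnostic only when
    every distinct sequence occurs in exactly one species.
    """
    owner = {}  # sequence -> first species seen carrying it
    for species, seqs in region_seqs.items():
        for seq in seqs:
            if owner.setdefault(seq, species) != species:
                return False  # same sequence found in two species
    return True


# Hypothetical toy data: the short region is the first 6 bases of the
# longer "gene", so extending the region separates the shared haplotype.
short_region = {
    "species_a": ["ACGTTA", "ACGTTA"],
    "species_b": ["ACGTTA", "ACGTCA"],  # shares a haplotype with species_a
}
full_gene = {
    "species_a": ["ACGTTAGGCT", "ACGTTAGGCA"],
    "species_b": ["ACGTTACGCT", "ACGTCACGCT"],
}
print(is_diagnostic(short_region))  # False
print(is_diagnostic(full_gene))     # True
```

The toy case mirrors the pattern described above in miniature: a short region can collapse distinct species onto one sequence, while a longer stretch of the same gene keeps them apart.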
Methods
Voucher Specimen and Tissue Collection
The study area initially encompassed the state of Oregon (the region of interest for our eDNA monitoring program) and was later expanded to a few sites in northern California and Washington State (Fig 3). To plan sample collection, we examined historical location records in fish collections such as the Oregon State Ichthyology Collection and consulted local biologists to determine current distributions. Where we knew or suspected that deeply divergent evolutionary lineages existed within the present concept of a species, we aimed to include representatives of all lineages. We ultimately identified 146 native and nonnative freshwater fish species and lineages currently found in Oregon and planned collections to span watersheds throughout the state (Appendix S1).
To facilitate consistent sampling, we provided collectors with sampling kits (Appendix S2, Box S1) that contained a 500 mL Nalgene bottle filled with 10% formalin, a 2.0 mL cryotube filled with 95% EtOH, a sterile scalpel, scissors and tweezers, a bleach wipe, latex gloves, a detailed sampling protocol to ensure consistent tissue sampling and data collection (Appendix S2, Box S2), and a field notes sheet (Appendix S2, Box S3) for recording metadata. Before tissue collection, collectors anesthetized and euthanized all fish by immersion in an aqueous solution of tricaine mesylate (MS-222) buffered with sodium bicarbonate (400 mg MS-222 and 400 mg sodium bicarbonate per 1 L of water). We instructed all partners to collect a minimum of approximately 0.5 cm³ of tissue from each specimen and to place it in 95% EtOH for DNA extraction and sequencing. Euthanized fish were then placed in 10% formalin as voucher specimens, preserving their diagnostic features.
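Because the immersion bath is specified as a fixed concentration (400 mg MS-222 and 400 mg sodium bicarbonate per litre of water), preparing a bath of another size is a simple proportion. A minimal sketch of that arithmetic; the function name `euthanasia_bath` is hypothetical, and only the two per-litre quantities come from the protocol described above.

```python
# Per-litre quantities as stated in the Methods (mg per 1 L of water).
MS222_MG_PER_L = 400.0
NAHCO3_MG_PER_L = 400.0  # sodium bicarbonate used as a buffer


def euthanasia_bath(volume_l):
    """Return (MS-222 mg, sodium bicarbonate mg) for volume_l litres of water.

    Hypothetical helper: scales the fixed 400 mg/L recipe linearly.
    """
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return MS222_MG_PER_L * volume_l, NAHCO3_MG_PER_L * volume_l


print(euthanasia_bath(2.5))  # (1000.0, 1000.0)
```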
Taxonomic Verification, Accession, and Cataloging
Fish biologists provisionally identified specimens in the field, and Oregon State Ichthyology Collection taxonomists then verified or refined those identifications by morphological examination and reference to published keys (Markle & Tomelleri, 2016; Wydoski & Whitney, 2003). The Oregon State Ichthyology Collection is in the process of accessioning and cataloging all vouchers and tissues. During this process, curators enter the metadata associated with each specimen and collection event into a relational database, and the full-bodied voucher specimens are transferred from formalin to isopropyl alcohol for permanent storage in a dedicated collection facility that complies with modern fire and earthquake safety codes. Tissues are stored in 2.0 mL cryotubes in −80 °C freezers.
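A relational layout keeps collection-event metadata in one place and links each specimen to it, so a field identification and its later verification can sit side by side. The sketch below uses Python's built-in `sqlite3` module; the table and column names, and the example rows, are hypothetical illustrations, since the collection's actual database schema is not described here.

```python
import sqlite3

# Hypothetical two-table layout: one row per collection event and one row
# per specimen, linked by a foreign key.
SCHEMA = """
CREATE TABLE collection_event (
    event_id    INTEGER PRIMARY KEY,
    locality    TEXT NOT NULL,
    watershed   TEXT,
    event_date  TEXT              -- ISO 8601 date
);
CREATE TABLE specimen (
    specimen_id     INTEGER PRIMARY KEY,
    event_id        INTEGER NOT NULL REFERENCES collection_event(event_id),
    field_id        TEXT,         -- provisional identification in the field
    verified_id     TEXT,         -- identification after morphological review
    voucher_storage TEXT,
    tissue_storage  TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the specimen -> event link
conn.executescript(SCHEMA)

# Placeholder values only; not records from the collection.
conn.execute(
    "INSERT INTO collection_event VALUES "
    "(1, 'Example Creek', 'Example Basin', '2021-06-01')"
)
conn.execute(
    "INSERT INTO specimen VALUES (1, 1, 'Genus sp.', 'Genus species', "
    "'isopropyl alcohol', '2.0 mL cryotube, -80 C')"
)

# Join each verified identification to the watershed it came from.
rows = conn.execute(
    "SELECT s.verified_id, e.watershed FROM specimen s "
    "JOIN collection_event e USING (event_id)"
).fetchall()
print(rows)  # [('Genus species', 'Example Basin')]
```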
Mitogenome Assembly
To capture geographic genetic variation across each resident species' distribution in Oregon, we began by sequencing the first collected representative of each species and then, when possible, sequenced additional specimens collected from separate watersheds. We stored gzipped FASTQ sequencing files on 2 × 1 TB enterprise NL-SAS hard drives and performed mitogenome assemblies on 4 × 2.30 GHz 16-core processors with 512 GB of ECC RAM. Mitochondrial genomes were assembled de novo from raw paired reads, initially with the SPAdes assembler (versions 3.12.0–3.15.3; Bankevich et al., 2012) and, once it was released, with GetOrganelle 1.6.2 or 1.7.5 (Jin et al., 2020). We annotated all mitochondrial sequences using a combination of the MITOS2 web server (Al Arab et al., 2017; Donath et al., 2019) and Geneious 10.2.6, using annotations from identical or closely related species.
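The sequencing priority described above (one representative per species first, then specimens from additional watersheds) can be expressed as a short ordering routine. A minimal sketch under stated assumptions: "first collected" is taken to mean the earliest collection date, only one specimen per additional watershed is added, and the `Specimen` record and `sequencing_order` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Specimen:
    specimen_id: str
    species: str
    watershed: str
    collected: date


def sequencing_order(specimens):
    """Order specimens: first-collected per species, then new watersheds.

    Round 1 takes the earliest-collected specimen of each species. Round 2
    adds, in collection order, one specimen from each further watershed not
    yet represented for that species. Other specimens are left out.
    """
    first, extra = [], []
    seen = {}  # species -> set of watersheds already chosen
    for s in sorted(specimens, key=lambda s: s.collected):
        if s.species not in seen:
            seen[s.species] = {s.watershed}
            first.append(s)
        elif s.watershed not in seen[s.species]:
            seen[s.species].add(s.watershed)
            extra.append(s)
    return first + extra


# Hypothetical batch of specimens.
batch = [
    Specimen("S3", "species_a", "basin_1", date(2020, 7, 1)),
    Specimen("S1", "species_a", "basin_1", date(2020, 5, 1)),
    Specimen("S2", "species_b", "basin_2", date(2020, 6, 1)),
    Specimen("S4", "species_a", "basin_3", date(2021, 4, 1)),
]
print([s.specimen_id for s in sequencing_order(batch)])  # ['S1', 'S2', 'S4']
```

S3 is skipped because its species and watershed are already represented by the earlier S1. Putting every species' first specimen ahead of any second-watershed specimen mirrors the stated priority of covering all species before adding geographic replicates.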
Usage notes
Microsoft Excel, LibreOffice, or Microsoft's free XLS Viewer can be used to open the Excel files, and an unzip utility such as 7-Zip or WinZip can be used to extract the zipped FASTA files. For PDFs, use Adobe Acrobat Reader. Open Microsoft Word documents with Microsoft Word, OpenOffice Writer, or Google Docs.
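As an alternative to a graphical unzip utility, zipped FASTA files can be read directly with Python's standard `zipfile` module. A minimal sketch, assuming the archive (presumably 20220918_appendix_s7.zip, the only archive listed) holds plain-text FASTA members with one of the extensions in `FASTA_SUFFIXES`; those extensions and the function names are assumptions, so check README.md for the archive's actual layout. The demonstration builds a small in-memory archive so the example runs on its own.

```python
import io
import zipfile

FASTA_SUFFIXES = (".fasta", ".fa", ".fas", ".fna")  # assumed extensions


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def read_zipped_fastas(zip_source):
    """Return {member name: [(header, sequence), ...]} for FASTA members."""
    records = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if name.lower().endswith(FASTA_SUFFIXES):
                text = zf.read(name).decode("utf-8")
                records[name] = list(parse_fasta(text))
    return records


# Self-contained demo; with the downloaded data, pass the .zip path instead.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.fasta", ">specimen_1\nACGT\nACGT\n>specimen_2\nTTGA\n")
result = read_zipped_fastas(buf)
print(result)
# {'example.fasta': [('specimen_1', 'ACGTACGT'), ('specimen_2', 'TTGA')]}
```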