Environmental DNA (eDNA) methods complement traditional monitoring and can be configured to detect multiple species simultaneously. One such approach, eDNA metabarcoding, uses high-throughput DNA sequencing of water samples to indirectly detect many organisms across broad taxonomic groups. We are optimizing a non-invasive, low-cost eDNA metabarcoding protocol to be used alongside existing monitoring programs. One resource currently lacking for metabarcoding studies in general, including those in the San Francisco Estuary (SFE), is a comprehensive database of DNA barcode reference sequences. Without this foundational data, many species go undetected or are misidentified in metabarcoding studies. To meet this need, we generated a custom barcode reference database for the SFE by sequencing specimens and mining public DNA sequence data for estuarine and freshwater species of interest to monitoring programs and ecological studies. Here we present custom reference sequence databases for three barcodes: cytochrome c oxidase subunit I (COI), 12S (MiFish), and 16S.

Data were collected from two sources. First, fish and invertebrate specimens collected from the San Francisco Estuary were used for Sanger DNA sequencing. DNA was extracted with the Qiagen Blood and Tissue kit, and PCR was performed with primers that amplify the entire barcode region. Raw chromatogram files were manually examined for quality control and aligned in CodonCode Aligner, which was also used to trim flanking and primer sequences. Second, for species without physical specimens, or for specimens that failed PCR, sequencing, or QC, publicly available DNA sequences were downloaded from GenBank, then aligned and trimmed to the barcode region using CodonCode Aligner. The combined experimental and downloaded sequences for each barcode were placed into a single .txt file formatted for use with the DADA2 metabarcoding software. As an additional verification step, every sequence was checked with a BLASTn query. A separate metadata file (.csv) was also generated for each barcode. It includes the specimen name (if applicable), GenBank accession number (if applicable), taxonomic information, common name, and, where available, specimen locality, US state, and collection date.
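
The final formatting step, which combines trimmed sequences into one DADA2-ready file, can be sketched in Python as follows. This is a minimal sketch under stated assumptions. It assumes the reference layout accepted by DADA2's `assignTaxonomy()`: a FASTA header of semicolon-separated taxonomic ranks with a trailing semicolon, followed by the sequence. Several details are hypothetical and are not taken from the released files: the seven ranks in `RANKS`, the underscore in species names, and the helper names `dada2_header` and `format_reference_fasta`.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed rank order (hypothetical); the ranks used in the SFE files may differ.
RANKS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")


def dada2_header(taxonomy: Sequence[str]) -> str:
    """Join the ranks with ';' and add a trailing ';' (assignTaxonomy-style header)."""
    if len(taxonomy) != len(RANKS):
        # Fail loudly rather than silently shifting ranks when one is missing.
        raise ValueError(f"expected {len(RANKS)} ranks, got {len(taxonomy)}")
    return ">" + ";".join(name.strip() for name in taxonomy) + ";"


def format_reference_fasta(records: Iterable[Tuple[Sequence[str], str]]) -> str:
    """Render (taxonomy, trimmed sequence) pairs as one FASTA-formatted text block.

    Alignment gaps ('-') and whitespace left over from trimming are removed and
    bases are upper-cased; records with no bases left are skipped.
    """
    lines: List[str] = []
    for taxonomy, sequence in records:
        clean = "".join(sequence.split()).replace("-", "").upper()
        if not clean:
            continue  # nothing left after cleaning; do not write an empty record
        lines.append(dada2_header(taxonomy))
        lines.append(clean)
    return "\n".join(lines) + ("\n" if lines else "")
```

For example, passing the ranks `("Animalia", "Chordata", "Actinopterygii", "Osmeriformes", "Osmeridae", "Hypomesus", "Hypomesus_transpacificus")` with the placeholder sequence `"acg-tac gt"` (not a real barcode) returns a two-line record. Its header ends in `;Hypomesus_transpacificus;` and its sequence line is `ACGTACGT`. The result can be written to a .txt file with an ordinary `open(..., "w")` call.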

The barcode sequence database files (.txt) can be opened with any text editor (e.g., Notepad, TextEdit). The .csv metadata files can be opened with any text editor or with spreadsheet software (e.g., Microsoft Excel).
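
Both file types can also be read programmatically. The Python sketch below uses only the standard library and rests on three assumptions. First, each .txt file is FASTA-formatted with semicolon-separated taxonomy headers. Second, the species name is the last non-empty rank. Third, the helper names `read_fasta`, `species_counts`, and `read_metadata` are hypothetical. Metadata column names are read from the .csv header row rather than assumed.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs; a sequence may span several lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def species_counts(lines: Iterable[str]) -> Counter:
    """Count reference sequences per species (assumed: last non-empty rank of the header)."""
    counts: Counter = Counter()
    for header, _sequence in read_fasta(lines):
        ranks = [rank.strip() for rank in header.split(";") if rank.strip()]
        if ranks:
            counts[ranks[-1]] += 1
    return counts


def read_metadata(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse the metadata .csv into one dict per row, keyed by the header-row column names."""
    return list(csv.DictReader(lines))
```

Because an open text file iterates over its lines, the same functions accept file handles. For example, `with open("COI.txt", encoding="utf-8") as fh: print(species_counts(fh).most_common(5))` would list the five best-represented species, where `COI.txt` is a placeholder file name. Open the .csv with `newline=""`, as the `csv` module documentation recommends.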

Reference sequence database for eDNA metabarcoding of San Francisco estuary fishes and invertebrates
