Data from: Ecological forensic testing: Using multiple primers for eDNA detection of marine vertebrates in an estuarine lagoon subject to anthropogenic influences

Cite this dataset

Chiquillo, Kelcie; Wong, Juliet M.; Eirin Lopez, Jose M (2024). Data from: Ecological forensic testing: Using multiple primers for eDNA detection of marine vertebrates in an estuarine lagoon subject to anthropogenic influences [Dataset]. Dryad. https://doi.org/10.5061/dryad.h70rxwdrk

Abstract

Many critical aquatic habitats are in close proximity to human activity (e.g., adjacent to residences, docks, and marinas), and it is vital to monitor biodiversity in these and similar areas that are subject to ongoing urbanization, pollution, and other environmental disruptions. Environmental DNA (eDNA) metabarcoding is an accessible, non-invasive genetic technique used to detect and monitor species diversity and is a particularly useful approach in areas where traditional biodiversity monitoring methods (e.g., visual surveys or video surveillance) are challenging to conduct. In this study, we implemented an eDNA approach that used a combination of three distinct PCR primer sets to detect marine vertebrates within a canal system of Biscayne Bay, Florida, an ecosystem representative of challenging sampling conditions and a myriad of impacts from urbanization. We detected fish species from aquarium, commercial, and recreational fisheries, as well as invasive, cryptobenthic, and endangered vertebrate species, including charismatic marine mammals such as the protected West Indian manatee, Trichechus manatus. Our results support the potential for eDNA analyses to supplement traditional biodiversity monitoring methods and ultimately serve as an important tool for ecosystem management. This approach minimizes stress or disturbance to organisms and removes the intrinsic risk and logistical limitations of SCUBA diving, snorkeling, or deploying sensitive equipment in areas that are subject to high vessel traffic and/or low visibility. Overall, this work sets the framework to understand how biodiversity may change over different spatial and temporal scales in an aquatic ecosystem heavily influenced by urbanization and validates the use of eDNA as a complementary approach to traditional ecological monitoring methods.

README: Ecological forensic testing: Using multiple primers for eDNA detection of marine vertebrates in an estuarine lagoon subject to anthropogenic influences

https://doi.org/10.5061/dryad.h70rxwdrk

Overview

This repository contains comprehensive data and scripts associated with the study titled "Ecological forensic testing: Using multiple primers for eDNA detection of marine vertebrates in an estuarine lagoon subject to anthropogenic influences." The study is focused on utilizing environmental DNA (eDNA) metabarcoding to detect and monitor marine vertebrate species diversity within the canal system of Biscayne Bay, Florida. The dataset comprises essential components including supplemental tables, taxonomic assignments, raw sequencing data, and bioinformatic analyses. Additionally, the repository provides R scripts utilized for graph generation, enabling detailed insights into the sequencing process, bioinformatic analyses, and taxonomic assignments integral to the study's findings.

Brief Summary of Methodology

Sequencing and bioinformatic analyses were conducted to characterize vertebrate species diversity in Biscayne Bay, Florida. Libraries were prepared for paired-end 2x300 bp sequencing on an Illumina MiSeq platform and normalized prior to sequencing; samples containing a short 40-50 bp peak were removed, and amplicons were size selected for library peaks below 1000 kb. The bioinformatics workflow used DADA2, encompassing primer sequence removal, quality control, and taxonomic assignment against the MIDORI2 database of eukaryotic mitochondrial sequences. Negative control ASVs were excluded, and taxa unidentified at or above the order level were removed. Data were subset to include only chordates and agglomerated by genus using phyloseq. Taxonomic identities were assigned for each of the three primer sets (MarVer1, MarVer3, MiFish), and overlap among sets was compared using a Venn diagram. Analyses of feeding types and species diversity focused on taxa detected by all primer sets.

Contents

Supplemental files

  • S1_SiteCoordinatesHabitatType.csv: Sampling site locations with GPS coordinates and key characteristics of the sampling sites
  • S2_NegativeControls.csv: List of ASVs from Negative Control samples that were removed from the dataset.
  • S3_ReadStatistics.csv: Number of raw sequence reads, filtered and cleaned reads for forward and reverse reads, merged reads, and merged reads with chimeras removed for each sample and primer set.
  • S4_AllTaxa_PresAbs_PrimersComb.csv: List of all 145 taxa identified (all primer sets combined) and their presence/absence at each site. A taxon was included if at least one of the three primer sets detected it at one or more of the seven sites.
  • S5_MarVer1Taxa.csv: List of all taxa identified using the MarVer1 primer set, including read counts per site.
  • S6_MiFishTaxa.csv: List of all taxa identified using the MiFish primer set, including read counts per site.
  • S7_MarVer3Taxa.csv: List of all taxa identified using the MarVer3 primer set, including read counts per site.
  • S8_BySite_PrimersCombined.csv: List of all taxa (all primer sets combined) by site, including trophic description (i.e., herbivore, carnivore, or omnivore). A taxon was included if at least one of the three primer sets detected it at one or more of the seven sites.
  • SupplementalTables_final_clean_Updated7Dec23_copy.xlsx: All supplemental tables combined in a single Excel file.

FASTQ Files for each primer (zip files), plus MIDORI2 reference fasta file

  • FastQ files were fed into the DADA2 package to model DNA sequencing error on an Illumina run, controlling for read quality and picking Amplicon Sequence Variant (ASV) sequences that represent biological variability.
  • MiFish_fastq.zip: Compressed file containing FASTQ files for the MiFish primer.
  • MarVer3_fastq.zip: Compressed file containing FASTQ files for the MarVer3 primer.
  • MarVer1_fastq.zip: Compressed file containing FASTQ files for the MarVer1 primer.
  • MIDORI2_UNIQ_NUC_SP_GB253_srRNA_DADA2.fasta: FASTA file from the MIDORI2 database of eukaryotic mitochondrial sequences (Leray, Knowlton, and Machida 2022), which includes reference files for both 12S and 16S rRNA. It was used for all three primer sets (i.e., the 12S rRNA reference for the MiFish and MarVer1 datasets, and the 16S rRNA reference for the MarVer3 dataset).

Scripts

  • Figures_Rscript (1).R: R script for generating figures. It loads the required packages and contains the code for the Venn diagram, trophic bar plot, and UpSet plot.

MarVer1

  • MarVer1_DADA2_Script.R: R script for DADA2 processing of MarVer1 data.
  • AssignTax_DADA2_MarVer1.R: R script for taxonomic assignment from DADA2 data for MarVer1 primer.
  • AssignTaxa_MarVer1.sh: Shell script for taxonomic assignment of MarVer1 data.

MarVer3

  • DADA2_MarVer3_Script.R: R script for DADA2 processing of MarVer3 data.
  • AssignTax_DADA2_MarVer3.R: R script for taxonomic assignment from DADA2 data for MarVer3 primer.
  • AssignTaxa_MarVer3.sh: Shell script for taxonomic assignment of MarVer3 data.

MiFish

  • MiFish_DADA2_Script.R: R script for DADA2 processing of MiFish data.
  • AssignTax_DADA2_MiFish.R: R script for taxonomic assignment from DADA2 data for MiFish primer.
  • AssignTaxa_MiFish.sh: Shell script for taxonomic assignment of MiFish data.

Taxonomic results & metadata files for each primer set

  • Metadata provides additional information about the samples, such as sample identifiers, experimental conditions, sample collection details, and other relevant information.
  • metadata_MarVer1.csv: Metadata providing additional information about samples processed with the MarVer1 primer.
  • metadata_MarVer3.csv: Metadata providing additional information about samples processed with the MarVer3 primer.
  • metadata_MiFish.csv: Metadata providing additional information about samples processed with the MiFish primer.
  • MarVer1_Taxa.csv: Taxonomic results derived from the MarVer1 primer.
  • MarVer3_Taxa.csv: Taxonomic results derived from the MarVer3 primer.
  • MiFish_Taxa.csv: Taxonomic results derived from the MiFish primer.
  • SitePresenceAbsence.csv: Presence and absence data across different sampling sites.
  • TrophicBySite_PrimersCombined.csv: Trophic levels of organisms identified across sites.
  • MiFish_Site[X].csv: Taxonomic results derived from the MiFish (12S) primer across individual study sites within Biscayne Bay.
  • MarVer1_Site[X].csv: Taxonomic results derived from the MarVer1 (12S) primer across individual study sites within Biscayne Bay.
  • MarVer3_Site[X].csv: Taxonomic results derived from the MarVer3 (16S) primer across individual study sites within Biscayne Bay.
  • Taxonomic assignments: The "taxa.rds" files are not included; however, our R scripts show how we generated a taxa.rds file for each primer. These assignments typically include the taxonomic lineage (from domain to species level) for each sequence or operational taxonomic unit (OTU).

Missing data code: NA

Methods

Field Sites

For this study, vertebrate animals were targeted for eDNA analyses because they encompass a wide variety of categories commonly prioritized for ecology and conservation studies, such as endangered species, commercially valuable species, cryptobenthic species, and nonnative species; many of these species fulfill critical roles in aquatic ecosystems (Kelley et al. 2016) and provide a source of protein for humans (Tucker and Rogers 2014). Sampling sites were selected based on their diversity of benthic habitat structure (e.g., seagrass habitat, mud/sand, or coral reef) as well as proximity to human activity and sources of environmental stressors (i.e., recreational boating, fishing, and both public and private land use), representing the diversity of habitats found in Biscayne Bay.

Sampling sites were selected within two areas of Biscayne Bay that are approximately 30 miles apart (Fig. 1A, Supplementary Table 1). Seven field sites were selected in total: one site at Florida International University’s Biscayne Bay Campus (FIU BBC) (Fig. 1B) and six sites at Paradise Point (Fig. 1C). Paradise Point is located on the western shoreline of Biscayne Bay in South Florida, within the village of Palmetto Bay in Miami-Dade County, FL. Directly to the east of Paradise Point is a nature preserve, the Deering Estate North Addition Preserve. Paradise Point itself, however, is primarily a private residential area, with many homes having direct water access in the form of beaches or boat docks. Sites 1 and 2, which are located within a canal that separates the nature preserve from the residential community, have a soft, bare sediment benthic structure (i.e., mud and sand). Site 2 is located at the opening of this canal, adjacent to a beach allotted to community residents. Site 3 is located at the southeastern end of Paradise Point and has a benthic seagrass community composed of a mixture of the seagrass species Thalassia testudinum, Syringodium filiforme, and Halodule wrightii. Site 3 is also commonly subjected to heavy boat traffic because this area connects the Paradise Point canal system to the larger Cutler Channel.

Sites 4-6 are within a canal on the northern coast of Paradise Point. The benthic structure at these sites is predominantly soft sediment, with sparse strands of S. filiforme and H. wrightii present. Site 4, located at the most inland end of this canal, is lined by mangrove communities and is largely inaccessible to motorized boats. In contrast, Sites 5 and 6 are adjacent to residences and privately owned boat docks and are accessible to both motorized and non-motorized vessels. Sites 5 and 6 were also selected because they lie southward of the Deering Bay Marina, another perceived anthropogenic influence (Alleman 1995) (Fig. 1C).

Site 7 was selected in an area further north within Biscayne Bay, adjacent to FIU BBC (Fig. 1B). This site is frequented year-round by recreational tourists visiting Oleta River State Park in North Miami Beach, FL. Historically, the shoreline within North Bay has been bulkheaded with limited mangrove shoreline, most of the bottom has been dredged, and benthic vegetation has been in decline (Cantillo et al. 2000). The benthic structure of Site 7 is primarily a hardbottom, rocky substrate with sparse patches of stony corals. Site 7 was used as a comparative site to Sites 1-6 at Paradise Point to further test the eDNA approach.

Sample collection and water filtration

Water sampling was conducted across two consecutive days, April 28 (Site 7) and April 29 (Sites 1-6), in 2022. This minimized the potential effects of seasonal variation in biological, chemical, and environmental conditions across sites. Water collections were performed (n = 3 replicate deployments) at each site using a 5 L Niskin bottle at a depth of 0-1 meters. The water from each Niskin deployment was stored within three 1.5 L water collection bottles that had been previously cleaned via immersion in a 5% bleach solution, followed by a deionized (DI) water rinse, a 70% ethanol rinse, and a final sterilization under ultraviolet light for 15 minutes. Prior to sampling each site, all collection equipment (5 L Niskin bottle, plastic tubing) was sterilized using 5% bleach and rinsed with DI water. This was followed by the collection of a negative field control sample, in which the Niskin bottle was used to collect 1.5 L of DI water. Negative field controls were used to detect and remove any incidental DNA contamination introduced to the samples during collection and processing. Sampling produced nine 1.5 L field samples and one 1.5 L negative control sample for each site, for a total of n = 70 water samples across all sites. All water samples were immediately transported to facilities at FIU BBC, where they were frozen at -20 °C until processing. A vacuum pump (model: Rocker300, 23 L/min) was used to filter each water sample through a 0.22 µm Sterivex filter. All filtration processes were conducted within a clean fume hood to minimize potential contamination. All equipment was sterilized between samples using 5% bleach, DI water, and 70% ethanol. The 0.22 µm filters from each of the 70 samples were subsequently stored at -70 °C in FIU’s Environmental Epigenetics Lab until DNA extractions were performed.
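The sample total follows directly from the sampling design described above. The short Python sketch below only restates that arithmetic; the variable names are illustrative and do not come from the deposited scripts, which are written in R.

```python
# Sampling design as described in the text (names are illustrative).
SITES = 7
DEPLOYMENTS_PER_SITE = 3      # replicate Niskin deployments per site
BOTTLES_PER_DEPLOYMENT = 3    # 1.5 L collection bottles filled per deployment
FIELD_CONTROLS_PER_SITE = 1   # DI water negative field control per site

field_samples_per_site = DEPLOYMENTS_PER_SITE * BOTTLES_PER_DEPLOYMENT
samples_per_site = field_samples_per_site + FIELD_CONTROLS_PER_SITE
total_samples = SITES * samples_per_site

print(f"Field samples per site: {field_samples_per_site}")
print(f"Samples per site (with control): {samples_per_site}")
print(f"Total water samples: {total_samples}")
```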

Environmental DNA isolation and PCR amplifications

DNA extractions and PCR amplifications were performed using stringent molecular laboratory practices to limit potential contamination. Lab benches were cleaned regularly with 10% bleach, followed by DI water, and 70% ethanol. All pipets, tube racks, and other equipment were cleaned with ELIMINase (Decon Labs, Inc.), followed by DI water, 70% ethanol, and UV sterilization. Only sterile, RNase free, DNase free, and pyrogen free tubes and filtered pipet tips were used. Lab personnel wore gloves and face masks at all times.

DNA extractions were performed using a Qiagen DNeasy Blood and Tissue kit following the manufacturer’s instructions, with minor modifications optimized for isolating eDNA from the Sterivex filters. Specifically, lysis was performed in a greater reagent volume per sample (720 µL of Buffer ATL and 80 µL of proteinase K solution) to ensure the filters were completely immersed in lysis buffer, and samples were incubated overnight at 56 °C with end-over-end rotation. DNA purity, quantity, and quality were verified using a NanoVue Plus, Qubit 2.0, and gel electrophoresis. A total of 70 eDNA samples were isolated, including those extracted from negative field control water samples (i.e., deionized water that was collected and preserved concurrently with field water sampling).

Three primer sets (Table 1) were used to target and amplify gene regions corresponding to teleost fish and broader marine vertebrate taxa: 1) MiFish, developed by Miya et al. 2015 to amplify a section of the 12S rRNA gene region; 2) MarVer1, developed by Valsecchi et al. 2020 to amplify a section of the 12S rRNA gene region; and 3) MarVer3, developed by Valsecchi et al. 2020 to amplify a section of the 16S rRNA gene region. These primer sets were selected to compare their efficiency in detecting marine vertebrate taxa and to maximize the capture of taxa that a single primer set alone may otherwise fail to detect.

PCRs were performed on each water sample in triplicate for each of the three primer sets, resulting in a total of 630 PCR products (70 eDNA samples x 3 primer sets x 3 reactions; Fig. 2). Each PCR reaction for the MiFish primer set was performed in a 25 μL reaction volume containing 0.5 μL of KAPA HiFi HotStart DNA Polymerase (1 U/μL), 5 μL of KAPA HiFi Fidelity Buffer (5X), 0.75 μL of KAPA dNTP Mix (10 mM each), 0.5 μL of BSA (20 mg/μL), 1 μL of the forward primer (10 μM), 1 μL of the reverse primer (10 μM), 15.25 μL of sterile water, and 1 μL of the eDNA extraction template. The MiFish PCR thermocycler protocol employed a touchdown profile following Pitz et al. 2020 with an initial 15-min denaturation step at 95 °C, followed by 13 cycles of a 30-s step at 94 °C, a 30-s annealing step that started at 69.5 °C and then decreased by 1.5 °C for each subsequent cycle (the last cycle was at 50 °C), and a 90-s step at 72 °C. This initial touchdown profile was followed by 25 additional cycles of a 30-s step at 94 °C, a 30-s annealing step at 50 °C, and a 45-s step at 72 °C. The final extension step was for 10 min at 72 °C.

Each PCR reaction for both the MarVer1 and MarVer3 primer sets was performed in a 20 μL reaction volume containing 0.5 μL of KAPA HiFi HotStart DNA Polymerase (1 U/μL), 4 μL of KAPA HiFi Fidelity Buffer (5X), 0.6 μL of KAPA dNTP Mix (10 mM each), 0.4 μL of BSA (20 mg/μL), 1 μL of the forward primer (10 μM), 1 μL of the reverse primer (10 μM), 10.5 μL of sterile water, and 2 μL of the eDNA extraction template. The MarVer1 and MarVer3 PCR thermocycler protocols employed touchdown profiles following Valsecchi et al. 2020 with an initial 4-min denaturation step at 94 °C. For MarVer1, the denaturation step was followed by 10 cycles using an annealing temperature of 54 °C, 10 cycles using an annealing temperature of 55 °C, and 18 cycles using an annealing temperature of 56 °C. For MarVer3, the denaturation step was followed by 8 cycles using an annealing temperature of 54 °C, 10 cycles using an annealing temperature of 55 °C, 10 cycles using an annealing temperature of 56 °C, and 10 cycles using an annealing temperature of 57 °C. For both MarVer primer sets, each of the 38 cycles included a 30-s step at 95 °C, a 30-s step at the given annealing temperature, and a 40-s step at 72 °C. The final extension step was for 5 min at 72 °C.
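As a quick consistency check on the recipes above, the following Python sketch sums the component volumes of each reaction mix and the cycle counts of the MarVer annealing stages. The dictionary and function names are illustrative assumptions; the values are copied from the text.

```python
import math

# Component volumes in microlitres, copied from the text.
MIFISH_MIX = {
    "KAPA HiFi HotStart DNA Polymerase": 0.5,
    "KAPA HiFi Fidelity Buffer (5X)": 5.0,
    "KAPA dNTP Mix": 0.75,
    "BSA": 0.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "sterile water": 15.25,
    "eDNA template": 1.0,
}
MARVER_MIX = {
    "KAPA HiFi HotStart DNA Polymerase": 0.5,
    "KAPA HiFi Fidelity Buffer (5X)": 4.0,
    "KAPA dNTP Mix": 0.6,
    "BSA": 0.4,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "sterile water": 10.5,
    "eDNA template": 2.0,
}

# Annealing stages as (number of cycles, annealing temperature in deg C).
MARVER1_STAGES = [(10, 54), (10, 55), (18, 56)]
MARVER3_STAGES = [(8, 54), (10, 55), (10, 56), (10, 57)]


def total_volume(mix):
    """Sum the component volumes of one reaction mix (microlitres)."""
    return sum(mix.values())


def total_cycles(stages):
    """Sum the cycle counts across all annealing stages."""
    return sum(cycles for cycles, _temperature in stages)


# isclose guards against floating-point rounding in the sums.
print("MiFish mix is 25 uL:", math.isclose(total_volume(MIFISH_MIX), 25.0))
print("MarVer mix is 20 uL:", math.isclose(total_volume(MARVER_MIX), 20.0))
print("MarVer1 cycles:", total_cycles(MARVER1_STAGES))
print("MarVer3 cycles:", total_cycles(MARVER3_STAGES))
```

Both mixes add up to their stated reaction volumes, and both MarVer programs come to the 38 cycles mentioned in the text.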

For every PCR reaction mixture created, a negative control sample was included in which molecular grade water replaced the eDNA template. All PCR products were visualized via electrophoresis on 2% agarose gels to ensure amplification success and correct product size. PCR products were quantified using a Qubit dsDNA HS (High Sensitivity) Assay Kit and Qubit 2.0 fluorometer. PCR products were pooled so that equal amounts of amplified DNA were added to ensure equal representation for every collection site, and so that equal amounts of amplified DNA were used for the library preparation for each site for each primer set. Overall, a total of 24 pooled PCR product samples were produced: one sample from each of the seven sites for each of the three primer pairs (21 samples), plus a field control sample for each of the three primer pairs (Fig. 2).
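The reaction and pool counts reported in this section can be reproduced from the design. The Python sketch below is only a restatement of that arithmetic, with illustrative variable names.

```python
# Counts taken from the text (names are illustrative).
EDNA_SAMPLES = 70          # field samples plus negative field controls
PRIMER_SETS = 3            # MiFish, MarVer1, MarVer3
PCR_REPLICATES = 3         # triplicate reactions per sample and primer set
SITES = 7
FIELD_CONTROL_POOLS_PER_PRIMER = 1

pcr_products = EDNA_SAMPLES * PRIMER_SETS * PCR_REPLICATES
site_pools = SITES * PRIMER_SETS
control_pools = FIELD_CONTROL_POOLS_PER_PRIMER * PRIMER_SETS
pooled_samples = site_pools + control_pools

print(f"PCR products: {pcr_products}")
print(f"Pooled samples: {site_pools} site pools + {control_pools} control pools = {pooled_samples}")
```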

Sequencing and bioinformatic analyses

Pooled PCR products were submitted to the NextGen DNA Sequencing core facility at the University of Florida’s Interdisciplinary Center for Biotechnology Research (UF ICBR). Samples that contained a short 40-50 bp peak were removed, and amplicons were size selected for library peaks below 1000 kb. After quantification of pooled PCR products, libraries were constructed with a limited-cycle PCR to attach overhang adapter sequences that are compatible with Illumina sequencing adapters and dual indexes. Library preparation was performed for paired-end 2x300 bp sequencing on an Illumina MiSeq, and libraries were normalized prior to sequencing.

Post-sequencing data were processed with a bioinformatics workflow adapted from the DADA2 pipeline (https://benjjneb.github.io/dada2/tutorial.html). Accordingly, primer sequences were first removed from the beginning and end of the reads in the FastQ files using “cutadapt” (Martin 2011) in R. FastQ files were then fed into the DADA2 package to model DNA sequencing error on an Illumina run, controlling for read quality and picking Amplicon Sequence Variant (ASV) sequences that represent biological variability. Reads were trimmed to remove low-quality regions and filtered by quality score: quality was visualized using “plotQualityProfile”, and sequences that did not meet the quality score requirements were removed from the dataset using the “filterAndTrim” function, with a low level of mismatch allowed between the exact primer sequences and the observed reads. After error profiles were characterized, forward and reverse reads were merged using “mergePairs”, and chimeras were removed using “removeBimeraDenovo”.
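To illustrate what the merging step does conceptually, here is a toy Python sketch that joins a forward read with the reverse complement of its mate when the two share an exact overlap. It is not the DADA2 implementation, which operates on denoised sequences in R. The 12-base minimum overlap with no mismatches is chosen to resemble the documented defaults of “mergePairs” (minOverlap = 12, maxMismatch = 0); the settings actually used are in the deposited R scripts. The example amplicon and the function names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=12):
    """Merge a forward read with the reverse complement of its mate.

    Returns the merged sequence, or None when the reads share no exact
    overlap of at least min_overlap bases.
    """
    rev_rc = reverse_complement(reverse)
    longest = min(len(forward), len(rev_rc))
    # Try the longest candidate overlap first, then shrink.
    for overlap in range(longest, min_overlap - 1, -1):
        if forward[-overlap:] == rev_rc[:overlap]:
            return forward + rev_rc[overlap:]
    return None


# Hypothetical 34-base amplicon, read from both ends with a 14-base overlap.
amplicon = "ACGTTGCAAGGCTTACCGGATTCAGGTCAATGCC"
forward_read = amplicon[:24]
reverse_read = reverse_complement(amplicon[10:])

merged = merge_pair(forward_read, reverse_read)
print("Recovered amplicon:", merged == amplicon)

# Reads with no shared overlap are not merged.
unmerged = merge_pair("ACGTACGTACGTAA", reverse_complement("TTTTGGGGCCCCAA"))
print("Unrelated reads merged:", unmerged is not None)
```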

Once sequences were trimmed and assessed for quality control, each sequence was given a taxonomic identity using the “assignTaxonomy” algorithm, which matched sequences to a reference database (minimum bootstrap confidence threshold of 50%, which is the default setting). Here, the MIDORI2 database of eukaryotic mitochondrial sequences (Leray, Knowlton, and Machida 2022), which includes reference files for both 12S and 16S rRNA, was used for all three primer sets (i.e., the 12S rRNA reference for the MiFish and MarVer1 datasets, and the 16S rRNA reference for the MarVer3 dataset). Sequences were combined with field site metadata using phyloseq (McMurdie & Holmes, 2013). For each primer set, ASVs identified in the negative control were removed from all of the samples (Supplementary Table 2). Any taxa that were unidentified (i.e., did not match to the reference database) at the order level or above were also excluded; taxa identified to the family, genus, or species level remained in the dataset. Data were subset further to exclude all taxa except for chordates and then agglomerated by genus using the “tax_glom” function in phyloseq to simplify downstream presence-absence tables for each site.
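The filtering sequence described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' phyloseq code (which is in R): the ASV records, read counts, and field names are hypothetical; "identified to the family level or below" is read here as "has a family assignment"; and ASVs without a genus are grouped under an "unclassified" label, which is an illustrative choice rather than the behavior of “tax_glom”.

```python
from collections import defaultdict

# Hypothetical ASV records; counts and assignments are made up.
asvs = [
    {"id": "ASV1", "phylum": "Chordata", "order": "Sirenia",
     "family": "Trichechidae", "genus": "Trichechus",
     "reads": {"Site1": 120, "Site2": 0}},
    {"id": "ASV2", "phylum": "Chordata", "order": "Sirenia",
     "family": "Trichechidae", "genus": "Trichechus",
     "reads": {"Site1": 30, "Site2": 5}},
    {"id": "ASV3", "phylum": "Chordata", "order": "Mugiliformes",
     "family": "Mugilidae", "genus": "Mugil",
     "reads": {"Site1": 0, "Site2": 40}},
    {"id": "ASV4", "phylum": "Chordata", "order": "Primates",
     "family": "Hominidae", "genus": "Homo",
     "reads": {"Site1": 8, "Site2": 3}},
    {"id": "ASV5", "phylum": "Chordata", "order": None,
     "family": None, "genus": None,
     "reads": {"Site1": 50, "Site2": 50}},
    {"id": "ASV6", "phylum": "Arthropoda", "order": "Decapoda",
     "family": "Portunidae", "genus": "Callinectes",
     "reads": {"Site1": 10, "Site2": 10}},
]
negative_control_ids = {"ASV4"}  # ASVs also seen in the negative control


def filter_and_agglomerate(records, control_ids):
    """Apply the three filters, then sum reads per genus and site."""
    genus_reads = defaultdict(lambda: defaultdict(int))
    for asv in records:
        if asv["id"] in control_ids:
            continue  # remove ASVs found in the negative control
        if asv["family"] is None:
            continue  # not resolved below the order level
        if asv["phylum"] != "Chordata":
            continue  # keep chordates only
        # Assumption: genus-less ASVs get an "unclassified" label.
        genus = asv["genus"] or f"unclassified {asv['family']}"
        for site, count in asv["reads"].items():
            genus_reads[genus][site] += count
    return {genus: dict(sites) for genus, sites in genus_reads.items()}


def to_presence_absence(genus_reads):
    """Convert per-site read counts to presence (True) / absence (False)."""
    return {genus: {site: count > 0 for site, count in sites.items()}
            for genus, sites in genus_reads.items()}


reads_by_genus = filter_and_agglomerate(asvs, negative_control_ids)
presence = to_presence_absence(reads_by_genus)
for genus in sorted(presence):
    print(genus, reads_by_genus[genus], presence[genus])
```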

Taxonomic identities were assigned for all three primer sets. Primer sets were compared to determine the identity and number of taxa that were detected by more than one primer set, as well as the identity and number of taxa that were identified by only one of the three primer sets. In a subsequent analysis, the taxa that were detected by all three primer sets, and whose presence is therefore unlikely to be a false positive, were explored further. After examining how taxa identification varied by primer set, the results from all primers were combined into one dataset. A taxon was included in the dataset as long as at least one primer set had detected it in at least one of the seven study sites. Although read counts were collected as part of the dataset, for the present study they were not used to estimate organism abundance (see Discussion). Therefore, only taxa presence or absence per site was assessed.
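The primer-set comparison amounts to set operations on the taxa each primer set detected. The Python sketch below shows the idea with placeholder taxon names; the detections are hypothetical and do not reproduce the study's results, and the figures themselves were generated with the deposited R script.

```python
# Placeholder detections per primer set (hypothetical, not study results).
detections = {
    "MiFish": {"taxon_A", "taxon_B", "taxon_C"},
    "MarVer1": {"taxon_A", "taxon_B", "taxon_D"},
    "MarVer3": {"taxon_A", "taxon_E"},
}


def venn_regions(detected):
    """Map each combination of primer sets to the taxa found by exactly
    that combination (the regions of a Venn diagram)."""
    regions = {}
    all_taxa = set().union(*detected.values())
    for taxon in all_taxa:
        key = frozenset(p for p, taxa in detected.items() if taxon in taxa)
        regions.setdefault(key, set()).add(taxon)
    return regions


regions = venn_regions(detections)

# Taxa detected by all three primer sets (the high-confidence subset).
shared_by_all = set.intersection(*detections.values())

# Combined dataset: a taxon is kept if any primer set detected it.
combined = set().union(*detections.values())

for primers, taxa in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(primers)), "->", sorted(taxa))
print("Detected by all three:", sorted(shared_by_all))
print("Combined taxa count:", len(combined))
```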

Funding

National Science Foundation, Award: 2109466, NSF Postdoctoral Research Fellowships in Biology

National Science Foundation, Award: 2010791, NSF Postdoctoral Research Fellowships in Biology

National Science Foundation, Award: NSF-HRD‐2111661, Centers of Research Excellence in Science and Technology CREST Program

Herbert W. Hoover Foundation

Florida International University, Award: 1729, Institute of Environment