Klamath River renewal project molecular library
Data files
May 27, 2025 version files (2.49 MB total)
- Klamath_ML_metaworks_output.csv (1.76 MB)
- Klamath_ML_Raw.Reads.2023.csv (312.46 KB)
- Klamath_ML_Sample.ID.Metadata.2024.csv (326.58 KB)
- Klamath_Molecular_Library_metadata.xml (84.02 KB)
- README.md (6.81 KB)
Abstract
The Klamath River Renewal Project (KRRP) is a large dam removal and river restoration project in California and Oregon, United States. The removal of four large dams in 2024 restored connectivity to 640 river kilometers. We created the KRRP molecular library, an environmental specimen bank, for long-term curation of environmental nucleic acids collected from the restoration project area. The library comprises sampling stations at 45 main stem and tributary sites and pre-dam-removal water samples preserved for environmental nucleic acids (eDNA and eRNA), providing long-term access to both raw and extracted molecular material for tracking the ecosystem response to dam removal. On a subset of samples, we conducted DNA metabarcoding using next-generation sequencing with the MiFish-U and modified MiFish-U-F primer sets. We used the sequence reads to visualize data and calculate diversity metrics, establishing a pre-dam-removal baseline and a proof of concept that the molecular techniques can resolve short- and long-term changes in fish and other aquatic organisms following dam removal. These and future sampling efforts should, at a minimum, allow tracking of fish community response to ecosystem restoration.
https://doi.org/10.5061/dryad.0cfxpnwcn
Description of the data and file structure
Files and variables
File: Klamath_Molecular_Library_metadata.xml
Description: Metadata record for Klamath_ML_Raw.Reads.2023.csv and Klamath_ML_Sample.ID.Metadata.2024.csv, following FGDC-compliant metadata standards.
File: Klamath_ML_Raw.Reads.2023.csv
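Description: Comma-separated values (CSV) file containing the sequence reads per exact sequence variant (ESV) and the BLAST-based taxonomic identification for each sample.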
Variables
- MiseqRun: The run identifier code for the DNA next-generation sequencing conducted on an Illumina MiSeq
- Sample.ID: A unique sample identification code for each sampling site
- ESVseq: "Exact Sequence Variant", referring to the unique DNA sequence obtained from metabarcoding.
- ESVsize: The number of reads of the given ESVseq
- CommonName: The common name of the taxon identified from next-generation sequencing results.
- Klamath.Type: A taxon-specific identifier that allows tracking of the number of ESVs identified per species.
- ReportName: The taxonomic identification from the BLAST results
- Type: A broad categorization of the identified taxon into mammal, herpetofauna, bird, or fish groups
- sppIDNotes: Notes of interest or explanation regarding the taxonomic identification based on reference database output
- Site.Name: Unique site identifying code name
- Year: The year the sample was collected
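As a minimal sketch of how the read table might be summarized, the example below totals ESVsize per site and taxon using only the Python standard library. The column names come from the variable list above; the sample rows, run code, and site codes are invented for illustration, and the real file may need extra handling (for example, blank or non-integer ESVsize cells).

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows that reuse the documented column names; all values are invented.
SAMPLE_CSV = """MiseqRun,Sample.ID,ESVseq,ESVsize,CommonName,Type,Site.Name,Year
RUN1,S01,ACGT,1200,Rainbow trout,fish,SITE_A,2023
RUN1,S01,ACGA,300,Rainbow trout,fish,SITE_A,2023
RUN1,S02,TTGA,450,Speckled dace,fish,SITE_B,2023
"""


def reads_per_site_taxon(handle):
    """Sum ESVsize for every (Site.Name, CommonName) pair in a CSV handle."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[(row["Site.Name"], row["CommonName"])] += int(row["ESVsize"])
    return dict(totals)


totals = reads_per_site_taxon(io.StringIO(SAMPLE_CSV))
for (site, taxon), reads in sorted(totals.items()):
    print(site, taxon, reads)
```

To run this against the published file, the `io.StringIO(...)` handle could be replaced with `open("Klamath_ML_Raw.Reads.2023.csv", newline="")`.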
File: Klamath_ML_Sample.ID.Metadata.2024.csv
Description: Comma-separated values (CSV) file containing the field collection, site, water quality, and storage metadata for each sample.
Variables
- Sample.ID: A unique sample identification code for each sampling site
- Replicate.Number: The number of the water sample filter from each site level pooled sample
- Sample.Volume: The volume in milliliters that was filtered for a replicate from the pooled water sample at each site location.
- Sample.Collection.Method: Identifies whether a replicate was collected from a pooled sample or a deionized water blank serving as a negative control
- Filter.Preservation.Method: Description of how each eNA filter was preserved in the field
- Sample.Notes: Observations from the field
- Filter.Type: The type of filter used to collect the eNA sample
- Filter.Pore.Size: The pore size of the filter
- Survey.Date.and.Time: The year, month, day, and time of sampling
- Site.Name: Unique site identifying code name
- Crew: Initials of field staff involved with data collection
- Blank.Included: A column flagging whether the site involved collection of a field blank negative control
- Site.Notes: Observations from the field
- County: County name
- State: State Name abbreviation
- Year: Year of sampling
- Habitat: Aquatic habitat type where sampling occurred
- Size: A categorical variable related to the river waterbody size, differentiating mainstem Klamath River and tributary sites and Scott River
- Site.Character: Differentiates sites that could be impacted by dam removal from those that are not expected to be directly impacted, and indicates whether each site is upstream of natural migration barriers to anadromous fish or is accessible to anadromous fish.
- Project.Name: The title of the project
- Waterbody.Name: Name of river or tributary water body where water samples were collected
- Sample.Depth: A description of where in the water column a water sample was collected
- Sub.Site.Locations: At each study site, a composite water sample was collected from combinations of left bank, center channel, or right bank locations depending on accessibility
- flow.CMS: Model-estimated discharge of the river or tributary where the water sample was collected, based on USGS gage data (cubic meters per second, CMS)
- Air.Temperature: Temperature of the air during the time of sampling
- Water.Temperature: Temperature of the water during the time of sampling
- Dissolved.Oxygen: Dissolved oxygen of water sampling location based on multiprobe measurement (mg/L)
- Specific.Conductance: Specific conductance of water sampling location based on multiprobe measurement (mS/cm)
- pH: pH of water sampling location based on multiprobe measurement
- X.Coordinate: Longitude coordinate in decimal degrees
- Y.Coordinate: Latitude coordinate in decimal degrees
- Extraction.Status: Designates whether the environmental sample has been extracted and the type of extraction
- Location.Storage.Method: Description of how the sample has been stored. The location and type of freezer where the sample is reposited.
- DNA.Volume.ul.remaining: The volume of extracted DNA (in microliters) remaining from the original sample
- RNA.Volume.ul.remaining: The volume of extracted RNA (in microliters) remaining from the original sample
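The two tables share the Sample.ID column, so read records can be linked to field metadata. The sketch below shows one way to do that with the standard library. It assumes one metadata row per Sample.ID; if replicates share a Sample.ID in the real file, the index would need a composite key such as (Sample.ID, Replicate.Number). All row values are invented.

```python
import csv
import io

# Hypothetical excerpts that reuse the documented column names; all values are invented.
METADATA_CSV = """Sample.ID,Site.Name,Waterbody.Name,Sample.Volume
S01,SITE_A,Klamath River,1000
S02,SITE_B,Scott River,750
"""
READS_CSV = """Sample.ID,CommonName,ESVsize
S01,Rainbow trout,1200
S02,Speckled dace,450
S03,Rainbow trout,90
"""


def index_by_sample(handle):
    """Map Sample.ID to its metadata row (assumes one row per Sample.ID)."""
    return {row["Sample.ID"]: row for row in csv.DictReader(handle)}


def join_reads_to_metadata(reads_handle, metadata):
    """Attach metadata to each read row; collect Sample.IDs with no metadata."""
    joined, unmatched = [], []
    for row in csv.DictReader(reads_handle):
        meta = metadata.get(row["Sample.ID"])
        if meta is None:
            unmatched.append(row["Sample.ID"])
            continue
        joined.append({**meta, **row})
    return joined, unmatched


metadata = index_by_sample(io.StringIO(METADATA_CSV))
joined, unmatched = join_reads_to_metadata(io.StringIO(READS_CSV), metadata)
print(len(joined), "rows joined; unmatched Sample.IDs:", unmatched)
```

Reporting unmatched identifiers rather than silently dropping them makes it easier to spot formatting differences between the two files.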
File: Klamath_ML_metaworks_output.csv
Description: Comma-separated values (CSV) file containing the MetaWorks taxonomic assignment and bootstrap support at each rank for each ESV.
Variables
- MiseqRun: The run identifier code for the DNA next-generation sequencing conducted on an Illumina MiSeq
- Sample.ID: A unique sample identification code for each sampling site
- ESVseq: "Exact Sequence Variant", referring to the unique DNA sequence obtained from metabarcoding
- ESVsize: ESVsize is the number of reads of the given ESVseq
- Strand: Direction of the DNA strand (usually "+" or "-" if relevant)
- Root: Usually the topmost taxonomic rank (often just "Root")
- RootRank: The rank name ("root")
- rBP: Bootstrap support for the Root assignment
- SuperKingdom: The super kingdom of the sequence
- SuperKingdomRank: Rank of the super kingdom
- skBP: Bootstrap support for the super kingdom assignment
- Kingdom: Kingdom
- KingdomRank: Kingdom rank ("kingdom")
- kBP: Bootstrap support for the Kingdom assignment
- Phylum: Phylum
- PhylumRank: Phylum rank ("phylum")
- pBP: Bootstrap support for the Phylum assignment
- Class: Class
- ClassRank: Class rank ("class")
- cBP: Bootstrap support for the Class assignment
- Order: Order
- OrderRank: Order rank ("order")
- oBP: Bootstrap support for the Order assignment
- Family: Family
- FamilyRank: Family rank ("family")
- fBP: Bootstrap support for the Family assignment
- Genus: Genus
- GenusRank: Genus rank ("genus")
- gBP: Bootstrap support for the Genus assignment
- Species: Species
- SpeciesRank: Species rank ("species")
- sBP: Bootstrap support for the Species assignment
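Because every rank has its own bootstrap support column, a common way to use a table like this is to report each ESV at the lowest rank whose support meets a cutoff. The sketch below illustrates that idea. The 0.8 cutoff and the 0 to 1 scale are assumptions for illustration, not values stated in this record, and the example row is invented.

```python
# Rank columns paired with their bootstrap-support columns, from highest to lowest rank.
RANKS = [
    ("SuperKingdom", "skBP"), ("Kingdom", "kBP"), ("Phylum", "pBP"),
    ("Class", "cBP"), ("Order", "oBP"), ("Family", "fBP"),
    ("Genus", "gBP"), ("Species", "sBP"),
]


def lowest_supported_rank(row, cutoff=0.8):
    """Return (rank, name) for the lowest rank meeting the cutoff, or None.

    The walk stops at the first rank whose support falls below the cutoff.
    The cutoff value and the 0-1 scale are assumptions for this sketch.
    """
    best = None
    for name_col, bp_col in RANKS:
        if float(row[bp_col]) < cutoff:
            break
        best = (name_col, row[name_col])
    return best


# Hypothetical row; the names and support values are invented.
example_row = {
    "SuperKingdom": "Eukaryota", "skBP": "1.0",
    "Kingdom": "Metazoa", "kBP": "1.0",
    "Phylum": "Chordata", "pBP": "1.0",
    "Class": "Actinopteri", "cBP": "1.0",
    "Order": "Salmoniformes", "oBP": "1.0",
    "Family": "Salmonidae", "fBP": "0.99",
    "Genus": "Oncorhynchus", "gBP": "0.98",
    "Species": "Oncorhynchus_mykiss", "sBP": "0.55",
}

print(lowest_supported_rank(example_row))
```

With this invented row, species-level support is below the cutoff, so the ESV is reported at genus level.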
Access information
Other publicly accessible locations of the data:
- There are no other publicly accessible locations of the data.
Data were derived from the following sources:
- Data were derived as part of an original research study. Data set is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details.
Step 1
On the Klamath River, we selected monitoring locations systematically every 2 km along the river and reservoirs and every 1 km along selected tributaries, with additional sites included near long-term monitoring locations important to the restoration project. Monitoring locations spanned approximately 114 km of mainstem and tributary habitat. Sites were selected along navigable streams and only at locations where long-term continued accessibility was assumed. Sites along the river extend from approximately 3.5 km downstream of the lowermost dam upstream to where the river enters the uppermost reservoir to be removed. Tributary sites extended upstream to fish migration barriers, or as far as the project boundary allowed. Additionally, six “reference” sites in the Scott River basin, a large tributary downstream of the project area, were included to represent habitat not impacted by dams. Having a range of non-treatment reaches against which to compare treatment (i.e., dam removal) effects should make the comparison more representative of recovered conditions and biological communities.
Step 2
We collected 405 samples from 44 monitoring locations on 17-20 July 2023. Where access allowed, we collected three liters of stream water from each bank and the center of the channel at each site, for a total of nine liters. The nine liters of water were combined into a single vessel, agitated to encourage homogenization, and decanted into nine replicate samples via a filtering manifold. Each replicate was filtered in the field through 0.45 µm pore-size PVDF Sterivex filters, which capture eNA from the environment by trapping particles within the filter matrix, using Cole Parmer easy loader II peristaltic pumps and sterile Masterflex silicone tubing for each monitoring site. Pumps were affixed in parallel to allow for simultaneous filtration of three filter replicates and powered using a brushless, cordless drill with a 12.7 mm spade bit attachment. Field crews followed protocols to assess and minimize the risk of contamination, including using sterile single-use filters, caps, and tubing, changing nitrile gloves frequently, and collecting field controls at the start of each sampling day. When field sampling was complete, samples were preserved by pipetting 1.5 mL of RNAprotect Tissue Reagent (Qiagen) into each filter cartridge, following practices to maximize the probability of stabilizing genetic material.
Step 3
Site-level water quality measurements (water temperature, dissolved oxygen, and specific conductance) were collected with a YSI ProQuatro multiparameter meter, and air temperature with a Kestrel rotating-vane thermistor.
Step 4
Total DNA was isolated from each filter and purified to remove non-target cellular and environmental contaminants using the QIAamp DNA mini kit (Qiagen) and following a standard protocol with modifications. First, RNAprotect Tissue Reagent was evacuated from each filter by manually shaking the liquid from the cartridge. The exterior of each filter was sterilized with a PCR clean wipe to avoid cross-contamination. We added 440 μL of the Buffer PBS/Buffer AL/Proteinase K lysis solution to each filter by injecting the solution into the Sterivex cartridge using a filtered pipette tip. The filters were incubated for 5 min at 56 °C, then affixed to a vortex mixer to undergo two ten-minute room temperature vortex sessions. Between sessions, the filters were rotated 180° to ensure full coverage of the filter membrane. The solution was transferred from the Sterivex cartridge to a 1.5 mL microcentrifuge tube, and then QIAamp mini spin columns were used to bind DNA, and the remainder of the DNA purification and elution steps followed the published protocol by Miya et al. (2016). DNA extraction controls, created by adding 880 μL of the lysis solution to a sterile Sterivex filter, were processed in parallel with samples to confirm sample integrity throughout the extraction procedure. All samples and controls were passed through the Zymo OneStep PCR Inhibitor Removal Kit (Zymo Research) following the manufacturer's guidelines. DNA extraction was completed in a separate pre-PCR space using sterilized surfaces and equipment.
References:
Miya, M. et al. Use of a filter cartridge for the filtration of water samples and the extraction of environmental DNA. J. Vis. Exp. JoVE (2016) doi:10.3791/54741.
Step 5
Purified eNA can be analyzed using a variety of molecular techniques. For this study, we used DNA metabarcoding to assess the community-level composition of fish taxa at each sampling location. Metabarcoding employs next-generation sequencing with universal primers to sequence a diagnostic region of DNA that allows for species identification across taxa. We used a multiplex of the MiFish-U primer set and a modified version of the MiFish-U-F primer (GIQHerp-F), designed to enhance detection of herptile taxa, to sequence a 170 bp region of the vertebrate mitochondrial 12S rRNA gene using a three-step PCR approach adapted from previously published library preparation methodologies. The initial PCR was completed using non-indexed primers to enrich subsequent reactions for target DNA. Each sample was amplified in triplicate in a total reaction volume of 10 μl containing 4 μl extracted eDNA, 0.4 μM of each forward primer (MiFish-U-F: 5’-GTCGGTAAAACTCGTGCCAGC-3’; GIQHerp-F: 5’-GCCGGCTAATCTGGTGCCAGC-3’), 0.8 μM MiFish-U-R (5’-CATAGTGGGGTATCTAATCCCAGTTTG-3’), and 1X Qiagen Plus Multiplex Master Mix. Cycling began with an initial denaturation at 95 °C for 5 min, followed by 35 cycles of 95 °C for 15 seconds, 5% ramp down to 55 °C for 30 seconds, and 72 °C for 30 seconds. The triplicate PCR products were pooled and then diluted 1:10 prior to starting the Illumina adapter and barcoding processes.
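When primer sequences are copied from running text, a quick check of length and base composition can catch transcription slips. The sketch below takes the three primer sequences exactly as printed above and reports their length, GC fraction, and the number of positions at which the two forward primers differ. The function names are illustrative and are not part of any published pipeline.

```python
# Primer sequences as printed in the methods text (5' to 3').
PRIMERS = {
    "MiFish-U-F": "GTCGGTAAAACTCGTGCCAGC",
    "GIQHerp-F": "GCCGGCTAATCTGGTGCCAGC",
    "MiFish-U-R": "CATAGTGGGGTATCTAATCCCAGTTTG",
}


def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


for name, seq in PRIMERS.items():
    # Flag any character that is not an unambiguous DNA base.
    unexpected = set(seq) - set("ACGT")
    print(name, len(seq), round(gc_fraction(seq), 2), unexpected or "ok")

forward_differences = mismatches(PRIMERS["MiFish-U-F"], PRIMERS["GIQHerp-F"])
print("forward primer differences:", forward_differences)
```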
The Illumina hanging tail adapters were incorporated using the MiFish-U and GIQHerp primer multiplex containing the 33 or 34 bp 5’ Illumina hanging tail adaptor sequences to provide a priming site for the addition of dual-indexed barcode sequences. Each reaction consisted of a 12 μl total volume containing 2 μl pooled and diluted product from the previous PCR, 0.3 μM of each Illumina adapter forward primer, 0.6 μM of the Illumina adapter reverse primer, and 6 μl KAPA HiFi HotStart ReadyMix (Roche Diagnostics). The cycling profile was as follows: 95 °C for 5 min, 5 cycles of 98 °C for 20 seconds, 1% ramp down to 65 °C for 15 seconds, and 72 °C for 15 seconds, then 7 cycles of 98 °C for 20 seconds, 5% ramp down to 65 °C for 15 seconds, 72 °C for 15 seconds. PCR products were diluted 1:10 and used as templates in the final PCR step. The paired-end dual indices that allow for sample identification and de-multiplexing were incorporated during the final PCR step. Each PCR was completed in a total volume of 12 μl, composed of 0.3 μM of the forward and reverse index primers, 6 μl 1X KAPA HiFi HotStart ReadyMix, and 1 μl of the diluted product from the previous PCR. Amplification started with 95 °C for 3 min, followed by 10 cycles of 98 °C for 20 seconds, 5% ramp down to 72 °C for 15 seconds, and final extension at 72 °C for 5 min. All PCR steps were completed using Bio-Rad C1000 Touch thermal cyclers (Bio-Rad Laboratories) in a designated PCR space.
Equal volumes of the indexed PCR products were pooled, then size selected (c. 370 bp) using 2% gel electrophoresis and purified using the QIAquick Gel Extraction Kit (Qiagen) following the manufacturer's guidelines for next-generation sequencing. Purified libraries were quantified using the Qubit 4 fluorometer and Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific), and sequenced on the Illumina MiSeq system (Illumina, San Diego, CA, USA) using the v2 300-cycle chemistry. The final loading concentration was 8 pM with a 10% PhiX spike-in added as a sequencing control. A UV-sterilized hood was used to prepare a master mix for all PCR steps and to add extracted DNA during the initial PCR. All intermediate dilution, DNA transfer, and final pooling steps were completed in designated post-PCR spaces using sterilized pipettes and bench tops. No-template PCR controls were processed in parallel with samples and sequenced to confirm process integrity.
To determine provisional species identification, the resultant sequencing data were compiled and processed using the MetaWorks pipeline with the 12S vertebrate classifier and default parameters (Porter and Hajibabaei, 2022). We removed any detections with fewer than 100 sequence reads to screen out potential artifact sequences. Final taxonomic assignment was verified against the NIH National Center for Biotechnology Information reference database using the BLAST algorithm. We used the standard nucleotide BLAST (blastn suite) to compare detected sequences to sequences stored in the core nucleotide database (core_nt). Provisional species with at least 97% sequence similarity to a known species in the reference database were considered a match and successfully identified to species. Sequences with greater than 97% identity match to multiple species that could co-occur within the sampling region were assigned to the taxonomic level that appropriately captured all potential matches (e.g., Cottus spp.). Any detections that could result from anthropogenic inputs (human, cat, dog, cow, chicken, and pig) were removed from analysis.
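The screening rules described above (minimum read count, identity threshold, fallback to genus when several species match, and removal of anthropogenic taxa) can be sketched as a small filter. This is an illustration under stated assumptions and not the authors' code: the record fields (`sample`, `reads`, `common_name`, `hits`) are hypothetical, a single 97% threshold is applied to every hit, the check for regional co-occurrence is not modeled, and the identity values are invented.

```python
MIN_READS = 100          # detections with fewer reads are dropped
MIN_IDENTITY = 97.0      # percent identity required for a BLAST hit to count
ANTHROPOGENIC = {"human", "cat", "dog", "cow", "chicken", "pig"}


def assign_taxon(hits, min_identity=MIN_IDENTITY):
    """Pick a taxon label from (species_name, percent_identity) hits.

    One qualifying species gives that species; several qualifying species in
    one genus give "Genus spp."; no qualifying hit gives None.
    """
    species = sorted({name for name, ident in hits if ident >= min_identity})
    if not species:
        return None
    if len(species) == 1:
        return species[0]
    genera = {name.split()[0] for name in species}
    if len(genera) == 1:
        return f"{genera.pop()} spp."
    return "unresolved (multiple genera)"


def screen_detections(detections):
    """Apply the read-count, anthropogenic, and identity screens in turn."""
    kept = []
    for det in detections:
        if det["reads"] < MIN_READS:
            continue
        if det["common_name"].lower() in ANTHROPOGENIC:
            continue
        taxon = assign_taxon(det["hits"])
        if taxon is None:
            continue
        kept.append({"sample": det["sample"], "taxon": taxon, "reads": det["reads"]})
    return kept


# Hypothetical detections; sample codes, read counts, and identities are invented.
detections = [
    {"sample": "S01", "reads": 1500, "common_name": "Rainbow trout",
     "hits": [("Oncorhynchus mykiss", 100.0)]},
    {"sample": "S01", "reads": 40, "common_name": "Speckled dace",
     "hits": [("Rhinichthys osculus", 99.0)]},
    {"sample": "S02", "reads": 800, "common_name": "Sculpin",
     "hits": [("Cottus klamathensis", 99.0), ("Cottus asper", 98.5)]},
    {"sample": "S02", "reads": 5000, "common_name": "Human",
     "hits": [("Homo sapiens", 100.0)]},
]

kept = screen_detections(detections)
for record in kept:
    print(record)
```

Keeping the thresholds as named constants makes it easy to rerun the screen with different cutoffs when comparing results.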
References:
Porter, T. M. & Hajibabaei, M. MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments. PLOS ONE 17, e0274260 (2022).
NCBI BioProject accession: PRJNA1236377 (ID: 1236377)
