How, what, and where you sample environmental DNA affects diversity estimates and species detection

Published Aug 27, 2024 on Dryad. https://doi.org/10.5061/dryad.bzkh189jt

Data files

Aug 27, 2024 version files (1.68 MB total)

literature_review_archive.xlsx (55.37 KB)
p716_run221129_COI_ZOTU_c97_Count_Sintax.txt (1.47 MB)
p716_run221129_ITS2_ZOTU_c97_Count_Sintax.txt (139.71 KB)
p716_sampleID_renamed.txt (9.37 KB)
README.md (7.03 KB)

Abstract

Environmental DNA (eDNA) is a complex mixture of DNA, varying in particle size and distributed patchily in aquatic systems. Optimizing eDNA sampling is crucial for maximizing species detection, particularly in high-risk scenarios like invasive species management. This study compares two eDNA capture methods (grab sample vs. tow net) at opposite ends of the spectrum for volume and particle size to ascertain what most influences 1) the detection of invasive species (Dreissenid mussels and Burmese pythons) and 2) total diversity monitoring of metazoans, fungi, and protists with a COI marker and of plant communities with an ITS2 marker. Sampling was conducted across a wide geography and diverse aquatic environments in Minnesota and Florida, USA, and Switzerland. The tow net samples were consistently higher in eDNA yield compared to grab samples; however, they exhibited equal or lower alpha diversity of ZOTUs (zero-radius operational taxonomic units), challenging an assumption that there is a relationship between eDNA yield and diversity. We found biodiversity patterns were significantly influenced by the capture method, especially for COI community diversity in all three regions. The two capture methods measured different beta diversity of COI communities across all three regions, highlighting the confounding impact of sample volume and filtration pore size on the diversity of eDNA captured. Interestingly, the beta diversity of plant eDNA was less impacted by the capture method than the beta diversity of COI communities. We found no clear difference in detection for the two invasive species targets with respect to the eDNA capture method. These results underscore the need for pilot studies before conducting biodiversity inventory and monitoring, and a need for a greater understanding of not just how much, but also what eDNA is captured depending on method choice, considering both spatial and particle size heterogeneity.
For biodiversity inventories, further research is needed to understand what causes differences in detection and what the optimal eDNA sampling strategies are.


Description of the data and file structure

1) Literature_review_archive

Data Description

The dataset includes the following columns:

Authors: The authors of the study. This column contains the names of the authors in a string format.
Title: The title of the study. This column is in string format.
Year: The year the study was published. This column is in string format.
Source_title: The title of the source where the study was published. This column is in string format.
DOI: The Digital Object Identifier for the study. This column is in string format.
Taxa: The taxa involved in the study. This column is in string format.
System: The system (marine or freshwater) studied. This column is in string format.
Volume: The volume of water filtered in the study (ml). This column is in numeric format.
Pore: The pore size used in the study. This column is in numeric format.
Filter_type: The type of filter used in the study. This column is in string format.

Variable Definitions

Authors: Full names of the authors, separated by commas if there are multiple authors.
Title: The full title of the publication.
Year: The year the publication was released.
Source_title: The journal or source title where the publication appeared.
DOI: The unique identifier for the publication, useful for locating the study online.
Taxa: Specific taxa involved in the research.
System: The environmental system studied, categorized into 'marine' and 'freshwater'.
Volume: The volume of water filtered (ml).
Pore: The pore size of filters used in the study, measured in micrometers.
Filter_type: The type of filter used, such as 'membrane'.

Instructions for Use

Accessing the Data: The data is provided in an Excel file format, which can be opened using any standard spreadsheet software such as Microsoft Excel, Google Sheets, or LibreOffice Calc.
Data Handling: Users can filter, sort, and analyze the data as needed. Missing data has been marked as NA.
Citation: If you use this dataset in your research or publications, please cite it appropriately and include a reference to the original study.
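
As a minimal sketch of handling the NA markers described above, the following Python snippet converts the numeric Volume and Pore cells and summarizes the filtered volume per System. It assumes the spreadsheet has been exported to CSV first; that export step, the helper names (`to_number`, `median_volume_by_system`), and the toy rows are assumptions made for illustration and are not part of the dataset.

```python
import csv
import io
from statistics import median


def to_number(value):
    """Return a float, or None when the cell is empty or marked NA."""
    text = (value or "").strip()
    if text == "" or text.upper() == "NA":
        return None
    return float(text)


def median_volume_by_system(rows):
    """Median filtered volume (ml) per System, skipping missing volumes."""
    volumes = {}
    for row in rows:
        volume = to_number(row["Volume"])
        if volume is not None:
            volumes.setdefault(row["System"], []).append(volume)
    return {system: median(values) for system, values in volumes.items()}


# Toy rows standing in for a CSV export of the spreadsheet (values are invented).
example_csv = io.StringIO(
    "System,Volume,Pore,Filter_type\n"
    "freshwater,1000,0.45,membrane\n"
    "freshwater,NA,0.22,membrane\n"
    "marine,2000,NA,membrane\n"
    "freshwater,500,1.2,membrane\n"
)
rows = list(csv.DictReader(example_csv))
summary = median_volume_by_system(rows)
print(summary)
```

Rows with a missing Volume are skipped rather than counted as zero, so the summary reflects only studies that reported a volume.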

Abbreviations and Units

DOI: Digital Object Identifier
NA: Not Available or Not Applicable (used for missing data)
Pore: Measured in micrometers (µm)
Volume: Measured in milliliters (ml)

2) ZOTU table with taxonomic assignment (COI)

File Overview

File Name: p716_run221129_COI_ZOTU_c97_Count_Sintax.txt

File Format: .txt

Data Description

The dataset includes the following structure:

Rows (Taxa): Each row represents a different taxon identified by a unique ZOTU identifier.
Columns (Samples): Each column represents a different sample, identified by a unique sample ID.

Variable Definitions

Taxa (Rows)

ZOTU Identifier: A unique identifier for each taxon, e.g., ZOTU3353, ZOTU7041.

Samples (Columns)

Sample IDs: Unique identifiers for each sample, e.g., FFF01, FFF02, FFF03, etc.
Abundance Values: The count of sequences corresponding to each taxon in each sample.
Taxonomy: The last seven columns give the taxonomic assignment at each rank (kingdom, phylum, class, order, family, genus, species).
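
The layout described above (one row per ZOTU, one column per sample, taxonomy in the last seven columns) can be separated into counts and taxonomy with a few lines of Python. This is only a sketch: it assumes the file is tab-delimited, has a header row, and holds the ZOTU identifier in the first column. The function name `read_zotu_table`, the header labels, and the toy row are hypothetical.

```python
import csv
import io

TAXONOMY_RANKS = 7  # kingdom through species, as described above


def read_zotu_table(handle):
    """Split a ZOTU table into per-sample counts and per-rank taxonomy.

    Assumed layout: tab-delimited, header row, ZOTU identifier in column 1,
    sample columns next, taxonomy in the last seven columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:-TAXONOMY_RANKS]
    rank_names = header[-TAXONOMY_RANKS:]
    counts, taxonomy = {}, {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        zotu = row[0]
        counts[zotu] = dict(zip(sample_ids, map(int, row[1:-TAXONOMY_RANKS])))
        taxonomy[zotu] = dict(zip(rank_names, row[-TAXONOMY_RANKS:]))
    return sample_ids, counts, taxonomy


# Toy table with invented values; real header labels may differ.
toy = io.StringIO(
    "ZOTU\tFFF01\tFFF02\tkingdom\tphylum\tclass\torder\tfamily\tgenus\tspecies\n"
    "ZOTU3353\t10\t0\tMetazoa\tMollusca\tBivalvia\tNA\tNA\tNA\tNA\n"
)
sample_ids, counts, taxonomy = read_zotu_table(toy)
print(sample_ids, counts["ZOTU3353"], taxonomy["ZOTU3353"]["phylum"])
```

To try it on the real file, pass `open("p716_run221129_COI_ZOTU_c97_Count_Sintax.txt", newline="")` instead of the toy table, after checking that the assumed layout matches.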

Abbreviations and Units

ZOTU: Zero-radius Operational Taxonomic Unit
NA: Not Available or Not Applicable (used for missing data)

3) ZOTU table with taxonomic assignment (ITS2)

File Overview

File Name: p716_run221129_ITS2_ZOTU_c97_Count_Sintax.txt

File Format: .txt

Data Description

The dataset includes the following structure:

Rows (Taxa): Each row represents a different taxon identified by a unique ZOTU identifier.
Columns (Samples): Each column represents a different sample, identified by a unique sample ID.

Variable Definitions

Taxa (Rows)

ZOTU Identifier: A unique identifier for each taxon, e.g., ZOTU3353, ZOTU7041.

Samples (Columns)

Sample IDs: Unique identifiers for each sample, e.g., FFF01, FFF02, FFF03, etc.
Abundance Values: The count of sequences corresponding to each taxon in each sample.
Taxonomy: The last seven columns give the taxonomic assignment at each rank (kingdom, phylum, class, order, family, genus, species).
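
Because the abundance values are read counts per ZOTU and sample, a simple alpha diversity measure such as ZOTU richness (the number of ZOTUs with at least one read in a sample) follows directly from the table. The sketch below illustrates that calculation on invented counts; it is not necessarily the measure used in the associated study, and the function name `zotu_richness` is hypothetical.

```python
def zotu_richness(counts):
    """Number of ZOTUs with at least one read in each sample.

    `counts` maps ZOTU identifier -> {sample ID -> read count}.
    """
    richness = {}
    for per_sample in counts.values():
        for sample_id, n_reads in per_sample.items():
            richness[sample_id] = richness.get(sample_id, 0) + (1 if n_reads > 0 else 0)
    return richness


# Toy counts with invented values, using the identifier style described above.
toy_counts = {
    "ZOTU3353": {"FFF01": 12, "FFF02": 0, "FFF03": 4},
    "ZOTU7041": {"FFF01": 0, "FFF02": 0, "FFF03": 9},
}
richness = zotu_richness(toy_counts)
print(richness)
```

Samples with no reads at all still appear in the result with a richness of 0, which makes empty samples and controls easy to spot.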

Abbreviations and Units

ZOTU: Zero-radius Operational Taxonomic Unit
NA: Not Available or Not Applicable (used for missing data)

4) Metadata file

File Overview

File Name: p716_sampleID_renamed.txt
File Format: Tab-delimited text file (.txt)

Data Description

The dataset includes the following columns:

e_name: External name of the sample.
code_name: Internal code name of the sample.
Well_position: The well position of the sample in a microplate.
Sample.ID: The sample identification string.
Site: The site location where the sample was collected.
Country: The country where the sample site is located.
Type: The collection method of the sample (filter or tow net).
ITS_c: ITS gene concentration measurement.
COI_c: COI gene concentration measurement.
NTC: Negative Test Control (NTC) status.

Variable Definitions

e_name: A unique external name assigned to each sample.
code_name: An internal code name for each sample.
Well_position: Indicates the position of the sample in the well plate, e.g., A10, B9.
Sample.ID: A unique identifier for each sample, combining site information and sample details.
Site: Describes the specific location or site where the sample was collected.
Country: The country where the sampling site is located.
Type: The type of sample, indicating sample collection strategy (e.g., Filter, tow net).
ITS_c: The concentration of the ITS gene in the sample (unit not specified).
COI_c: The concentration of the COI gene in the sample (unit not specified).
NTC: Indicates the Negative Test Control status of the sample, identifying if the sample is positive or negative.
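
The metadata columns above are what link each sample to its capture method. As a minimal sketch, the Python snippet below groups sample IDs by the Type column. It assumes a header row whose labels match the column names listed above; the function name `samples_by_type` and the toy rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def samples_by_type(handle):
    """Group Sample.ID values by collection Type from tab-delimited metadata."""
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["Type"]].append(row["Sample.ID"])
    return dict(groups)


# Toy metadata with invented values; only a subset of the columns is shown.
toy_metadata = io.StringIO(
    "Sample.ID\tSite\tCountry\tType\n"
    "S1\tLakeA\tUSA\tFilter\n"
    "S2\tLakeA\tUSA\ttow net\n"
    "S3\tLakeB\tSwitzerland\tFilter\n"
)
groups = samples_by_type(toy_metadata)
print(groups)
```

The resulting groups can then be used to select the matching sample columns in the COI and ITS2 ZOTU tables when comparing the two capture methods.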

Abbreviations and Units

e_name: External name
ITS_c: Internal Transcribed Spacer gene concentration
COI_c: Cytochrome c oxidase I gene concentration
NTC: Negative Test Control
NA: Not Available or Not Applicable (used for missing data)

Sharing/Access information

The raw reads for the COI and ITS2 assays are available in the NCBI Sequence Read Archive (BioProject: PRJNA1090309).

Code/Software

1) p716_run221129_ITS2.report and p716_run221129_COI.report: files containing the code used for the bioinformatic processing of the ITS2 and COI sequence datasets, respectively.

2) R script_data analysis: R script to run data analysis
