Data from: Optimization of wetland environmental DNA metabarcoding protocols for Great Lakes region herpetofauna
Data files
Jan 07, 2025 version files, 4.38 GB
- alignment_taxonomy.zip (44.17 KB)
- analysis_data.zip (20.46 KB)
- community_matrices.zip (5.31 KB)
- mothur_files.zip (4.62 KB)
- post_seq_files.zip (4.38 GB)
- README.md (7.10 KB)
Abstract
Many species of reptiles and amphibians (herpetofauna) rely on wetlands that are being degraded and lost at a high rate. Characterization of herpetofauna diversity in different wetland types may help guide conservation strategies. However, traditional survey methods often involve sampling within small temporal windows, and the gear deployed may be taxonomically biased; thus, they may fail to accurately characterize species presence/absence and diversity. In contrast, environmental (e)DNA metabarcoding has been shown to effectively survey entire aquatic communities and can provide a useful complement to traditional surveys. The objective of this study was to design and optimize eDNA sampling and laboratory protocols for wetland herpetofauna. Protocols evaluated included different water sampling approaches (point versus transect sampling), seasonality of sampling, and choice of metabarcoding marker (mitochondrial 12S versus 16S rDNA). Samples collected from 10 sites across southern Michigan detected 17 amphibian and five reptile species, including four species of conservation concern (Ambystoma texanum, Clemmys guttata, Rana palustris, and Sternotherus odoratus). We observed no difference in the number of species detected between point and transect samples (p = 0.70), but point sampling required less time (p = 0.03) and allowed significantly larger volumes of water to be filtered (p = 1.13e-5). No difference in species richness was observed between the 12S and 16S mitochondrial DNA markers (p = 0.96). However, a greater number of taxa were identifiable at the species level when using the 16S locus. There was also a significant difference in the number of species detected between early and late summer sampling periods (more species detected in the earlier period; p = 6.31e-6), and some species were only found in the early or late sampling period. We recommend sampling during multiple periods to fully characterize species composition, the use of point sampling, and the 16S mtDNA marker for herpetofauna eDNA metabarcoding studies.
README: Data from: Optimization of wetland environmental DNA metabarcoding protocols for Great Lakes region herpetofauna
https://doi.org/10.5061/dryad.8w9ghx3wv
This repository contains analysis scripts and input data files associated with the eDNA metabarcoding of herpetofauna communities in Michigan wetlands and subsequent analyses for “Optimization of wetland environmental DNA metabarcoding protocols for Great Lakes region herpetofauna”.
Description of the data and file structure
Alignment and taxonomy files for 279 species (12S) and 305 species (16S) are in the alignment_taxonomy.zip file.
- Vertebrate_12S_align_072821_NoPrimers.fas and Vertebrate_16S_align_111722_NoPrimers.fas contain sequence alignments for species with accession numbers (if available) and scientific names. If produced by our lab, there is no accession number, but a four-letter code (first two letters of genus and specific epithet) and the sample number.
- Vertebrate_12S_rDNA_taxonomy_072821.txt and Vertebrate_16S_rDNA_taxonomy_111722.txt contain accession or lab-produced numbers and the full taxonomic classifications for each species.
Cleaned community matrices with target herpetofauna species as .csv files are in the community_matrices.zip file. The scripts used to create the community matrices are described below in the “Code/Software” section.
- clean_community_matrix_12S_y1.csv / clean_community_matrix_16S_y1.csv - a data file containing read counts for herpetofauna species detected in samples for the 12S or 16S markers. Each file contains the following columns:
  - sample - sample name with the following structure:
    - Example: 16S_JC2_T5.16S
    - 16S, the marker
    - JC2, Jackson county site 2
    - T, transect sample
    - 5, fifth transect sample
    - 16S (after the period), the marker again
  - OTU (operational taxonomic unit) read counts (e.g. Rana_clamitans) for each sample in the data set
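As a rough illustration of how these matrices can be read, the sketch below counts the OTU columns with at least one read in each sample. It assumes the layout described above (a sample column followed by one read-count column per OTU) and treats any non-zero count as a detection. That threshold and the function name species_per_sample are assumptions made for this example and are not taken from the deposited scripts.

```python
import csv


def species_per_sample(lines):
    """Map each sample name to the number of OTU columns with reads > 0.

    `lines` is any iterable of CSV text lines (an open file works).
    Assumes a 'sample' column plus one read-count column per OTU.
    """
    richness = {}
    for row in csv.DictReader(lines):
        sample = row.pop("sample")
        # Assumption: any non-zero read count is treated as a detection.
        # Columns with an empty header (e.g. a row-name column) are skipped.
        richness[sample] = sum(
            1
            for name, count in row.items()
            if name and count and float(count) > 0
        )
    return richness


# Possible usage (the path inside the unzipped archive is an assumption):
# with open("clean_community_matrix_16S_y1.csv", newline="") as handle:
#     print(species_per_sample(handle))
```

The result is descriptive only; any read-count filtering applied in the manuscript would need to be taken from the analysis scripts.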
Files used for the Mothur (Schloss, 2009) analysis are included in the mothur_files.zip file.
- MSU_12S_eDNA_oligos_file.txt / MSU_16S_eDNA_oligos_file.txt - a text file containing barcodes required for each marker analysis. These files contain a sample column with the following structure:
  - 12S/16S - column labeled with the marker
    - Example: 2BR1_T1
    - 2, the second sampling period
    - BR1, the first Barry county site
    - T, transect sample
    - 1, first transect sample
    - EN or Neg are extraction or PCR negatives
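The naming scheme above can be split into its parts with a short parser. This is a minimal sketch, not part of the deposited scripts: the regular expression is inferred from the single example 2BR1_T1, and the function name parse_sample_name is hypothetical. Only the letter T (transect) is documented here, so the sample-type letter is returned as a raw code. Names that do not fit the pattern, including the EN and Neg negatives whose exact form is not documented, return None.

```python
import re

# Inferred from the example 2BR1_T1: one digit for the sampling period,
# a three-character site code, an underscore, one letter for the sample
# type, and a replicate number. This pattern is an assumption.
SAMPLE_NAME = re.compile(
    r"^(?P<period>\d)(?P<site>[A-Z]{2}[A-Z0-9])_(?P<kind>[A-Za-z])(?P<replicate>\d+)$"
)


def parse_sample_name(name):
    """Split a sample name into its parts, or return None if it does not fit."""
    match = SAMPLE_NAME.match(name)
    if match is None:
        return None  # e.g. negatives or any undocumented naming variant
    return {
        "period": int(match["period"]),
        "site_code": match["site"],
        "type_code": match["kind"],  # "T" = transect; other letters are undocumented
        "replicate": int(match["replicate"]),
    }


print(parse_sample_name("2BR1_T1"))
# {'period': 2, 'site_code': 'BR1', 'type_code': 'T', 'replicate': 1}
```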
Output files produced from sequencing can be found in the post_seq_files.zip file. This directory contains the following four gzipped fastq files:
- OR_2021_Herp_P_Lib_S1_L001_R1_001.fastq.gz and OR_2021_Herp_P_Lib_S1_L001_R2_001.fastq.gz - One file for read one and one file for read two.
- OR_2021_Herp_P_Lib_S1_L001_I2_001.fastq.gz and OR_2021_Herp_P_Lib_S1_L001_I1_001.fastq.gz - One file for forward indices and one file for reverse indices.
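Before running the pipeline, it can be useful to confirm that the four files were downloaded intact. The sketch below counts records in each gzipped fastq file and checks that the counts agree. It assumes the standard four-line fastq record layout and that read and index files from the same run hold the same number of records. The function names are hypothetical and the check is not part of the deposited scripts.

```python
import gzip


def count_fastq_records(path):
    """Count records in a gzipped fastq file (assumes four lines per record)."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def check_matching_counts(paths):
    """Return the shared record count, or raise if the files disagree."""
    counts = {path: count_fastq_records(path) for path in paths}
    if len(set(counts.values())) != 1:
        raise ValueError(f"record counts differ: {counts}")
    return next(iter(counts.values()))


# Possible usage with the four files listed above:
# check_matching_counts([
#     "OR_2021_Herp_P_Lib_S1_L001_R1_001.fastq.gz",
#     "OR_2021_Herp_P_Lib_S1_L001_R2_001.fastq.gz",
#     "OR_2021_Herp_P_Lib_S1_L001_I1_001.fastq.gz",
#     "OR_2021_Herp_P_Lib_S1_L001_I2_001.fastq.gz",
# ])
```

Reading line by line keeps memory use low, which matters for an archive of this size.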
Data required for analyses of our data set are included in the analysis_data.zip file. This directory contains 10 .csv files.
- The file full_herp_data.csv is the primary file to be used for most analyses. It contains the following columns:
  - sample - sample name with the following structure:
    - Example: 2BR1_T1
    - 2, the second sampling period
    - BR1, the first Barry county site
    - T, transect sample
    - 1, first transect sample
  - type - sampling method, point or transect
  - vol - amount of water collected for a sample
  - season - sampling period (Period 1 or Period 2/spring or summer)
  - wetland - wetland type sampled (m = marsh, vp = vernal pool, f = fen)
  - duration - the amount of time, in seconds, it took to filter the recorded volume of water for a sample
  - no_spp - the number of species detected in that sample
- The file specpool_env.csv is a truncated version of the file above, including only the sample, sample type, sampling period, and wetland type.
- The following files have column names with a genus and specific epithet included (Example: Rana_clamitans): finalspec.csv, forshan24.csv, fullspec_herp.csv, and specpoolexp.csv.
- The files mcnumbers12S.csv and mcnumbers16S.csv contain expected and observed read proportions and species names assessed for mock communities and marker correlation analyses.
- The file nmds_new.csv contains the following columns:
  - sample - sample name with the following structure:
    - Example: 2BR1_T1
    - 2, the second sampling period
    - BR1, the first Barry county site
    - T, transect sample
    - 1, first transect sample
  - period - sampling period (Period 1 or Period 2/spring or summer)
  - county - name of sampling site county
  - sampletype - type of sampling method (point or transect)
  - OTU (operational taxonomic unit) relative read counts (e.g. Rana_clamitans) for each sample in the data set
- The file shannonenv.csv is a list of county names for Shannon diversity analyses.
- Site abbreviations for all above files are as follows:
  - CLI - Clinton county
  - BR1 - Barry county, site 1
  - BR2 - Barry county, site 2
  - KZ1 - Kalamazoo county, site 1
  - KZ2 - Kalamazoo county, site 2
  - LIV - Livingston county
  - OAK - Oakland county
  - JC1 - Jackson county, site 1
  - JC2 - Jackson county, site 2
  - HIL - Hillsdale county
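For a quick look at full_herp_data.csv before running the R analyses, the columns described above can be summarized by group. The sketch below is only a descriptive summary under stated assumptions: it takes the header names in the file to match the column names listed in this README (type, season, duration, no_spp), and the function name mean_by_group is hypothetical. It does not reproduce the statistical tests reported in the manuscript.

```python
import csv
from collections import defaultdict
from statistics import mean


def mean_by_group(lines, value_col="no_spp", group_col="type"):
    """Average a numeric column within each level of a grouping column.

    `lines` is any iterable of CSV text lines (an open file works).
    Column names are assumed to match those listed in this README.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row[group_col]].append(float(row[value_col]))
    return {level: mean(values) for level, values in groups.items()}


# Possible usage (the path inside the unzipped archive is an assumption):
# with open("full_herp_data.csv", newline="") as handle:
#     print(mean_by_group(handle))                       # mean no_spp by type
# with open("full_herp_data.csv", newline="") as handle:
#     print(mean_by_group(handle, "duration", "type"))   # mean duration by type
```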
Code/Software
Bioinformatic processing scripts can be found in the mothur_scripts/ subdirectory of data_processing.zip. This includes five scripts sorted by mitochondrial marker (12S or 16S).
- Herp_12S_batch_mothur_script_OMR.sh / Herp_16S_batch_mothur_script_OMR.sh - a script of commands that run through the Mothur pipeline on a high-performance computing cluster.
- submission_12S_herp.sh / submission_16S_herp.sh - a submission script that calls the executable batch Mothur script described above.
- MakeContigs_12S_TEMPLATE.sh - a template script for the make.contigs step in the Mothur pipeline.
  - Required oligos files are in the mothur_files/ directory described above.
Additionally, R code to create community matrices can be found in the otu2cm/ subdirectory of the data_processing.zip file. This directory contains the following two R scripts.
- step1_OTU_classification_summary_herps.R - A script to evaluate the OTU classifications made by the Mothur pipeline and write processed taxonomic groups to an out file.
- step2_OTU2_condensed_community_matrix_JR_FAST_herps.R - A script to assess OTU classifications from Mothur and create community matrices.
R code for analyses and visualization associated with the manuscript is in the analyses/ directory of the analyses.zip file.
- Analyses_Plots_HerpeDNA.R
  - Analyses included in this script: ANOVAs, PERMANOVAs, NMDS plots, boxplots, paired t-tests, mock community plots, and correlation tests.
  - All required data to reproduce analyses and plots in this script are in the analysis_data/ directory described above.