Data from: The core of the matter – Importance of identification method and biological replication for benthic marine monitoring
Data files
Nov 18, 2024 version files 24.16 GB
batchfileDADA2_SL1_5.list (140 B)
batchfileDADA2_SL1_6.list (140 B)
batchfileDADA2_SL1_7.list (140 B)
batchfileDADA2_SL1_8.list (140 B)
classified_curated.txt (257.90 KB)
classified.txt (855.39 KB)
DADA2_nochim.otus (24.99 MB)
DADA2_nochim.table (17.76 MB)
MD5_SL1_5.txt (156 B)
MD5_SL1_6.txt (156 B)
MD5_SL1_7.txt (156 B)
MD5_SL1_8.txt (156 B)
MSTmorf.sp.data.txt (25.50 KB)
README.md (10.38 KB)
SL1_5_FKDL192550852-1a_HHMVJDRXX_L1_1.fq.gz (2.46 GB)
SL1_5_FKDL192550852-1a_HHMVJDRXX_L1_2.fq.gz (2.58 GB)
SL1_6_FKDL192550853-1a_HHMVJDRXX_L1_1.fq.gz (3.35 GB)
SL1_6_FKDL192550853-1a_HHMVJDRXX_L1_2.fq.gz (3.52 GB)
SL1_7_FKDL192550854-1a_HHMVJDRXX_L1_1.fq.gz (2.86 GB)
SL1_7_FKDL192550854-1a_HHMVJDRXX_L1_2.fq.gz (3.01 GB)
SL1_8_FKDL192550855-1a_HHMVJDRXX_L1_1.fq.gz (3.09 GB)
SL1_8_FKDL192550855-1a_HHMVJDRXX_L1_2.fq.gz (3.25 GB)
summary.txt (2.10 MB)
tags_SL1_5.txt (650 B)
tags_SL1_6.txt (650 B)
tags_SL1_7.txt (650 B)
tags_SL1_8.txt (650 B)
Abstract
Benthic macrofauna are important and widely used biological indicators of marine ecosystems as they have limited mobility and therefore integrate the effects of local environmental stressors over time. Recently, environmental DNA (eDNA) analysis has provided a potentially more resource-efficient approach for benthic biomonitoring than traditional morphology-based methods. Several studies have compared eDNA with morphology-based monitoring, but few have compared the two approaches using the exact same sediment cores. In addition, the meiofauna and pelagic organisms obtained as “bycatch” using eDNA have largely been disregarded from comparisons. Here, we address these shortcomings through comparative invertebrate analyses from six sediment sample replicates from each of four stations in Denmark, using eDNA metabarcoding and morphological identification. Our results revealed large variation between the six replicates for both methods and little overlap in taxon compositions between methods. While the morphological dataset was dominated by molluscs and annelids, the eDNA dataset was dominated by arthropods and annelids. Using community composition data, we found that sampling stations could be distinguished both with eDNA and with morphology. Finally, we evaluated expected total richness inferred from extrapolated accumulation curves of detected taxa from each method. This indicated that eDNA metabarcoding requires less replication than morphology for maximum coverage of diversity to be reached. However, both methods required high levels of replication, and our results on taxonomic composition add to the evidence that morphological and eDNA-based methods should preferably be used as complementary tools for marine bioassessment.
https://doi.org/10.5061/dryad.mw6m9065b
Description of the data and file structure
The data consists of four sequencing libraries and associated demultiplexing files. Please see the suggested code below to make sense of the data files.
Files and descriptions
File: SL1_5_FKDL192550852-1a_HHMVJDRXX_L1_1.fq.gz
Description: File containing raw sequence data for library SL1_5 (R1) - PCR replicate number one.
File: SL1_5_FKDL192550852-1a_HHMVJDRXX_L1_2.fq.gz
Description: File containing raw sequence data for library SL1_5 (R2) - PCR replicate number one.
File: batchfileDADA2_SL1_5.list
Description: Batchfile associated with library SL1_5. This file is used for running MetaBarFlow (https://github.com/evaegelyng/MetaBarFlow). The file contains five tab-separated elements (1: Name of R1 file, 2: Name of R2 file, 3: Forward primer, 4: Reverse primer, 5: The minimum length required for a read (unaligned, i.e. forward or reverse read) after trimming of primers and tags).
File: tags_SL1_5.txt
Description: Tag-file for library SL1_5 used for running MetaBarFlow. The file indicates sample names (column 1), and dual-unique indexing tags used for the molecular work (columns 2 and 3). Sampling sites are P23=Hjelm, 31S=Ven, HSD14=Samso and BFkar=KBM. In sample names, st1-6 refers to the station number within each sampling site. Note that the suffix “_1” indicates that this is the first PCR-replicate of the sample. CNE refers to “Control negative” (extraction blanks one and two), whereas NTC refers to PCR blanks.
File: MD5_SL1_5.txt
Description: MD5sum-file for verifying the integrity of the library SL1_5 sequence files.
File: SL1_6_FKDL192550853-1a_HHMVJDRXX_L1_1.fq.gz
Description: File containing raw sequence data for library SL1_6 (R1) - PCR replicate number two.
File: SL1_6_FKDL192550853-1a_HHMVJDRXX_L1_2.fq.gz
Description: File containing raw sequence data for library SL1_6 (R2) - PCR replicate number two.
File: batchfileDADA2_SL1_6.list
Description: Batchfile associated with library SL1_6. This file is used for running MetaBarFlow (https://github.com/evaegelyng/MetaBarFlow). The file contains five tab-separated elements (1: Name of R1 file, 2: Name of R2 file, 3: Forward primer, 4: Reverse primer, 5: The minimum length required for a read (unaligned, i.e. forward or reverse read) after trimming of primers and tags).
File: tags_SL1_6.txt
Description: Tag-file for library SL1_6 used for running MetaBarFlow. The file indicates sample names (column 1), and dual-unique indexing tags used for the molecular work (columns 2 and 3). Sampling sites are P23=Hjelm, 31S=Ven, HSD14=Samso and BFkar=KBM. In sample names, st1-6 refers to the station number within each sampling site. Note that the suffix “_2” indicates that this is the second PCR-replicate of the sample. CNE refers to “Control negative” (extraction blanks one and two), whereas NTC refers to PCR blanks.
File: MD5_SL1_6.txt
Description: MD5sum-file for verifying the integrity of the library SL1_6 sequence files.
File: SL1_7_FKDL192550854-1a_HHMVJDRXX_L1_1.fq.gz
Description: File containing raw sequence data for library SL1_7 (R1) - PCR replicate number three.
File: SL1_7_FKDL192550854-1a_HHMVJDRXX_L1_2.fq.gz
Description: File containing raw sequence data for library SL1_7 (R2) - PCR replicate number three.
File: batchfileDADA2_SL1_7.list
Description: Batchfile associated with library SL1_7. This file is used for running MetaBarFlow (https://github.com/evaegelyng/MetaBarFlow). The file contains five tab-separated elements (1: Name of R1 file, 2: Name of R2 file, 3: Forward primer, 4: Reverse primer, 5: The minimum length required for a read (unaligned, i.e. forward or reverse read) after trimming of primers and tags).
File: tags_SL1_7.txt
Description: Tag-file for library SL1_7 used for running MetaBarFlow. The file indicates sample names (column 1), and dual-unique indexing tags used for the molecular work (columns 2 and 3). Sampling sites are P23=Hjelm, 31S=Ven, HSD14=Samso and BFkar=KBM. In sample names, st1-6 refers to the station number within each sampling site. Note that the suffix “_3” indicates that this is the third PCR-replicate of the sample. CNE refers to “Control negative” (extraction blanks one and two), whereas NTC refers to PCR blanks.
File: MD5_SL1_7.txt
Description: MD5sum-file for verifying the integrity of the library SL1_7 sequence files.
File: SL1_8_FKDL192550855-1a_HHMVJDRXX_L1_1.fq.gz
Description: File containing raw sequence data for library SL1_8 (R1) - PCR replicate number four.
File: SL1_8_FKDL192550855-1a_HHMVJDRXX_L1_2.fq.gz
Description: File containing raw sequence data for library SL1_8 (R2) - PCR replicate number four.
File: batchfileDADA2_SL1_8.list
Description: Batchfile associated with library SL1_8. This file is used for running MetaBarFlow (https://github.com/evaegelyng/MetaBarFlow). The file contains five tab-separated elements (1: Name of R1 file, 2: Name of R2 file, 3: Forward primer, 4: Reverse primer, 5: The minimum length required for a read (unaligned, i.e. forward or reverse read) after trimming of primers and tags).
File: tags_SL1_8.txt
Description: Tag-file for library SL1_8 used for running MetaBarFlow. The file indicates sample names (column 1), and dual-unique indexing tags used for the molecular work (columns 2 and 3). Sampling sites are P23=Hjelm, 31S=Ven, HSD14=Samso and BFkar=KBM. In sample names, st1-6 refers to the station number within each sampling site. Note that the suffix “_4” indicates that this is the fourth PCR-replicate of the sample. CNE refers to extraction blanks, whereas NTC refers to PCR blanks.
File: MD5_SL1_8.txt
Description: MD5sum-file for verifying the integrity of the library SL1_8 sequence files.
File: DADA2_nochim.table
Description: This file is an output file from running MetaBarFlow, detailing which ASVs were found in which sample(s). Each cell contains a read count. Once again, note that P23=Hjelm, 31S=Ven, HSD14=Samso and BFkar=KBM. Sample name suffixes represent stations (stXX) and PCR replicate numbers (_X).
File: DADA2_nochim.otus
Description: This file is an output file, detailing ASVs retained from running the entire dataset through MetaBarFlow.
File: summary.txt
Description: This file is an output file from running MetaBarFlow, providing additional information on the matches from the combined nt+BOLD database search. It is mainly used for looking up what hits were responsible for any spurious identifications in the classified.txt file below. If a “qseqid” ID appears multiple times, this would mean multiple blast hits were found for this ASV. See MetaBarFlow (https://github.com/evaegelyng/MetaBarFlow) for more information.
File: classified.txt
Description: This file is an output file from running MetaBarFlow, showcasing the results of the automated taxonomic assignments. The last column “score.id” represents the taxonomic level to which the automated script would suggest a certain identification. Note that this file undergoes manual curation before being used as input for the analysis presented in the manuscript (the classified_curated.txt file below is the curated version).
File: classified_curated.txt
Description: Same as above - This is the curated version where manual edits have been made to correct wrongly assigned ASVs, and where non-metazoan taxa have been filtered out (according to the procedure described in the associated paper). Note that this file contains an extra column “final.id” with manually edited taxonomic assignments, which now overrules the “score.id”.
File: MSTmorf.sp.data.txt
Description: This file contains information about the species found with morphological inspection of the sediment samples. Each row represents a species (or taxon) found for a specific sample, and lists how many individuals were found, their wet weight (WW), wet weight of the sample specimens in total (TotalSampleWW), as well as the proportion (0-1) and percentages (0-100) of the sample biomass contributed by each taxon.
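The note on summary.txt above says that a “qseqid” appearing several times means several BLAST hits for that ASV. A minimal sketch of how such repeats could be tallied is given here. It assumes that summary.txt is tab-separated and that its first line is a header containing an unquoted column named qseqid; neither is confirmed by the description, and the function name count_hits_per_asv is ours.

```shell
# Count rows per qseqid in a tab-separated file with a header line.
# Assumption: the first line names the columns and one of them is "qseqid".
count_hits_per_asv() {
  awk -F '\t' '
    NR == 1 {
      # Locate the qseqid column from the header instead of hard-coding it.
      for (i = 1; i <= NF; i++) if ($i == "qseqid") col = i
      if (!col) { print "no qseqid column found" > "/dev/stderr"; exit 1 }
      next
    }
    { hits[$col]++ }
    END { for (id in hits) print id "\t" hits[id] }
  ' "$1" | sort
}

# Show the first few ASVs and their hit counts if the file is present.
if [ -f summary.txt ]; then
  count_hits_per_asv summary.txt | head
fi
```

ASVs with a count above one are the ones for which several database hits were recorded, which is where a spurious identification in classified.txt is most likely to be traced back.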
Usage notes
The uploaded raw data files are intended for running MetaBarFlow (https://github.com/evaegelyng/MetaBarFlow) on a high-performance computing cluster. Raw sequence data, tag files for demultiplexing and the batchfile containing primer information are all necessary to make sense of the data.
On an HPC, I suggest putting all files in the same folder and running the following code to arrange the data in the correct folder format:
mkdir SL1_5 SL1_6 SL1_7 SL1_8
mv SL1_5_FKDL192550852-1a_HHMVJDRXX_L1_1.fq.gz SL1_5_FKDL192550852-1a_HHMVJDRXX_L1_2.fq.gz SL1_5/
mv batchfileDADA2_SL1_5.list SL1_5/batchfileDADA2.list
mv tags_SL1_5.txt SL1_5/tags.txt
mv MD5_SL1_5.txt SL1_5/MD5.txt
mv SL1_6_FKDL192550853-1a_HHMVJDRXX_L1_1.fq.gz SL1_6_FKDL192550853-1a_HHMVJDRXX_L1_2.fq.gz SL1_6/
mv batchfileDADA2_SL1_6.list SL1_6/batchfileDADA2.list
mv tags_SL1_6.txt SL1_6/tags.txt
mv MD5_SL1_6.txt SL1_6/MD5.txt
mv SL1_7_FKDL192550854-1a_HHMVJDRXX_L1_1.fq.gz SL1_7_FKDL192550854-1a_HHMVJDRXX_L1_2.fq.gz SL1_7/
mv batchfileDADA2_SL1_7.list SL1_7/batchfileDADA2.list
mv tags_SL1_7.txt SL1_7/tags.txt
mv MD5_SL1_7.txt SL1_7/MD5.txt
mv SL1_8_FKDL192550855-1a_HHMVJDRXX_L1_1.fq.gz SL1_8_FKDL192550855-1a_HHMVJDRXX_L1_2.fq.gz SL1_8/
mv batchfileDADA2_SL1_8.list SL1_8/batchfileDADA2.list
mv tags_SL1_8.txt SL1_8/tags.txt
mv MD5_SL1_8.txt SL1_8/MD5.txt
Now there should be four folders, one for each library (PCR replicate), and each containing the data needed for processing the raw data. If you want to follow MetaBarFlow for processing files, remember to unzip the sequence files before running (e.g. using gunzip).
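Before decompressing, it may be worth checking the downloads against the MD5 files. The sketch below assumes that each MD5.txt uses the standard md5sum layout (a hash followed by a bare file name with no directory prefix) and that GNU md5sum is available on the cluster; the function name verify_library is ours. Files are decompressed only if every checksum in the folder matches.

```shell
# Verify the checksums listed in <library>/MD5.txt against the files in that folder.
# Assumption: MD5.txt is in md5sum format and lists file names without a path.
verify_library() {
  ( cd "$1" && md5sum -c MD5.txt )
}

for lib in SL1_5 SL1_6 SL1_7 SL1_8; do
  if [ -f "$lib/MD5.txt" ]; then
    if verify_library "$lib"; then
      # Decompress only after verification has succeeded.
      gunzip "$lib"/*.fq.gz
    else
      echo "Checksum verification failed in $lib; consider re-downloading" >&2
    fi
  fi
done
```

If the MD5 files turn out to list the sequence files with a different path, md5sum will report the files as missing rather than mismatched, and the file names in MD5.txt would need to be adjusted first.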
This dataset represents environmental DNA metabarcoding data from sediment samples from Danish waters (collected in March/April 2019). Four sites were sampled (Samso, Hjelm, Ven and Karrebaeksminde (KBM)), and six samples were collected at each site (24 samples total, excluding controls). These eDNA samples were collected in connection with a morphological survey of benthic macrofauna carried out by the Danish Environmental Protection Agency. See connected publication for additional details.
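The site codes used in the tag files and in DADA2_nochim.table can be translated back to site names with a small lookup. This is only a sketch: it assumes that every sample name begins with its site code and that the blanks begin with CNE or NTC, which the file descriptions suggest but do not state outright. The function name site_of_sample is ours.

```shell
# Map a sample name to its sampling site using the site codes from the tag files.
# Assumption: each sample name starts with its site code (or CNE/NTC for blanks).
site_of_sample() {
  case "$1" in
    P23*)   echo "Hjelm" ;;
    31S*)   echo "Ven" ;;
    HSD14*) echo "Samso" ;;
    BFkar*) echo "KBM" ;;
    CNE*)   echo "extraction blank" ;;
    NTC*)   echo "PCR blank" ;;
    *)      echo "unknown" ;;
  esac
}

# Tally the entries of one tag file by site (column 1 holds the sample names).
if [ -f SL1_5/tags.txt ]; then
  cut -f1 SL1_5/tags.txt |
    while read -r name || [ -n "$name" ]; do site_of_sample "$name"; done |
    sort | uniq -c
fi
```

With six samples per site, each of the four site names would be expected six times in the tally; anything reported as "unknown" points to a header line or a naming pattern that this sketch does not cover.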
DNA was amplified using the forward primer mlCOIintF-XT (5′-GGWACWRGWTGRACWITITAYCCYCC-3′) and the reverse primer jgHCO2198 (5′-TAIACYTCIGGRTGICCRAARAAYCA-3′), which together amplify ~313 bp of the mitochondrial COI gene. The four libraries sequenced (SL1_5 - SL1_8) represent the four PCR replicates carried out in this study. The libraries were sequenced using paired-end NovaSeq 6000 sequencing (250 bp PE). Tags are consistent across libraries.
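To compare the primers given above with what each batchfile passes to MetaBarFlow, the five batchfile fields can be printed with labels. The sketch below relies only on the field order given in the file descriptions (R1 file, R2 file, forward primer, reverse primer, minimum read length) and assumes that the file is tab-separated with no header line; the function name show_batchfile and the label texts are ours.

```shell
# Print the five tab-separated fields of a MetaBarFlow batchfile with labels.
# Assumption: one record per line, no header, fields in the documented order.
show_batchfile() {
  tab=$(printf '\t')
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS="$tab" read -r r1 r2 fwd rev minlen || [ -n "$r1" ]; do
    [ -n "$r1" ] || continue
    printf 'R1 file: %s\n' "$r1"
    printf 'R2 file: %s\n' "$r2"
    printf 'Forward primer: %s (%s nt)\n' "$fwd" "${#fwd}"
    printf 'Reverse primer: %s (%s nt)\n' "$rev" "${#rev}"
    printf 'Minimum read length: %s\n' "$minlen"
  done < "$1"
}

for lib in SL1_5 SL1_6 SL1_7 SL1_8; do
  if [ -f "$lib/batchfileDADA2.list" ]; then
    echo "== $lib =="
    show_batchfile "$lib/batchfileDADA2.list"
  fi
done
```

The printed primer lengths give a quick sanity check, since both primer sequences listed above are 26 nucleotides long.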