Metabarcoding of environmental DNA (eDNA) is a powerful tool for describing biodiversity, for example for finding keystone species or detecting invasive species in environmental samples. Continuous methodological improvements and advances in sequencing platforms over the last decade mean that this approach is now widely used in biodiversity science and biomonitoring. For general use, the method hinges on the correct identification of taxa. However, past studies have shown that this depends crucially on decisions made during sampling, sample processing, and the subsequent handling of sequencing data. With no clear consensus on best practice, the latter in particular has led to varied bioinformatic approaches and recommendations for data preparation and taxonomic identification. In this study, using a large freshwater fish eDNA sequence dataset, we compared two ways of taxonomically assigning our raw reads: i) the frequently used zero-radius Operational Taxonomic Unit (zOTU) approach in combination with publicly available reference sequences (open databases), and ii) an Operational Sequence Unit (OSU) database approach, using a curated database of reference sequences generated from specimen barcoding (closed database). We show that both approaches gave comparable results for common species. However, agreement between the approaches declined as read abundance decreased, so results were less reliable and not comparable for rare species. The success of the zOTU approach depended on the suitability, rather than the size, of the reference database. In contrast, the OSU approach used reliable DNA sequences and thus often enabled species-level identifications, yet this resolution decreased for species of more recent phylogenetic age. We show the need to include target group coverage, outgroups, and full taxonomic annotation in reference databases to avoid the misleading annotations that can occur with the short amplicons commonly used in eDNA metabarcoding studies. Finally, we make general suggestions to improve the construction and use of reference databases for future metabarcoding studies.

The data were collected under the framework of the federal water quality assessment in Switzerland. They derive from eDNA samples taken in routinely surveyed Swiss rivers. In spring 2019, 92 sites were sampled for eDNA, with four replicates per site. DNA was then extracted from the eDNA filters in a clean room. From these extracts, we used a nested PCR with the 12S MiFish primers to create an amplicon library, which was paired-end sequenced. The 12S barcode targets fish communities. The libraries were prepared in-house and at the Genetic Diversity Center (user labs) at ETH Zurich.

The folder contains the raw paired-end amplicon sequences from the eDNA metabarcoding run on the MiSeq sequencer. Once unzipped, all raw sequencing files are available as fasta files and can be opened with a text editor or used in downstream bioinformatic workflows.
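
As a quick check before any downstream workflow, an unzipped file can be summarised with a few lines of Python. The following is a minimal sketch using only the standard library; it assumes plain-text fasta records (a header line starting with ">" followed by one or more sequence lines), and the function names parse_fasta and summarize are illustrative, not part of the dataset or of any published pipeline.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the number of reads and their mean length."""
    lengths = [len(seq) for _, seq in records]
    n = len(lengths)
    return {"reads": n, "mean_length": sum(lengths) / n if n else 0.0}


# Tiny in-memory example standing in for a real file (made-up reads).
example = """>read1
ACGTAC
GT
>read2
TTGA
""".splitlines()

summary = summarize(parse_fasta(example))
print(summary)
```

For a real file, the same functions can be applied to an open file handle, for example `with open(path) as fh: summarize(parse_fasta(fh))`, where `path` is whichever unzipped file from the folder is of interest.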

General principles for assignments of communities from eDNA: Open versus closed taxonomic databases
