The topological nature of tag jumping in environmental DNA metabarcoding studies (sequencing raw data)
Data files
Nov 29, 2022 version files, 5.85 GB total
- README.md (715 B)
- Sequencing_raw_data.zip (5.85 GB)
Abstract
Metabarcoding of environmental DNA constitutes a state-of-the-art tool for environmental studies. One fundamental principle implicit in most metabarcoding studies is that individual sample amplicons can still be identified after being pooled with others – based on their unique combinations of tags – during the so-called demultiplexing step that follows sequencing. Nevertheless, it has been recognized that tags can sometimes be changed (i.e., tag jumping), which ultimately leads to sample crosstalk. Here, using four DNA metabarcoding datasets derived from the analysis of soils and sediments, we show that tag jumping follows very specific and systematic patterns. Specifically, we find a strong correlation between the number of reads in blank samples and their topological position in the tag matrix (described by vertical and horizontal vectors). This observed spatial pattern of artefactual sequences could be explained by polymerase activity, which leads to the exchange of the 3’ tag of single-stranded tagged sequences through the formation of heteroduplexes with mixed barcodes. Importantly, tag jumping substantially distorted our datasets – despite our use of methods suggested to minimize this error. We developed a topological model to estimate the noise based on the counts in our blanks, which suggested that 40–80% of the taxa in our soil and sedimentary samples were likely false positives introduced through tag jumping. We highlight that the number of false-positive detections caused by tag jumping strongly biased our community analyses.
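The relationship between blank read counts and tag-matrix position can be illustrated with a small sketch. This is not the model from the paper: it is a simplified illustration that assumes a blank is exposed to tag jumping in proportion to the reads held by the other cells in its row and column of the tag matrix. The function names, the toy counts, and the blank positions are all hypothetical.

```python
from math import sqrt

def row_col_load(matrix):
    """For each cell, sum the reads of all OTHER cells in its row and column."""
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # row_sum + col_sum counts the cell itself twice, so subtract it twice.
    return [
        [row_sums[i] + col_sums[j] - 2 * matrix[i][j] for j in range(len(row))]
        for i, row in enumerate(matrix)
    ]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Made-up read counts: rows = forward tags, columns = reverse tags.
counts = [
    [5000, 20, 4000],
    [10, 3000, 15],
    [2500, 5, 0],
]
# Hypothetical positions (row, column) of blank samples in the tag matrix.
blanks = [(0, 1), (1, 0), (1, 2), (2, 1)]

load = row_col_load(counts)
blank_loads = [load[i][j] for i, j in blanks]
blank_reads = [counts[i][j] for i, j in blanks]
r = pearson(blank_loads, blank_reads)
print(f"correlation between row/column load and blank reads: {r:.2f}")
```

With real data, the matrix would be filled from the demultiplexed read counts per tag combination, and the blank positions taken from the tag layout described in the published paper.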
Usage notes
The sequencing data are provided in FASTQ format as multiplexed libraries. Information on tags, primers, and protocols can be found in the published paper and its supplementary information.
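As a starting point for working with the files, the sketch below parses FASTQ records from any text stream. It assumes the common four-line record layout (header, sequence, `+` separator, quality string). The function name `read_fastq` and the example reads are hypothetical, and the file names inside the archive are not documented here.

```python
import io
from typing import Iterator, TextIO, Tuple

def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in: {header!r}")
        yield header[1:], seq, qual

# Made-up two-read example standing in for a real file.
example = "@read1\nACGTAC\n+\nIIIIII\n@read2\nTTGCA\n+\nFFFFF\n"
records = list(read_fastq(io.StringIO(example)))
for read_id, seq, qual in records:
    print(read_id, len(seq))
```

To read directly from the downloaded archive, a member can be opened with `zipfile.ZipFile.open` and wrapped in `io.TextIOWrapper` before being passed to the parser. If the members are gzip-compressed, they would need to be decompressed first.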