Data from: ITS all right mama: Investigating the formation of chimeric sequences in the ITS2 region by DNA metabarcoding analyses of fungal mock communities of different complexities
Bjørnsgaard Aas, Anders; Davey, Marie Louise; Kauserud, Håvard (2016), Data from: ITS all right mama: Investigating the formation of chimeric sequences in the ITS2 region by DNA metabarcoding analyses of fungal mock communities of different complexities, Dryad, Dataset, https://doi.org/10.5061/dryad.n4cv3
The formation of chimeric sequences can create significant methodological bias in PCR-based DNA metabarcoding analyses. During mixed-template amplification of barcoding regions, chimera formation is frequent and well documented. However, profiling of fungal communities typically uses the more variable rDNA region ITS. Due to a larger research community, tools for chimera detection have been developed mainly for the 16S/18S markers. Nevertheless, these tools are widely applied to the ITS region without verification of their performance. We examined the rate of chimera formation during amplification and 454 sequencing of the ITS2 region from fungal mock communities of different complexities. We evaluated the chimera-detecting ability of two common chimera-checking algorithms: Perseus and UCHIME. Large proportions of the chimeras reported were false positives. No false negatives were found in the dataset. Verified chimeras accounted for only 0.2% of the total ITS2 reads, which is considerably less than what is typically reported in 16S and 18S metabarcoding analyses. Verified chimeric "parent sequences" had significantly higher percent identity to one another than to random members of the mock communities. Community complexity increased the rate of chimera formation. GC content was higher around the verified chimeric break points, potentially facilitating chimera formation through base pair mismatching in the neighboring regions of high similarity in the chimeric region. We conclude that the hypervariable nature of the ITS region seems to buffer the rate of chimera formation in comparison to other, less variable barcoding regions, due to shorter regions of high sequence similarity.
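The observation about GC content near break points can be illustrated with a minimal sketch. It is not the authors' analysis: the function names, the flank width, and the toy sequence are all assumptions made for illustration, and the break point is taken to be a 0-based position that is already known.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_around_breakpoint(seq: str, breakpoint: int, flank: int = 10) -> float:
    """GC fraction in a window of `flank` bases on each side of `breakpoint`.

    `breakpoint` is a hypothetical 0-based index; the window is clipped
    at the sequence ends.
    """
    start = max(0, breakpoint - flank)
    end = min(len(seq), breakpoint + flank)
    return gc_fraction(seq[start:end])


# Toy sequence (invented): a GC-rich stretch flanked by AT-rich stretches.
toy_seq = "ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT"
local_gc = gc_around_breakpoint(toy_seq, breakpoint=15, flank=5)
overall_gc = gc_fraction(toy_seq)
print(f"GC near break point: {local_gc:.2f}, overall GC: {overall_gc:.2f}")
```

Comparing the local value with the whole-sequence value is only one possible way to express "higher GC content around the break point"; the window size strongly affects the result and would need to be chosen to match the actual analysis.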