Dryad

Data from: Towards eradicating the nuisance of Numts and noise in molecular biodiversity assessment

Cite this dataset

Graham, Natalie (2021). Data from: Towards eradicating the nuisance of Numts and noise in molecular biodiversity assessment [Dataset]. Dryad. https://doi.org/10.6078/D1D13G

Abstract

DNA metabarcoding is a popular methodology for biodiversity assessment and is increasingly used for community-level analysis of intraspecific genetic diversity. The evolutionary history of hundreds of specimens can be captured in a single collection vial. However, the method is not without pitfalls, which may inflate or misrepresent recovered diversity metrics. Numts, nuclear pseudogene copies of mitochondrial DNA, have been particularly difficult to control because they can evolve rapidly and appear deceptively similar to true mitochondrial sequences. While the problem of numts has long been recognized for traditional sequencing approaches, the issues they create are particularly evident in metabarcoding, in which the identity of individual specimens is generally not known. In this issue of Molecular Ecology Resources, Andújar et al. (2021) provide an easy-to-implement bioinformatic approach to reduce erroneous sequences due to numts and residual noise in metabarcoding datasets. The metaMATE software designates input sequences as authentic (mtDNA haplotypes) or non-authentic (numts and erroneous sequences) by comparison to reference data and by analyzing nucleotide substitution patterns. Filtering is applied over a range of abundance thresholds, and the choice to proceed with a stricter or more lenient sequence removal strategy is at the researchers’ discretion. This is a valuable addition to a growing number of complementary tools for improving the reliability of modern biodiversity monitoring.

Methods

FASTA file of denoised sequences (ZOTUs) produced by the UNOISE algorithm after quality-filtering steps. The file was used as input to the metaMATE software tool for numt removal.
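
For readers who want to inspect the file before passing it to other tools, the following is a minimal sketch of reading a FASTA file of ZOTUs in plain Python and tallying sequence lengths. It is not part of the deposited workflow. The record names (Zotu1, Zotu2) and the sequences in the example are made up for illustration, and the helper names read_fasta and length_histogram are hypothetical.

```python
import io
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Sequences that wrap over several lines are joined, and bases are
    upper-cased so that records compare consistently.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(records: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Count how many sequences occur at each length."""
    counts: Dict[int, int] = {}
    for _, seq in records:
        counts[len(seq)] = counts.get(len(seq), 0) + 1
    return counts


# Illustrative input only: two invented records, not data from this dataset.
example = io.StringIO(">Zotu1\nACGT\nACGT\n>Zotu2\nacg\n")
records = list(read_fasta(example))
histogram = length_histogram(records)
print(records)
print(histogram)
```

To apply the same functions to the deposited file, open it with Python's built-in open() and pass the file handle to read_fasta in place of the in-memory example. A length tally of this kind is only a quick sanity check for sequences of unexpected length; it does not replace the reference-based and substitution-pattern checks described in the abstract.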