Dryad

Do pseudogenes pose a problem for metabarcoding marine animal communities?

Data files

Jun 13, 2022 version files 1.09 MB

Abstract

Because DNA metabarcoding typically employs sequence diversity among mitochondrial amplicons to estimate species composition, nuclear mitochondrial pseudogenes (NUMTs) can inflate diversity. This study quantifies the incidence and attributes of NUMTs derived from the 658 bp barcode region of cytochrome c oxidase I (COI) in 156 marine animal genomes. NUMTs were examined to ascertain if they could be recognized by their possession of indels or stop codons. In total, 309 NUMTs ≥ 150 bp were detected, with an average of 1.98 per species (range = 0–33) and a mean length of 391 bp ± 200 bp. Among this total, 75 (23.4%) lacked indels or stop codons. NUMTs appear to pose the greatest interpretational risk when short (< 313 bp) amplicons are used, such as in eDNA studies, dietary analyses, or processed fish identification. Employing the standard amplicon length (313 bp) for marine metabarcoding, NUMTs could potentially inflate the OTU count by 21% above the true species count while also raising intraspecific variation at COI by 15%. However, when both amplicon length and position are considered, inflation in OTU counts and in barcode variation were just 9% and 10%, respectively, suggesting NUMTs will not seriously distort biodiversity assessments. There was a weak positive correlation between genome size and NUMT count but no variation among phyla or trophic groups. Until bioinformatic advances improve NUMT detection, the best defense involves targeting long amplicons and developing reference databases that include both mitochondrial sequences and their NUMT derivatives.
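
The abstract's recognition criterion (possession of an indel or a stop codon) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function name `has_indel_or_stop`, its parameters, and the input convention (a pairwise alignment already trimmed to the overlapping region, with `-` marking gaps) are assumptions made for illustration. The stop-codon sets follow NCBI translation tables 2 (vertebrate mitochondrial) and 5 (invertebrate mitochondrial); which table suits a given taxon is left to the user.

```python
# Illustrative sketch only; names and input conventions are hypothetical.
# Stop codons per NCBI translation table (DNA alphabet).
STOP_CODONS = {
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial code
    5: {"TAA", "TAG"},                # invertebrate mitochondrial code
}


def has_indel_or_stop(ref_aln, cand_aln, table=5, frame=0):
    """Return True if the candidate shows an indel or an in-frame stop codon.

    ref_aln and cand_aln are the two rows of a pairwise alignment, trimmed
    to their overlapping region (assumption), with '-' marking gaps.
    frame is the offset (0, 1, or 2) of the first complete codon.
    """
    if len(ref_aln) != len(cand_aln):
        raise ValueError("aligned sequences must have equal length")
    # Any gap in either row means an insertion or deletion in the candidate.
    if "-" in ref_aln or "-" in cand_aln:
        return True
    cand = cand_aln.upper()
    stops = STOP_CODONS[table]
    # Walk the candidate codon by codon in the given reading frame.
    return any(cand[i:i + 3] in stops for i in range(frame, len(cand) - 2, 3))


if __name__ == "__main__":
    ref = "ATGGCATTA"
    print(has_indel_or_stop(ref, "ATGGCATTA"))           # no gap, no stop
    print(has_indel_or_stop(ref, "ATGTAATTA"))           # in-frame TAA
    print(has_indel_or_stop(ref, "ATGAGATTA", table=2))  # AGA is a stop in table 2
```

A sequence for which this check returns `False` corresponds to the category the abstract describes as lacking indels or stop codons, which cannot be separated from genuine mitochondrial COI by this criterion alone.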