Dryad

Environmental niche models improve species identification in DNA barcoding

Abstract

DNA barcoding has greatly advanced global biodiversity research over the last two decades. However, processes such as hybridization, introgression, and incomplete lineage sorting limit the resolving power of barcode sequences and can lead to misidentifications when identification relies on barcodes alone. Here, we propose a new Niche-model-Based Species Identification (NBSI) method, based on the idea that species distribution information can complement DNA barcoding identifications. NBSI infers species membership by combining niche modeling predictions with traditional DNA barcoding identifications. Systematic tests across diverse scenarios show significant improvements in species identification success rates under the NBSI framework, with the largest increase from 4.7% (95% CI: 3.51%-6.25%) to 94.8% (95% CI: 93.19%-96.06%). Clear improvements were also observed when NBSI was applied to potentially ambiguous sequences whose genetic nearest neighbors belong to another species or to more than two species, a situation that commonly occurs for species represented by single or short DNA barcodes. These results support our assertion that environmental variables are valuable complements to DNA sequence data for species identification, helping to avoid misidentifications inferred from genetic information alone. The NBSI framework is currently implemented as an R package, “NicheBarcoding”, which is open source under the GNU General Public License and freely available from https://CRAN.R-project.org/package=NicheBarcoding.