Dryad

eDNAssay: a machine learning tool that accurately predicts qPCR cross-amplification

Cite this dataset

Kronenberger, John et al. (2022). eDNAssay: a machine learning tool that accurately predicts qPCR cross-amplification [Dataset]. Dryad. https://doi.org/10.5061/dryad.cnp5hqc74

Abstract

Environmental DNA (eDNA) sampling is a highly sensitive and cost-effective technique for wildlife monitoring, notably through the use of qPCR assays. However, it can be difficult to ensure assay specificity when many closely related species co-occur. In theory, specificity may be assessed in silico by determining whether assay oligonucleotides have enough base-pair mismatches with nontarget sequences to preclude amplification. However, the mismatch qualities required are poorly understood, which makes in silico assessments difficult and often necessitates extensive in vitro testing (typically the greatest bottleneck in assay development). Increasing the accuracy of in silico assessments would therefore streamline the assay development process. In this study, we paired 10 qPCR assays with 82 synthetic gene fragments for 530 specificity tests using SYBR Green intercalating dye (n = 262) and TaqMan hydrolysis probes (n = 268). Test results were used to train random forest classifiers to predict amplification. The primer-only model (SYBR Green-based) and full-assay model (TaqMan probe-based) were 99.6% and 100% accurate, respectively, in cross-validation. We further assessed model performance using six independent assays not used in model training. In these tests the primer-only model was 92.4% accurate (n = 119) and the full-assay model was 96.5% accurate (n = 144). The high performance achieved by these models makes it possible for eDNA practitioners to more quickly and confidently develop assays specific to the intended target. Practitioners can access the full-assay model via eDNAssay (https://NationalGenomicsCenter.shinyapps.io/eDNAssay), a user-friendly online tool for predicting qPCR cross-amplification.
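The abstract frames in silico specificity assessment as checking whether assay oligonucleotides have enough base-pair mismatches with a nontarget sequence. The sketch below shows one way such mismatches might be summarized. It assumes the oligo and template region are already aligned, equal in length, and written 5'→3' in the oligo's orientation (a template for a reverse primer would first need reverse-complementing). Mismatches near a primer's 3' end are generally considered more disruptive to extension, so they are counted separately. The function name `mismatch_features`, the 5-base window, and the feature names are hypothetical illustrations, not the variables recorded in this dataset or the inputs eDNAssay uses.

```python
from typing import Dict


def mismatch_features(oligo: str, template: str, three_prime_window: int = 5) -> Dict[str, int]:
    """Summarize mismatches between an oligo and an aligned template region.

    Hypothetical feature set for illustration only. Assumes both strings are
    aligned, equal in length, and written 5'->3' in the oligo's orientation.
    """
    if len(oligo) != len(template):
        raise ValueError("oligo and template must be aligned to the same length")
    oligo, template = oligo.upper(), template.upper()
    # 0-based positions (counted from the 5' end) where the bases differ.
    positions = [i for i, (a, b) in enumerate(zip(oligo, template)) if a != b]
    window_start = len(oligo) - three_prime_window
    return {
        "total_mismatches": len(positions),
        # Mismatches within the last `three_prime_window` bases at the 3' end.
        "mismatches_3p_end": sum(1 for i in positions if i >= window_start),
        # Distance from the 3'-terminal base to the nearest mismatch (-1 if none).
        "min_dist_from_3p": min((len(oligo) - 1 - i for i in positions), default=-1),
    }


if __name__ == "__main__":
    print(mismatch_features("ACGTACGTAC", "ACGAACGTAT"))
    # {'total_mismatches': 2, 'mismatches_3p_end': 1, 'min_dist_from_3p': 0}
```

One row of features like these per assay-template pair, alongside the observed qPCR result, is the general shape of input a classifier can learn from.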

Methods

This dataset combines assay-template mismatch characteristics with the results of SYBR Green and TaqMan probe-based qPCR tests. The mismatch characteristics (predictors) and the test outcomes (whether amplification occurred) can be used together to train random forest classifiers that predict qPCR cross-amplification.
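To illustrate how mismatch features and qPCR outcomes can train a random forest, the sketch below implements a minimal forest in standard-library Python: bootstrap-sampled decision trees split on Gini impurity, a random subset of features considered at each split, and predictions combined by majority vote. It is not the authors' code (the usage notes indicate the original analysis used R), and a real analysis would more likely rely on an established implementation. The toy feature rows, labels, and all function names (`fit_forest`, `predict_forest`, and so on) are hypothetical and are not values from this dataset.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

# A tree node is either a leaf label (int) or (feature, threshold, left, right).
Node = Union[int, tuple]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth: int, max_depth: int, n_sub: int, rng: random.Random) -> Node:
    """Grow a classification tree, sampling n_sub candidate features at each node."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, n_sub, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, n_sub, rng))


def predict_tree(node: Node, row: Sequence[float]) -> int:
    """Follow splits from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees: int = 25, max_depth: int = 4, seed: int = 0) -> List[Node]:
    """Train n_trees trees, each on a bootstrap sample (drawn with replacement) of the rows."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # sqrt(p) features per split
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_sub, rng))
    return trees


def predict_forest(trees: List[Node], X) -> List[int]:
    """Majority vote across trees for each row."""
    return [Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0] for row in X]


# Hypothetical toy features: [total mismatches, mismatches in the 3' end window].
# Label 1 = amplified, 0 = no amplification. Not values from this dataset.
X_train = [[0, 0], [1, 0], [2, 0], [1, 0], [0, 0], [2, 0],
           [5, 2], [6, 3], [4, 2], [7, 3], [5, 1], [6, 2]]
y_train = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

forest = fit_forest(X_train, y_train)
predictions = predict_forest(forest, [[0, 0], [1, 0], [6, 3], [8, 4]])
```

Considering about √p features per split is a common default for classification forests. With only two toy features it reduces to one randomly chosen feature per node, which is what decorrelates the trees before their votes are pooled.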

Usage notes

Using these data requires Microsoft Excel (or similar), MEGA (or similar), and R.

Funding

United States Department of Defense, Award: RC21-5121