eDNAssay: a machine learning tool that accurately predicts qPCR cross-amplification

Kronenberger, John¹; Wilcox, Taylor¹; Mason, Daniel¹; Franklin, Thomas¹; McKelvey, Kevin¹; Young, Michael¹; Schwartz, Michael¹

Research facility: National Genomics Center for Wildlife and Fish Conservation

Published Jul 01, 2022 on Dryad. https://doi.org/10.5061/dryad.cnp5hqc74

Data files

Jul 01, 2022 version files: 175.27 KB

eDNAssay_alignment_example.fas (19.58 KB)
eDNAssay_metadata_example.csv (4.29 KB)
eDNAssay_metadata.csv (4.46 KB)
README.txt (6.12 KB)
SYBR_testing_data.csv (7.06 KB)
SYBR_training_data.csv (56.41 KB)
TaqMan_testing_data.csv (8.57 KB)
TaqMan_training_data.csv (68.77 KB)

Abstract

Environmental DNA (eDNA) sampling is a highly sensitive and cost-effective technique for wildlife monitoring, notably through the use of qPCR assays. However, it can be difficult to ensure assay specificity when many closely related species co-occur. In theory, specificity may be assessed in silico by determining whether assay oligonucleotides have enough base-pair mismatches with nontarget sequences to preclude amplification. In practice, the mismatch characteristics required to prevent amplification are poorly understood, which makes in silico assessments unreliable and often necessitates extensive in vitro testing, typically the greatest bottleneck in assay development. Increasing the accuracy of in silico assessments would therefore streamline the assay development process. In this study, we paired 10 qPCR assays with 82 synthetic gene fragments for 530 specificity tests using SYBR Green intercalating dye (n = 262) and TaqMan hydrolysis probes (n = 268). Test results were used to train random forest classifiers to predict amplification. The primer-only model (SYBR Green-based) and full-assay model (TaqMan probe-based) were 99.6% and 100% accurate, respectively, in cross-validation. We further assessed model performance using six independent assays not used in model training. In these tests, the primer-only model was 92.4% accurate (n = 119) and the full-assay model was 96.5% accurate (n = 144). The high performance achieved by these models makes it possible for eDNA practitioners to develop assays specific to the intended target more quickly and confidently. Practitioners can access the full-assay model via eDNAssay (https://NationalGenomicsCenter.shinyapps.io/eDNAssay), a user-friendly online tool for predicting qPCR cross-amplification.
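To make the idea of an in silico mismatch check concrete, the following is a minimal Python sketch that counts base-pair mismatches between one oligonucleotide and a nontarget sequence region. It is not the eDNAssay model and does not reproduce its predictors. The function name, the example sequences, and the five-base window at the 3' end are all hypothetical choices made for illustration. The sketch also assumes that both sequences are already aligned, equal in length, written 5' to 3' in the same orientation, and free of gaps and ambiguity codes.

```python
def count_mismatches(oligo: str, template: str, end_window: int = 5) -> dict:
    """Count mismatches between an oligo and a pre-aligned template region.

    Assumptions (illustrative only): equal-length sequences, same orientation,
    no gaps, no IUPAC ambiguity codes. `end_window` is a hypothetical
    parameter giving the number of 3'-terminal bases tallied separately.
    """
    if len(oligo) != len(template):
        raise ValueError("oligo and template region must be the same length")

    oligo, template = oligo.upper(), template.upper()

    # Positions are numbered from the 5' end of the oligo, starting at 1.
    positions = [
        i + 1
        for i, (a, b) in enumerate(zip(oligo, template))
        if a != b
    ]

    # Mismatches that fall within the last `end_window` bases of the oligo.
    near_three_prime = [p for p in positions if p > len(oligo) - end_window]

    return {
        "total": len(positions),
        "positions": positions,
        "three_prime_end": len(near_three_prime),
    }


# Hypothetical 20-base primer and nontarget binding site (not real assay data).
primer = "ACGTTGCAAGCTTGACCTGA"
nontarget = "ACGTTGCAAGCTAGACCTGG"
result = count_mismatches(primer, nontarget)
print(result)
```

A simple count like this shows why rule-of-thumb assessments are hard: it reports how many mismatches exist and where they fall, but it says nothing about whether they are sufficient to prevent amplification. That is the question the trained classifiers described above address.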
