Dryad

Data from: Anacapa Toolkit: an environmental DNA toolkit for processing multilocus metabarcode datasets

Cite this dataset

Curd, Emily E. et al. (2019). Data from: Anacapa Toolkit: an environmental DNA toolkit for processing multilocus metabarcode datasets [Dataset]. Dryad. https://doi.org/10.5061/dryad.mf0126f

Abstract

1. Environmental DNA (eDNA) metabarcoding is a promising method for monitoring species and community diversity that is rapid, affordable, and non-invasive. The eDNA community has long needed modular informatics tools, comprehensive and customizable reference databases, flexibility across high-throughput sequencing platforms, fast multilocus metabarcode processing, and accurate taxonomic assignment. As bioinformatics tools continue to improve, meeting all of these demands within a single toolkit is becoming a reality.

2. Here we present Anacapa (https://github.com/limey-bean/Anacapa/), an open-access, modular metabarcode sequence toolkit that addresses these needs. It allows users to build comprehensive reference databases and to process raw multilocus metabarcode sequence data to accurately characterize communities. A novel aspect of Anacapa is its database builder, Creating Reference libraries Using eXisting tools (CRUX), which generates comprehensive reference databases for specific user-defined metabarcode loci. The Quality Control and Dereplication module sorts and processes reads from multiple metabarcode loci and retains merged, unmerged, and unpaired reads, maximizing recovered diversity. Next, DADA2 detects amplicon sequence variants (ASVs), and the Anacapa Classifier module aligns these ASVs to the CRUX-generated reference databases using Bowtie2. Taxonomy is then assigned to ASVs, with confidence scores, using a Bayesian Lowest Common Ancestor (BLCA) method. The Anacapa Toolkit also includes an R package, ranacapa, for automated exploration of results through standard biodiversity statistical analyses.

3. Comparisons with other published reference databases show that CRUX generates broad, comprehensive reference databases that capture more taxonomic diversity than those databases. A variety of benchmarking approaches show that the Anacapa Classifier module's Bowtie2-BLCA approach assigns robust, high-quality taxonomy to both MiSeq- and HiSeq-length eDNA metabarcode sequences. We further demonstrate the utility of the Anacapa Toolkit by assigning taxonomy to eDNA sequences from terrestrial and marine samples from southern California collected through CaleDNA (http://www.ucedna.com/).

4. The Anacapa Toolkit broadens the exploration of eDNA and assists in biodiversity assessment and management by generating metabarcode-specific databases, processing multilocus data, retaining all read types, and expanding non-traditional eDNA targets. Anacapa software and source code are openly available in a virtual container to ease installation.
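The classification step summarized in point 2 assigns each ASV a taxonomy by combining the reference sequences it aligns to and attaching a confidence to each rank. As an illustration only, the Python sketch below implements a simplified, score-weighted lowest-common-ancestor consensus. It is not the Anacapa Classifier or the published BLCA algorithm, which derives its confidence scores differently. The `Hit` type, the seven-rank `RANKS` layout, the use of raw alignment scores as weights, the 0.8 cutoff, and the example scores are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import NamedTuple, Sequence

# Hypothetical seven-rank lineage layout (an assumption for this sketch).
RANKS = ("superkingdom", "phylum", "class", "order", "family", "genus", "species")


class Hit(NamedTuple):
    """One reference match for an ASV (hypothetical structure)."""
    taxonomy: tuple  # taxon names ordered from RANKS[0] downward
    score: float     # alignment score, used here as the hit's weight


def weighted_lca(hits: Sequence[Hit], min_confidence: float = 0.8):
    """Walk down the ranks, keeping the best-supported taxon at each level.

    Confidence at a rank is the share of the total hit score whose lineage
    matches the assigned path down to that rank. The walk stops at the first
    rank where that share falls below min_confidence.
    """
    total = sum(h.score for h in hits)
    path, confidences = [], []
    if total <= 0:
        return tuple(path), tuple(confidences)

    surviving = list(hits)
    for depth in range(len(RANKS)):
        # Sum the score behind each candidate taxon at this rank.
        support = defaultdict(float)
        for h in surviving:
            if depth < len(h.taxonomy):
                support[h.taxonomy[depth]] += h.score
        if not support:
            break
        taxon, weight = max(support.items(), key=lambda kv: kv[1])
        confidence = weight / total
        if confidence < min_confidence:
            break
        path.append(taxon)
        confidences.append(round(confidence, 3))
        # Only hits that agree with the chosen taxon can support deeper ranks.
        surviving = [
            h for h in surviving
            if depth < len(h.taxonomy) and h.taxonomy[depth] == taxon
        ]
    return tuple(path), tuple(confidences)


if __name__ == "__main__":
    # Made-up hits for one ASV; scores are illustrative, not real alignments.
    lineage = ("Eukaryota", "Chordata", "Actinopteri",
               "Perciformes", "Serranidae", "Paralabrax")
    hits = [
        Hit(lineage + ("Paralabrax clathratus",), 95.0),
        Hit(lineage + ("Paralabrax nebulifer",), 94.0),
        Hit(lineage + ("Paralabrax maculatofasciatus",), 11.0),
    ]
    print(weighted_lca(hits))
```

With the default cutoff, all three made-up hits agree down to genus, but no single species carries 80% of the total score (the best has 95 of 200), so the assignment stops at Paralabrax. Lowering the cutoff to 0.4 would extend it to Paralabrax clathratus with a confidence of 0.475. Stopping at the deepest well-supported rank, rather than always reporting the top hit, is the general idea behind LCA-style classifiers.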

Usage notes

Funding

National Science Foundation, Award: DEB 1644641, NSF-DGE 1650604, NSF-GRFP 2015204395

Location

California