Data from: Sorting specimen-rich invertebrate samples with cost-effective NGS barcodes: validating a reverse workflow for specimen processing
Cite this dataset
Wang, Wendy Y. et al. (2018). Data from: Sorting specimen-rich invertebrate samples with cost-effective NGS barcodes: validating a reverse workflow for specimen processing [Dataset]. Dryad. https://doi.org/10.5061/dryad.8h950
Biologists frequently sort specimen-rich samples to species. This process is daunting when based on morphology, and disadvantageous if performed using molecular methods that destroy vouchers (e.g., metabarcoding). An alternative is barcoding every specimen in a bulk sample and then presorting the specimens using DNA barcodes, thus mitigating downstream morphological work on presorted units. Such a “reverse workflow” is too expensive using Sanger sequencing, but we here demonstrate that it is feasible with an NGS barcoding pipeline that allows for cost-effective, high-throughput generation of short specimen-specific barcodes (313 bp of COI; lab cost <$0.50 per specimen) through Next Generation Sequencing of tagged amplicons. We applied our approach to a large sample of tropical ants, obtaining barcodes for 3290 of 4032 specimens (82%). NGS barcodes and their corresponding specimens were then sorted into molecular operational taxonomic units (mOTUs) based on objective clustering and Automated Barcode Gap Discovery (ABGD). A high diversity of 88-90 mOTUs (4% clustering) was found and morphologically validated based on preserved vouchers. The mOTUs were overwhelmingly in agreement with morphospecies (match ratio 0.95 at 4% clustering). Because of a lack of coverage in existing barcode databases, only 18 could be accurately identified to named species, but our study yielded new barcodes for 48 species, including 28 that are potentially new to science. With its low cost and technical simplicity, the NGS barcoding pipeline can be implemented by a large range of laboratories. It accelerates invertebrate species discovery, facilitates downstream taxonomic work, helps with building comprehensive barcode databases, and yields precise abundance information.
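To make the clustering step more concrete, the following is a minimal sketch of threshold-based grouping of barcodes into mOTUs and of a match ratio between mOTUs and morphospecies. It is not the software used to produce this dataset. It assumes that objective clustering can be approximated as single-linkage clustering on uncorrected pairwise distances, that two specimens are linked when their distance is at or below the threshold (the exact boundary convention is an assumption), and that the match ratio is computed as 2 × N_match / (N_mOTU + N_morphospecies), a definition the abstract itself does not state. The function names `p_distance`, `cluster_barcodes`, and `match_ratio` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected pairwise distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Compare only positions where both sequences carry an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def cluster_barcodes(barcodes, threshold=0.04):
    """Single-linkage clustering of {specimen_id: sequence} (assumed approach).

    Two specimens end up in the same cluster if they are connected by a
    chain of pairs whose distance is <= threshold.
    """
    parent = {name: name for name in barcodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(barcodes, 2):
        if p_distance(barcodes[a], barcodes[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in barcodes:
        clusters.setdefault(find(name), set()).add(name)
    # Return clusters in a stable order for reproducible output.
    return sorted((frozenset(c) for c in clusters.values()),
                  key=lambda c: sorted(c))


def match_ratio(motus, morphospecies):
    """2 * N_match / (N_mOTU + N_morpho); assumed definition.

    N_match counts clusters with identical specimen membership in both sets.
    """
    n_match = len(set(motus) & set(morphospecies))
    return 2 * n_match / (len(motus) + len(morphospecies))
```

As a usage illustration, calling `cluster_barcodes({"a": seq_a, "b": seq_b, "c": seq_c}, threshold=0.04)` on three aligned toy sequences returns a list of `frozenset` objects, one per putative mOTU, which can then be passed to `match_ratio` together with morphospecies groupings expressed in the same form. The all-pairs comparison is quadratic in the number of specimens, which is acceptable for a few thousand short barcodes but would need a faster strategy for much larger samples.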