
Data from: A MinION™‐based pipeline for fast and cost‐effective DNA barcoding

Cite this dataset

Srivathsan, Amrita et al. (2018). Data from: A MinION™‐based pipeline for fast and cost‐effective DNA barcoding [Dataset]. Dryad.


DNA barcodes are useful for species discovery and species identification, but obtaining barcodes currently requires a well-equipped molecular laboratory and is time-consuming and/or expensive. Here we address these issues by developing a barcoding pipeline for Oxford Nanopore MinION™ and demonstrate that one flowcell can generate barcodes for ~500 specimens despite the high base-call error rates of MinION™ reads. The pipeline overcomes the errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. Consensus barcodes are overall mismatch-free but retain indel errors that are concentrated in homopolymeric regions. These indel errors are addressed with an optional error correction pipeline that uses conserved amino-acid motifs from publicly available barcodes. The effectiveness of this pipeline is documented by analysing reads from three MinION™ runs that represent different stages of MinION™ development. The runs generated data for (1) 511 specimens of a mixed Diptera sample, (2) 575 specimens of ants, and (3) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION barcodes for 490 of the 511 specimens, which were assessed against reference Sanger barcodes (N=471). Overall, the MinION barcodes have an accuracy of 99.3% to 100%, and the proportion of post-correction ambiguities ranges from <0.01% to 1.5% depending on which correction pipeline is used. We demonstrate that ~2 hours of sequencing is enough to gather all the information needed for obtaining reliable barcodes for most specimens (>90%). We estimate that up to 1000 barcodes can be generated in one flowcell and that the cost per barcode can be
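The consensus step described above, in which all reads for one tagged amplicon are summarized as a single barcode, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the reads have already been demultiplexed and aligned to equal length with `-` as the gap character, and the function name `majority_consensus` and the `min_fraction` threshold are hypothetical choices made for this example.

```python
from collections import Counter


def majority_consensus(aligned_reads, min_fraction=0.5):
    """Majority-rule consensus over pre-aligned reads (illustrative only).

    aligned_reads: equal-length strings over A/C/G/T with '-' for gaps.
    min_fraction: hypothetical threshold; a column whose most common
    symbol does not exceed this fraction is reported as 'N'.
    """
    if not aligned_reads:
        raise ValueError("need at least one aligned read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_fraction:
            consensus.append("N")      # no clear majority: ambiguous base
        elif symbol != "-":
            consensus.append(symbol)   # majority base
        # a majority gap means the column is dropped from the consensus
    return "".join(consensus)


# Three toy reads: one substitution error and one spurious insertion.
print(majority_consensus(["ACGT-A", "ACGTTA", "ACCT-A"]))  # ACGTA
```

With enough reads per amplicon, random substitution errors are outvoted column by column, which is consistent with the consensus barcodes being largely mismatch-free. Systematic indels in homopolymer runs can survive a vote like this, which is why the abstract describes a separate correction step for them.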

Usage notes