Data from: Cost-efficient high throughput capture of museum arthropod specimen DNA using PCR-generated baits

Cite this dataset

Knyshov, Alexander; Gordon, Eric R. L.; Weirauch, Christiane (2019). Data from: Cost-efficient high throughput capture of museum arthropod specimen DNA using PCR-generated baits [Dataset]. Dryad.


Gathering genetic data for rare species is one of the biggest remaining obstacles in modern phylogenetics, particularly for megadiverse groups such as arthropods. Next-generation sequencing techniques allow for sequencing of short DNA fragments contained in preserved specimens >20 years old, but approaches such as whole-genome sequencing are often too expensive for projects including many taxa. Several methods of reduced-representation sequencing have been proposed that lower the cost of sequencing per specimen, but many remain costly because they involve synthesizing nucleotide probes and target hundreds of loci. These datasets are also frequently unique to each project and thus generally incompatible with other similar datasets. Here, we explore the use of in-house generated DNA baits to capture commonly used mitochondrial and ribosomal DNA loci from insect museum specimens of various ages and preservation types, without needing to know the sequence of the target loci a priori. Both within-species and cross-species capture are explored, on preserved specimens ranging in age from one to 54 years. We found that most samples produced sufficient data to assemble the nuclear ribosomal RNA genes and near-complete mitochondrial genomes, and to produce well-resolved phylogenies in line with expected results. The dataset obtained can be straightforwardly combined with the large cache of existing Sanger-sequencing-generated data built up over the past 30 years, and the targeted loci can be easily modified to those commonly used in different taxa. Furthermore, the protocol we describe allows for inexpensive data generation (as low as ~$35/sample) of at least 20 kilobases per specimen, for specimens dating back to at least ~1965, and can be easily conducted in most laboratories. If widely applied, this technique will accelerate the accurate resolution of the Tree of Life, especially for non-model organisms with limited existing genomic resources.

Usage notes