Data from: Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform
Data files
Feb 18, 2016 version, files: 6.39 GB
Plate01-malaise-BR_S13_L001_R1_001.fastq.gz (160.09 MB)
Plate01-malaise-BR_S13_L001_R2_001.fastq.gz (127.18 MB)
Plate01-malaise-FC_S1_L001_R1_001.fastq.gz (208.67 MB)
Plate01-malaise-FC_S1_L001_R2_001.fastq.gz (136.77 MB)
Plate02-malaise-BR_S14_L001_R1_001.fastq.gz (166.65 MB)
Plate02-malaise-BR_S14_L001_R2_001.fastq.gz (131.46 MB)
Plate02-malaise-FC_S2_L001_R1_001.fastq.gz (189.26 MB)
Plate02-malaise-FC_S2_L001_R2_001.fastq.gz (138.74 MB)
Plate03-malaise-BR_S15_L001_R1_001.fastq.gz (167.08 MB)
Plate03-malaise-BR_S15_L001_R2_001.fastq.gz (130.42 MB)
Plate03-malaise-FC_S3_L001_R1_001.fastq.gz (232.68 MB)
Plate03-malaise-FC_S3_L001_R2_001.fastq.gz (187.54 MB)
Plate04-malaise-BR_S16_L001_R1_001.fastq.gz (154.22 MB)
Plate04-malaise-BR_S16_L001_R2_001.fastq.gz (120.94 MB)
Plate04-malaise-FC_S4_L001_R1_001.fastq.gz (167.72 MB)
Plate04-malaise-FC_S4_L001_R2_001.fastq.gz (104.59 MB)
Plate05-malaise-BR_S17_L001_R1_001.fastq.gz (207.69 MB)
Plate05-malaise-BR_S17_L001_R2_001.fastq.gz (177.71 MB)
Plate05-malaise-FC_S5_L001_R1_001.fastq.gz (190.38 MB)
Plate05-malaise-FC_S5_L001_R2_001.fastq.gz (134.93 MB)
Plate06-malaise-BR_S18_L001_R1_001.fastq.gz (142.96 MB)
Plate06-malaise-BR_S18_L001_R2_001.fastq.gz (107.26 MB)
Plate06-malaise-FC_S6_L001_R1_001.fastq.gz (177.84 MB)
Plate06-malaise-FC_S6_L001_R2_001.fastq.gz (120.99 MB)
Plate07-malaise-BR_S19_L001_R1_001.fastq.gz (155.67 MB)
Plate07-malaise-BR_S19_L001_R2_001.fastq.gz (125.78 MB)
Plate07-malaise-FC_S7_L001_R1_001.fastq.gz (144.40 MB)
Plate07-malaise-FC_S7_L001_R2_001.fastq.gz (95.33 MB)
Plate08-malaise-BR_S20_L001_R1_001.fastq.gz (157.36 MB)
Plate08-malaise-BR_S20_L001_R2_001.fastq.gz (117.94 MB)
Plate08-malaise-FC_S8_L001_R1_001.fastq.gz (200.97 MB)
Plate08-malaise-FC_S8_L001_R2_001.fastq.gz (161.68 MB)
Plate09-malaise-BR_S21_L001_R1_001.fastq.gz (215.55 MB)
Plate09-malaise-BR_S21_L001_R2_001.fastq.gz (182.65 MB)
Plate09-malaise-FC_S9_L001_R1_001.fastq.gz (213.22 MB)
Plate09-malaise-FC_S9_L001_R2_001.fastq.gz (188.92 MB)
Plate10-malaise-BR_S22_L001_R1_001.fastq.gz (173.80 MB)
Plate10-malaise-BR_S22_L001_R2_001.fastq.gz (132.09 MB)
Plate10-malaise-FC_S10_L001_R1_001.fastq.gz (194.26 MB)
Plate10-malaise-FC_S10_L001_R2_001.fastq.gz (148.20 MB)
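The file names above appear to follow the standard Illumina FASTQ naming scheme (sample name, sample number, lane, read, and a trailing 001), so each sample has an R1 and an R2 file holding the forward and reverse reads. As a minimal sketch, assuming that scheme holds for every file, the Python below groups the names into read pairs. The function name `pair_reads` and the pattern `FASTQ_NAME` are illustrative and not part of the dataset, and no meaning is assigned here to the BR and FC labels.

```python
import re
from collections import defaultdict

# Assumed pattern: <sample>_S<sample number>_L<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def pair_reads(filenames):
    """Group FASTQ file names into {sample: {"R1": name, "R2": name}}."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed scheme
        pairs[match["sample"]]["R" + match["read"]] = name
    return dict(pairs)


# Four names copied from the listing, used only as sample input.
files = [
    "Plate01-malaise-BR_S13_L001_R1_001.fastq.gz",
    "Plate01-malaise-BR_S13_L001_R2_001.fastq.gz",
    "Plate01-malaise-FC_S1_L001_R1_001.fastq.gz",
    "Plate01-malaise-FC_S1_L001_R2_001.fastq.gz",
]

for sample, reads in pair_reads(files).items():
    print(sample, reads["R1"], reads["R2"])
```

Pairing by the parsed sample name rather than by list position keeps the grouping correct even if the files are downloaded or listed in a different order.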
Abstract
Genetic information is a valuable component of biosystematics, especially specimen identification through the use of species-specific DNA barcodes. Although many genomics applications have shifted to High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies, sample identification (e.g., via DNA barcoding) is still most often done with Sanger sequencing. Here, we present a scalable double dual-indexing approach using an Illumina MiSeq platform to sequence DNA barcode markers. We achieved 97.3% success by using half of an Illumina MiSeq flowcell to obtain 658 base pairs of the cytochrome c oxidase I DNA barcode in 1,010 specimens from eleven orders of arthropods. Our approach recovers a greater proportion of DNA barcode sequences from individuals than does conventional Sanger sequencing, while at the same time reducing both per-specimen costs and labor time by nearly 80%. In addition, the use of HTS allows the recovery of multiple sequences per specimen, for deeper analysis of genetic variation in target gene regions.