Data from: Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform
Data files
Version of Feb 18, 2016; files total 6.39 GB
- Plate01-malaise-BR_S13_L001_R1_001.fastq.gz
- Plate01-malaise-BR_S13_L001_R2_001.fastq.gz
- Plate01-malaise-FC_S1_L001_R1_001.fastq.gz
- Plate01-malaise-FC_S1_L001_R2_001.fastq.gz
- Plate02-malaise-BR_S14_L001_R1_001.fastq.gz
- Plate02-malaise-BR_S14_L001_R2_001.fastq.gz
- Plate02-malaise-FC_S2_L001_R1_001.fastq.gz
- Plate02-malaise-FC_S2_L001_R2_001.fastq.gz
- Plate03-malaise-BR_S15_L001_R1_001.fastq.gz
- Plate03-malaise-BR_S15_L001_R2_001.fastq.gz
- Plate03-malaise-FC_S3_L001_R1_001.fastq.gz
- Plate03-malaise-FC_S3_L001_R2_001.fastq.gz
- Plate04-malaise-BR_S16_L001_R1_001.fastq.gz
- Plate04-malaise-BR_S16_L001_R2_001.fastq.gz
- Plate04-malaise-FC_S4_L001_R1_001.fastq.gz
- Plate04-malaise-FC_S4_L001_R2_001.fastq.gz
- Plate05-malaise-BR_S17_L001_R1_001.fastq.gz
- Plate05-malaise-BR_S17_L001_R2_001.fastq.gz
- Plate05-malaise-FC_S5_L001_R1_001.fastq.gz
- Plate05-malaise-FC_S5_L001_R2_001.fastq.gz
- Plate06-malaise-BR_S18_L001_R1_001.fastq.gz
- Plate06-malaise-BR_S18_L001_R2_001.fastq.gz
- Plate06-malaise-FC_S6_L001_R1_001.fastq.gz
- Plate06-malaise-FC_S6_L001_R2_001.fastq.gz
- Plate07-malaise-BR_S19_L001_R1_001.fastq.gz
- Plate07-malaise-BR_S19_L001_R2_001.fastq.gz
- Plate07-malaise-FC_S7_L001_R1_001.fastq.gz
- Plate07-malaise-FC_S7_L001_R2_001.fastq.gz
- Plate08-malaise-BR_S20_L001_R1_001.fastq.gz
- Plate08-malaise-BR_S20_L001_R2_001.fastq.gz
- Plate08-malaise-FC_S8_L001_R1_001.fastq.gz
- Plate08-malaise-FC_S8_L001_R2_001.fastq.gz
- Plate09-malaise-BR_S21_L001_R1_001.fastq.gz
- Plate09-malaise-BR_S21_L001_R2_001.fastq.gz
- Plate09-malaise-FC_S9_L001_R1_001.fastq.gz
- Plate09-malaise-FC_S9_L001_R2_001.fastq.gz
- Plate10-malaise-BR_S22_L001_R1_001.fastq.gz
- Plate10-malaise-BR_S22_L001_R2_001.fastq.gz
- Plate10-malaise-FC_S10_L001_R1_001.fastq.gz
- Plate10-malaise-FC_S10_L001_R2_001.fastq.gz
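The file names above appear to follow the usual Illumina FASTQ pattern (sample name, `S` sample number, `L` lane, `R1`/`R2` read direction, `001`), so each sample should have one forward and one reverse file. The following is a minimal Python sketch, not part of the dataset, of how the files could be grouped into read pairs after download. The function names `pair_reads` and `incomplete` are hypothetical, and the sketch makes no assumption about what the `BR` and `FC` labels mean.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the listing:
# <sample>_S<number>_L<lane>_R<1|2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_001\.fastq\.gz$"
)


def pair_reads(filenames):
    """Group file names into {sample: {"R1": name, "R2": name}}."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip anything that does not fit the pattern
        pairs[match.group("sample")]["R" + match.group("read")] = name
    return dict(pairs)


def incomplete(pairs):
    """Return the sorted sample names that lack an R1 or an R2 file."""
    return sorted(s for s, reads in pairs.items() if set(reads) != {"R1", "R2"})


# Four names copied from the listing, used here as example input.
example = [
    "Plate01-malaise-BR_S13_L001_R1_001.fastq.gz",
    "Plate01-malaise-BR_S13_L001_R2_001.fastq.gz",
    "Plate01-malaise-FC_S1_L001_R1_001.fastq.gz",
    "Plate01-malaise-FC_S1_L001_R2_001.fastq.gz",
]
pairs = pair_reads(example)
```

Calling `incomplete(pairs)` on the full listing is a quick way to check that no forward or reverse file went missing during download.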
Abstract
Genetic information is a valuable component of biosystematics, especially for specimen identification through the use of species-specific DNA barcodes. Although many genomics applications have shifted to High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies, sample identification (e.g., via DNA barcoding) is still most often done with Sanger sequencing. Here, we present a scalable double dual-indexing approach using an Illumina MiSeq platform to sequence DNA barcode markers. Using half of an Illumina MiSeq flowcell, we achieved 97.3% success in obtaining 658 base pairs of the cytochrome c oxidase I DNA barcode from 1,010 specimens representing eleven orders of arthropods. Our approach recovers a greater proportion of DNA barcode sequences from individuals than does conventional Sanger sequencing, while at the same time reducing both per-specimen costs and labor time by nearly 80%. In addition, the use of HTS allows the recovery of multiple sequences per specimen, enabling deeper analysis of genetic variation in target gene regions.