Skip to main content
Dryad

Data from: Parallel tagged amplicon sequencing of relatively long PCR products using the Illumina HiSeq platform and transcriptome assembly

Cite this dataset

Feng, Yan-Jie et al. (2015). Data from: Parallel tagged amplicon sequencing of relatively long PCR products using the Illumina HiSeq platform and transcriptome assembly [Dataset]. Dryad. https://doi.org/10.5061/dryad.n21cr

Abstract

In phylogenetics and population genetics, a large number of loci are often needed to accurately resolve species relationships. Normally, loci are enriched by PCR and sequenced by Sanger sequencing, which is expensive when the number of amplicons is large. Next-generation sequencing (NGS) techniques are increasingly used for parallel amplicon sequencing, which reduces sequencing costs tremendously, but has not reduced preparation costs very much. Moreover, for most current NGS methods, amplicons need to be purified and quantified before sequencing and their lengths are also restricted (normally < 700 bp). Here, we describe an approach to sequence pooled amplicons of any length using the Illumina platform. Using this method, amplicons are pooled at equal volume rather than at equal concentration, thus eliminating the laborious purification and quantification steps. We then shear the pooled amplicons, repair the ends, add sample identifying linkers, and pool multiple samples prior to Illumina library preparation. Data are then assembled using the transcriptome assembly program Trinity, which is optimized to deal with templates of highly varying quantities. We demonstrated the utility of our approach by recovering 93.5% of the target amplicons (size up to 1650 bp) in full length for a 16 taxa × 101 loci project, using ~2.0 GB of Illumina HiSeq paired-end 90 bp data. Overall, we validate a rapid, cost-effective and scalable approach to sequence a large number of targeted loci from a large number of samples that is particularly suitable for both phylogenetics and population genetics studies that require a modest scale of data.

Usage notes