Data from: RADcap: sequence capture of dual-digest RADseq libraries with identifiable duplicates and reduced missing data
Hoffberg, Sandra L. et al. (2016), Data from: RADcap: sequence capture of dual-digest RADseq libraries with identifiable duplicates and reduced missing data, Dryad, Dataset, https://doi.org/10.5061/dryad.ss6c9
Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods, such as restriction site associated DNA sequencing (RADseq) and sequence capture, are constrained by costs associated with inefficient use of sequencing data and sample preparation. Here, we introduce RADcap, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches. RADcap uses a new version of dual-digest RADseq (3RAD) to identify candidate SNP loci for capture bait design, and subsequently uses custom sequence capture baits to consistently enrich candidate SNP loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested the RADcap method by genotyping sets of 96 to 384 Wisteria plants. Our results demonstrate that our RADcap method: (1) methodologically reduces (to <5%) and allows computational removal of PCR duplicate reads from data; (2) achieves 80-90% reads-on-target in 11 of 12 enrichments; (3) returns consistent coverage (≥4x) across >90% of individuals at up to 99.8% of the targeted loci; (4) produces consistently high occupancy matrices of genotypes across hundreds of individuals; and (5) costs significantly less than current approaches.
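Result (3) above can be read as a per-locus filter: a targeted locus counts as consistently covered when more than 90% of individuals have a read depth of at least 4x. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name, the input layout (a mapping from locus ID to a list of per-individual depths), and the toy numbers are assumptions made for illustration only.

```python
from typing import Dict, List


def fraction_loci_passing(
    depths_by_locus: Dict[str, List[int]],
    min_depth: int = 4,
    min_frac_individuals: float = 0.9,
) -> float:
    """Return the fraction of loci covered at >= min_depth in more than
    min_frac_individuals of the individuals.

    depths_by_locus is a hypothetical input: locus ID -> read depth per
    individual (one entry per individual, same order for every locus).
    """
    if not depths_by_locus:
        return 0.0

    passing = 0
    for depths in depths_by_locus.values():
        if not depths:
            continue  # a locus with no individuals cannot pass
        covered = sum(1 for d in depths if d >= min_depth)
        # Strictly greater than the threshold, matching ">90% of individuals".
        if covered / len(depths) > min_frac_individuals:
            passing += 1
    return passing / len(depths_by_locus)


# Toy data (invented, 10 individuals per locus), for illustration only.
toy_depths = {
    "locus_a": [5] * 10,             # 10/10 individuals at >= 4x: passes
    "locus_b": [4] * 8 + [0, 3],     # 8/10 individuals at >= 4x: fails
    "locus_c": [10] * 10,            # 10/10 individuals at >= 4x: passes
}

if __name__ == "__main__":
    print(fraction_loci_passing(toy_depths))
```

In practice the per-individual depths would come from whatever depth report the chosen genotyping pipeline produces; the thresholds are left as parameters so that other cutoffs can be explored.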