Data from: Optimization of the genotyping-by-sequencing strategy for population genomic analysis in conifers
Pan, Jin et al. (2014), Data from: Optimization of the genotyping-by-sequencing strategy for population genomic analysis in conifers, Dryad, Dataset, https://doi.org/10.5061/dryad.39gc6
Flexibility and low cost make genotyping-by-sequencing (GBS) an ideal tool for population genomic studies of nonmodel species. However, to exploit the full potential of the method, many parameters affecting library quality and single nucleotide polymorphism (SNP) discovery require optimization, especially for conifer genomes with a high repetitive DNA content. In this study, we explored strategies for effective GBS analysis in pine species. We constructed GBS libraries using HpaII, PstI and EcoRI-MseI digestions with different multiplexing levels, examined the effect of restriction enzymes on library complexity, and assessed the impact of sequencing depth and size selection of restriction fragments on sequence coverage bias. We tested and compared the UNEAK, Stacks and GATK pipelines on the GBS data, and then developed a reference-free SNP calling strategy for haploid pine genomes. Our GBS procedure proved effective in SNP discovery, producing 7,000–11,000 SNPs within each of three pine species and 14,751 SNPs among them from a PstI library. This investigation provides guidance for the design and analysis of GBS experiments, particularly for organisms for which genomic information is lacking.
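The choice of restriction enzyme and the size-selection window both determine how many fragments a GBS library samples. One common way to anticipate this before sequencing is an in silico digest of available sequence. The sketch below is illustrative only and is not the authors' procedure: it uses the standard recognition sites of the four enzymes named above, and the function names (`cut_positions`, `fragment_lengths`, `size_select`) and any size window you pass in are hypothetical. It also ignores the fact that in an EcoRI-MseI library only fragments with one end of each type are normally amplified.

```python
import re
from typing import Dict, List, Tuple

# Standard recognition sites with the top-strand cut offset inside each site.
# All four sites are palindromic, so scanning one strand is sufficient.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "MseI": ("TTAA", 1),     # T^TAA
    "HpaII": ("CCGG", 1),    # C^CGG
}


def cut_positions(seq: str, enzymes: List[str]) -> List[int]:
    """Return sorted top-strand cut coordinates for the given enzymes (hypothetical helper)."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # A zero-width lookahead reports every match start, so adjacent or
        # overlapping site occurrences are not skipped.
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str, enzymes: List[str]) -> List[int]:
    """Lengths of fragments bounded by cuts on both sides (sequence ends excluded)."""
    cuts = cut_positions(seq, enzymes)
    return [b - a for a, b in zip(cuts, cuts[1:])]


def size_select(lengths: List[int], lo: int, hi: int) -> List[int]:
    """Keep fragments inside an illustrative size-selection window [lo, hi]."""
    return [n for n in lengths if lo <= n <= hi]


if __name__ == "__main__":
    toy = "AAA" + "CTGCAG" + "T" * 10 + "CTGCAG" + "AAA"
    print(fragment_lengths(toy, ["PstI"]))
```

Running the toy example prints `[16]`: the two PstI cuts fall at positions 8 and 24. On real sequence, comparing the number of fragments that survive `size_select` for each enzyme gives a rough, assumption-laden proxy for relative library complexity.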