Data from: Development of highly reliable in silico SNP resource and genotyping assay from exome capture and sequencing: an example from black spruce (Picea mariana)
Bousquet, Jean et al. (2015), Data from: Development of highly reliable in silico SNP resource and genotyping assay from exome capture and sequencing: an example from black spruce (Picea mariana), Dryad, Dataset, https://doi.org/10.5061/dryad.tr87v
Picea mariana is a widely distributed boreal conifer across Canada and the subject of advanced breeding programs for which population genomics and genomic selection approaches are being developed. Targeted sequencing was achieved after capturing P. mariana exome with probes designed from the sequenced transcriptome of Picea glauca, a distant relative. A high capture efficiency of 75.9% was reached although spruce has a complex and large genome including gene sequences interspersed by some long introns. The results confirmed the relevance of using probes from congeneric species to perform successfully interspecific exome capture in the genus Picea. A bioinformatics pipeline was developed including stringent criteria that helped detect a set of 97 075 highly reliable in silico SNPs. These SNPs were distributed across 14 909 genes. Part of an Infinium iSelect array was used to estimate the rate of true positives by validating 4267 of the predicted in silico SNPs by genotyping trees from P. mariana populations. The true positive rate was 96.2%, for in silico SNPs compared to a genotyping success rate of 96.7% for a set 1115 P. mariana control SNPs recycled from previous genotyping arrays. These results indicate the high success rate of the genotyping array and the relevance of the selection criteria used to delineate the new P. mariana in silico SNP resource. Furthermore, in silico SNPs were generally of medium to high frequency in natural populations, thus providing high informative value for future population genomics applications.