
Genotyping-by-sequencing of Canada’s Apple Biodiversity Collection

Cite this dataset

Migicovsky, Zoë; Douglas, Gavin M.; Myles, Sean (2022). Genotyping-by-sequencing of Canada’s Apple Biodiversity Collection [Dataset]. Dryad.


Canada’s Apple Biodiversity Collection (ABC), located at the Agriculture and Agri-Food Canada (AAFC) Kentville Research and Development Centre in Nova Scotia, Canada, is one of the most diverse apple collections in the world and was designed to enable genetic mapping. Sequencing the accessions in the collection complements existing phenotypic descriptions of the ABC and provides a valuable resource not only for researchers working on the collection but also for those studying apples more broadly. Here we report and make publicly available genotyping-by-sequencing (GBS) data for over 1,000 apple accessions from the ABC. Using three SNP callers followed by imputation, we genotyped 278,231 SNPs across 1,175 diverse apple accessions from the ABC.


Full details are available in Migicovsky et al. (Submitted).

Briefly, young leaf tissue was collected from all accessions in the ABC, and DNA was extracted using commercial kits. GBS libraries (Elshire et al., 2011) were prepared using the ApeKI and PstI-EcoT22I restriction enzymes and sequenced on the Illumina HiSeq 2000 platform. Single nucleotide polymorphisms (SNPs) were called against the GDDH13 v1.1 reference genome (Daccord et al., 2017) using three different SNP calling pipelines: GATK (v3.7) (McKenna et al., 2010), SAMtools (v1.3) (Li et al., 2009), and TASSEL (v5.2.32) (Bradbury et al., 2007).
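GBS reduces genome complexity by sequencing only DNA that flanks restriction sites, so the enzymes chosen determine which parts of the genome get genotyped. The Python sketch below illustrates this with a simple in silico digest. It assumes the standard recognition sites and top-strand cut positions of the enzymes named above (ApeKI G^CWGC, PstI CTGCA^G, EcoT22I ATGCA^T, where W is A or T). The function names and the toy sequence are hypothetical, and the sketch does not model adapter ligation, size selection, or PCR, so it is an illustration of the idea rather than the procedure used to build this dataset.

```python
import re

# Assumed recognition sites and top-strand cut offsets (from start of site).
ENZYMES = {
    "ApeKI": ("GCWGC", 1),     # G^CWGC
    "PstI": ("CTGCAG", 5),     # CTGCA^G
    "EcoT22I": ("ATGCAT", 5),  # ATGCA^T
}

# IUPAC codes needed for the sites above (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}


def site_regex(site: str) -> re.Pattern:
    # Zero-width lookahead so overlapping sites are all reported.
    return re.compile("(?=" + "".join(IUPAC[b] for b in site) + ")")


def cut_positions(seq: str, enzymes: list[str]) -> list[int]:
    """Sorted top-strand cut positions for the given enzymes combined."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        for m in site_regex(site).finditer(seq):
            cuts.add(m.start() + offset)
    return sorted(cuts)


def fragments(seq: str, enzymes: list[str]) -> list[str]:
    """Split seq at every cut position (top strand only)."""
    cuts = cut_positions(seq, enzymes)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical toy sequence containing one site for each enzyme.
demo = "AAGCAGCTTTCTGCAGTTATGCATGG"
print(fragments(demo, ["PstI", "EcoT22I"]))
```

Running it prints `['AAGCAGCTTTCTGCA', 'GTTATGCA', 'TGG']`: the toy sequence is cut once by PstI and once by EcoT22I. In a real genome, ApeKI's shorter, degenerate five-base site would be expected to occur far more often than the six-base PstI and EcoT22I sites.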

SNPs from each caller were imputed separately using LinkImputeR (Money et al., 2017), with a maximum per-position and per-sample missingness of 70% and a minimum read depth of four. Imputation accuracy and correlation were 0.9558 and 0.8761 (GATK), 0.9526 and 0.8696 (SAMtools), and 0.9556 and 0.8347 (TASSEL). After imputation, 165,418 (GATK), 195,667 (SAMtools), and 226,821 (TASSEL) SNPs remained. The three VCF files were then merged into a single SNP set; where a SNP was called by more than one caller, one of the overlapping calls was chosen at random. Of the final SNPs, 22.64% came from GATK, 30.23% from SAMtools, and 47.14% from TASSEL. The resulting SNP set consisted of 278,224 SNPs across 1,175 unique accessions.
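The pooling step can be sketched in Python as a union of per-caller SNP positions in which each position reported by more than one pipeline is attributed to one of those pipelines at random. This is a minimal sketch under stated assumptions: SNPs are matched on chromosome and position only (the description does not say whether alleles were also compared), the random choice is uniform among the callers that reported the SNP, and parsing the three VCF files into position sets is omitted. The function names and toy positions are hypothetical.

```python
import random
from collections import Counter

# Assumed matching key for "overlapping" SNPs: (chromosome, position).
SnpKey = tuple[str, int]


def merge_callsets(callsets: dict[str, set[SnpKey]], seed: int = 1) -> dict[SnpKey, str]:
    """Pool SNPs from several callers, picking one caller at random on overlap.

    callsets maps a caller name to the set of SNP keys it reported.
    Returns a mapping from each SNP key to the caller it was taken from.
    """
    rng = random.Random(seed)  # seeded so the random choice is reproducible
    callers_at: dict[SnpKey, list[str]] = {}
    for caller in sorted(callsets):
        for key in callsets[caller]:
            callers_at.setdefault(key, []).append(caller)
    # Iterate in sorted order so the result does not depend on set ordering.
    return {key: rng.choice(callers) for key, callers in sorted(callers_at.items())}


def source_fractions(merged: dict[SnpKey, str]) -> dict[str, float]:
    """Percentage of merged SNPs contributed by each caller."""
    counts = Counter(merged.values())
    return {caller: 100 * n / len(merged) for caller, n in counts.items()}


# Hypothetical toy positions: two overlaps, three caller-specific SNPs.
toy = {
    "GATK": {("Chr01", 100), ("Chr01", 200)},
    "SAMtools": {("Chr01", 200), ("Chr02", 50)},
    "TASSEL": {("Chr02", 50), ("Chr03", 10), ("Chr03", 20)},
}
merged = merge_callsets(toy)
print(len(merged), source_fractions(merged))
```

SNPs reported by a single caller are always attributed to that caller, so the per-caller percentages depend both on how many SNPs each pipeline retained and on how the random choice resolves the overlaps. Fixing the seed keeps that choice reproducible between runs.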


Natural Sciences and Engineering Research Council of Canada