
Data from: Detection of individual ploidy levels with genotyping-by-sequencing (GBS) analysis

Cite this dataset

Gompert, Zachariah; Mock, Karen E. (2017). Data from: Detection of individual ploidy levels with genotyping-by-sequencing (GBS) analysis [Dataset]. Dryad.


Ploidy levels sometimes vary among individuals or populations, particularly in plants. When such variation exists, accurate determination of cytotype can inform studies of ecology or trait variation and is required for population genetic analyses. Here we propose and evaluate a statistical approach for distinguishing low-level ploidy variants (e.g., diploids, triploids and tetraploids) based on genotyping-by-sequencing (GBS) data. The method infers cytotypes from observed heterozygosity and from allelic ratios, i.e., the ratio of DNA sequences containing different alleles at thousands of heterozygous SNPs. Although the method does not require prior information on ploidy, a reference set of samples with known ploidy can be included in the analysis if one is available. We explore the power and limitations of this method using simulated data sets and GBS data from natural populations of aspen (Populus tremuloides) known to include both diploid and triploid individuals. The proposed method reliably discriminated among diploids, triploids and tetraploids in simulated data sets, and it did so across different levels of genetic diversity, inbreeding and population structure. Power and accuracy were minimally affected by low coverage (i.e., 2X), but sometimes suffered when simulated mixtures of diploids, autotetraploids and allotetraploids were analyzed. When applied to GBS data from aspen, cytotype assignments based on the proposed method closely matched those from previous microsatellite and flow cytometry data. An R package (gbs2ploidy) implementing the proposed method is available from CRAN.
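The intuition behind allelic ratios can be illustrated with a small sketch. At a heterozygous SNP, a diploid is expected to yield reads for the two alleles in roughly a 1:1 ratio, a triploid in 1:2 or 2:1, and a tetraploid in 1:3, 2:2 or 3:1. The Python code below is not the gbs2ploidy implementation and does not reproduce the statistical model described above; it is a simplified illustration that assumes binomial sampling of reads, equal weights for the candidate ratios within each cytotype, and no sequencing error. The function names and the read counts are hypothetical.

```python
import math

# Expected fraction of reads carrying the first allele at a heterozygous SNP,
# by cytotype. Equal weighting of the fractions is an assumption of this sketch.
PROPORTIONS = {
    "diploid": [1 / 2],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [1 / 4, 1 / 2, 3 / 4],
}


def binom_logpmf(k, n, p):
    """Log probability of k successes in n binomial trials with success rate p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def log_likelihood(counts, props):
    """Sum over SNPs of the log of an equal-weight binomial mixture over props."""
    total = 0.0
    for first, second in counts:
        n = first + second
        terms = [binom_logpmf(first, n, p) for p in props]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms) / len(props))
    return total


def best_ploidy(counts):
    """Return the cytotype with the highest log-likelihood, plus all scores."""
    scores = {name: log_likelihood(counts, props) for name, props in PROPORTIONS.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical (allele 1, allele 2) read counts at four heterozygous SNPs.
    sample = [(20, 40), (42, 18), (14, 28), (32, 16)]
    call, scores = best_ploidy(sample)
    print(call, scores)
```

With only a handful of SNPs and few reads per SNP the three cytotypes are hard to tell apart in this sketch, which is one way to see why the approach pools information across thousands of heterozygous SNPs.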

Funding

National Science Foundation, Award: 1638768