Data from: Genotyping SNPs and inferring ploidy by amplicon sequencing for polyploid, ploidy-variable organisms
Delomas, Thomas et al. (2021), Data from: Genotyping SNPs and inferring ploidy by amplicon sequencing for polyploid, ploidy-variable organisms, Dryad, Dataset, https://doi.org/10.5061/dryad.crjdfn33r
Whole genome duplication is hypothesized to have played a critical role in the evolution of several major organism groups, including vertebrates, and while many lineages have rediploidized, some retain polyploid genomes. Additionally, variation in ploidy can occur naturally or be artificially induced within select plant and animal species. Modern genetic techniques have not been widely applied to polyploid or ploidy-variable species, in part due to the difficulty of obtaining genotype data from polyploids. In this study we provide a strategy for developing an amplicon sequencing panel of single nucleotide polymorphisms for high-throughput genotyping of polyploid organisms. We then developed a method to infer ploidy of individuals from amplicon sequencing data that is generalized to apply to any ploidy and does not require prior identification of heterozygous genotypes. Combining these two techniques will allow researchers to both infer ploidy and generate ploidy-aware genotypes with the same amplicon sequencing panel. We demonstrated this approach with white sturgeon Acipenser transmontanus, a ploidy-variable (octoploid, decaploid, and dodecaploid) imperiled species under conservation management in the Pacific Northwest and obtained a panel of 325 loci. These loci were validated by examining inheritance in known-cross families, and the ploidy inference method was validated with known ploidy samples. We provide scripts that adapt existing pipelines to genotype polyploids and an R package for application of the ploidy inference method. We expect that these techniques will empower studies of genetic variation and inheritance in polyploid organisms that vary in ploidy level, either naturally or as a result of artificial propagation practices.
These are read counts and genotypes for biallelic SNPs genotyped by amplicon sequencing (GT-seq).
The .rda files are meant to be loaded into the R statistical software. Each file contains two matrices: one of read counts for the reference allele and one of read counts for the alternate allele. In both matrices, rows are samples and columns are loci.
10n_sturgeon_readCounts.rda: presumed 10N white sturgeon
8n_12n_sturgeon_readCounts.rda: confirmed 8N and 12N white sturgeon at various subsampling levels. Rownames indicate the sample number, subsampling level as a percentage, and the true ploidy.
2n_3n_Chinook_readCounts.rda: confirmed 2N and 3N Chinook salmon. Rownames indicate the true ploidy.
white_sturgeon_genos: ".genos" files produced by the GT-seq pipeline with genotypes and read counts for all samples in this study
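The read-count layout described above (two matrices of the same shape, with samples as rows and loci as columns) can be illustrated with a minimal sketch. It uses invented toy counts rather than values from the .rda files, and the names `ref_counts`, `alt_counts`, and `alt_fraction` are hypothetical; the object names stored inside the .rda files are not stated here, so they would need to be checked after loading the files in R. The sketch only computes the per-sample, per-locus fraction of reads supporting the alternate allele. It is not the ploidy inference method from the study.

```python
# Toy stand-ins for the two matrices in one read-count file.
# All values and names below are invented for illustration.
samples = ["sample_A", "sample_B"]          # row labels (hypothetical)
loci = ["locus_1", "locus_2", "locus_3"]    # column labels (hypothetical)

ref_counts = [[30, 0, 12],   # reads supporting the reference allele
              [25, 40, 0]]
alt_counts = [[10, 20, 12],  # reads supporting the alternate allele
              [25, 0, 0]]


def alt_fraction(ref, alt):
    """Return alt / (ref + alt) for every sample-by-locus cell.

    Cells with no reads at all are returned as None, because the
    fraction is undefined when the total depth is zero.
    """
    if len(ref) != len(alt):
        raise ValueError("matrices must have the same number of rows")
    result = []
    for ref_row, alt_row in zip(ref, alt):
        if len(ref_row) != len(alt_row):
            raise ValueError("matrices must have the same number of columns")
        row = []
        for r, a in zip(ref_row, alt_row):
            total = r + a
            row.append(a / total if total > 0 else None)
        result.append(row)
    return result


fractions = alt_fraction(ref_counts, alt_counts)

# Print one line per sample, pairing each locus with its fraction.
for name, row in zip(samples, fractions):
    print(name, dict(zip(loci, row)))
```

Keeping the two matrices separate, rather than storing only a ratio, preserves the read depth at each locus, so low-coverage cells can be identified and treated with caution.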