Determining haploblocks and haplotypes in the MAGIC winter wheat population WM-800 based on the wheat 15k Infinium and the 135k Affymetrix SNP arrays
Pillen, Klaus et al. (2022), Determining haploblocks and haplotypes in the MAGIC winter wheat population WM-800 based on the wheat 15k Infinium and the 135k Affymetrix SNP arrays, Dryad, Dataset, https://doi.org/10.5061/dryad.zcrjdfnfk
Haplotypes are derived from single nucleotide polymorphisms (SNPs). They are beneficial (i) to remove redundant sequence information in genetic populations and, more importantly, (ii) to distinguish more than two variants/alleles at a genomic locus. A haploblock locus, made up of multiple haplotypes, is particularly useful in multiparent advanced generation intercross (MAGIC) populations, where, ideally, multiple founder alleles need to be distinguished at each locus to subsequently carry out efficient genome-wide association studies (GWAS).
In this regard, the dataset contains genotype matrices (made up of SNP, haploblock and haplotype data) for 800 lines of the MAGIC wheat population WM-800 (Sannemann et al. 2018). The matrices are based on genotyping the lines with both the previously published wheat 15k Infinium SNP array (Sannemann et al. 2018) and the new wheat 135k Affymetrix SNP array.
The genotype matrices were inferred as follows:
Tables A1 & A2: Genotyping and physical anchoring of SNPs
For the 135k Affymetrix array, bulked DNA from twelve seedlings per WM-800 line in the F4:5 generation and per founder cultivar was extracted and subjected to SNP genotyping by TraitGenetics, Gatersleben (http://www.traitgenetics.com/), a subsidiary of SGS Institut Fresenius GmbH, Taunusstein, Germany. Subsequently, SNPs that were polymorphic in WM-800 and had SNP calls for all eight founders underwent a quality check. Only SNPs with (i) < 5% missing calls in population WM-800, (ii) a minor allele frequency (MAF) > 5% and (iii) a known physical position in the wheat genome were kept. Chromosomal positions and physical base-pair positions, anchored to the RefSeq v1.0 reference genome sequence of Chinese Spring (Alaux et al. 2018), were provided by TraitGenetics for both the Affymetrix array and the Infinium array (Sannemann et al. 2018). Merging both arrays resulted in a total of 27,006 polymorphic and physically anchored SNPs, including 19,522 SNPs of the 135k Affymetrix array and 7,484 SNPs of the 15k Infinium array (Tables A1 and A2). In total, 7,245 SNPs of the Infinium array were additionally assigned genetic positions (in cM) on wheat chromosomes based on the wheat consensus map (Sannemann et al. 2018). The Affymetrix SNPs, which lack genetic cM positions, were assigned to the wheat consensus map by placing each unmapped SNP between the two closest mapped markers based on the physical SNP position given by Alaux et al. (2018; Table A1).
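The quality filter and the map placement described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used for the dataset: the function names passes_qc and interpolate_cm are hypothetical, each line is assumed to carry a single (homozygous) allele call per biallelic SNP, and linear interpolation between the two flanking mapped markers is an assumed way of "placing" an unmapped SNP.

```python
from bisect import bisect_left


def passes_qc(calls, max_missing=0.05, min_maf=0.05):
    """Return True if a SNP passes the missing-call and MAF thresholds.

    calls: one allele character per line, None for a missing call
    (assumption: biallelic SNP, homozygous calls only).
    """
    n = len(calls)
    observed = [c for c in calls if c is not None]
    if not observed or (n - len(observed)) / n >= max_missing:
        return False
    counts = {}
    for c in observed:
        counts[c] = counts.get(c, 0) + 1
    if len(counts) < 2:
        return False  # monomorphic in the population
    maf = min(counts.values()) / len(observed)
    return maf > min_maf


def interpolate_cm(bp, mapped):
    """Estimate a cM position from the two flanking mapped markers.

    mapped: list of (bp, cM) tuples for one chromosome, sorted by bp.
    Linear interpolation is an assumption made for this sketch.
    """
    positions = [p for p, _ in mapped]
    i = bisect_left(positions, bp)
    if i == 0:
        return mapped[0][1]   # before the first mapped marker
    if i == len(mapped):
        return mapped[-1][1]  # beyond the last mapped marker
    (bp0, cm0), (bp1, cm1) = mapped[i - 1], mapped[i]
    return cm0 + (cm1 - cm0) * (bp - bp0) / (bp1 - bp0)
```

For example, a SNP at 150 bp flanked by mapped markers at (100 bp, 10 cM) and (200 bp, 20 cM) would be placed at 15 cM under this assumption.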
Table B1: SNP recoding and imputing
To carry out subsequent regression analyses, the original SNP genotype code (A,C,G,T) was transcribed into a numerical code (0,1,2) based on the presence of the Julius founder allele. At each SNP, WM-800 lines were assigned the SNP values 2 and 0 if the respective SNP genotype contained two Julius alleles (i.e. homozygous Julius) or two Non-Julius alleles (i.e. homozygous Non-Julius), respectively. Heterozygous genotypes, containing one Julius and one Non-Julius allele, were assigned the SNP value 1. Missing SNP calls were imputed by applying the mean imputation (MNI) approach (Rutkoski et al. 2013, Table B1).
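A minimal sketch of the recoding and mean imputation steps is given below. The function names recode and mean_impute and the two-character genotype strings (e.g. "AG") are assumptions made for illustration. Whether imputed means were rounded in Table B1 is not stated, so the sketch leaves them unrounded.

```python
def recode(genotype, julius_allele):
    """Count Julius alleles in a genotype: 2, 1 or 0; None stays missing.

    genotype: two-character string such as 'AA' or 'AG' (assumed format).
    """
    if genotype is None:
        return None
    return sum(1 for allele in genotype if allele == julius_allele)


def mean_impute(scores):
    """Replace missing scores at one SNP by the mean of the observed scores."""
    observed = [s for s in scores if s is not None]
    mean = sum(observed) / len(observed)
    return [mean if s is None else s for s in scores]


# Example: Julius carries 'A' at this SNP; the third line has a missing call.
scores = [recode(g, "A") for g in ["AA", "GG", None, "AG"]]
imputed = mean_impute(scores)
```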
Tables C1 & C2: Genetic similarity in WM-800
The simple matching procedure of SAS PROC DISTANCE (SAS 2020) was applied to Table A1 to calculate genetic similarity (GS) estimates between WM-800 lines and the eight WM founder cultivars (Table C1). Based on GS estimates, the population structure was characterized by applying a principal component analysis (PCA) with SAS PROC PRINCOMP (SAS 2020, Table C2).
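The simple matching coefficient underlying the GS estimates is the share of markers at which two genotypes carry the same call. The sketch below illustrates this in plain Python rather than SAS; the function name simple_matching is hypothetical, and skipping markers with a missing call in either genotype is an assumption. The PCA step is not shown.

```python
def simple_matching(a, b):
    """Proportion of markers with identical calls in genotypes a and b.

    Markers with a missing call (None) in either genotype are skipped,
    which is an assumption of this sketch.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return sum(x == y for x, y in pairs) / len(pairs)


# Example: a line and a founder agree at three of four markers.
gs = simple_matching(["A", "G", "T", "C"], ["A", "G", "T", "T"])
```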
Tables D1 to D6: Building haploblocks (HB) and haplotypes (HT)
Haploblocks (HB), which are made of SNPs in high linkage disequilibrium (LD), were built using the software package Haploview 4.2 (Barrett et al. 2005). For this purpose, the available set of SNP genotypes of the eight WM-800 founders (Table A1) was selected to build HBs using Haploview’s ‘Four Gamete Rule’ based on SNPs with a minor allele frequency MAF > 0.05. For each SNP pair with a distance of < 500 kb, a haploblock was formed and extended by consecutive SNPs if (i) at least one out of the four possible gametes was observed with a frequency of < 0.01 and (ii) strong LD was estimated between the SNP pair (D’ = 1.0). SNPs not included in HBs were kept as so-called singular SNPs.
The physical position (in bp) and genetic position (in cM) of the 27,006 SNPs are given in Table D1. In total, Haploview identified 2,970 HBs, each comprising between 2 and 198 SNPs (mean = 8.3; Table D2). Across all HBs, a total of 92,734 HTs were identified in WM-800. Of those, 8,498 informative HTs were selected that passed the quality criteria of (i) an HT frequency of > 5% in WM-800 and (ii) an HT sequence without missing nucleotides (Table D3). A genotype matrix containing the genotype scores for 800 WM lines and 8 founders at the selected 8,498 HTs is given in Table D4.
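The four-gamete criterion can be illustrated with the simplified sketch below. It is not Haploview's implementation: the greedy left-to-right extension, the function names passes_four_gamete and greedy_blocks, and the input layout (one allele per founder and SNP, no missing calls) are assumptions made for illustration. Blocks of length one in the output would correspond to singular SNPs.

```python
from collections import Counter


def passes_four_gamete(snp_a, snp_b, min_freq=0.01):
    """True if fewer than four two-SNP gametes reach min_freq.

    snp_a, snp_b: allele lists with one entry per founder haplotype.
    """
    gametes = Counter(zip(snp_a, snp_b))
    n = len(snp_a)
    common = [g for g, count in gametes.items() if count / n >= min_freq]
    return len(common) < 4


def greedy_blocks(snps, positions, max_dist=500_000):
    """Group consecutive SNPs into blocks (lists of SNP indices).

    snps: allele lists ordered by physical position on one chromosome.
    positions: matching physical positions in bp.
    A SNP joins the current block if it lies within max_dist of the
    block's last SNP and passes the four-gamete test with every SNP
    already in the block (a simplifying assumption).
    """
    if not snps:
        return []
    blocks, current = [], [0]
    for j in range(1, len(snps)):
        near = positions[j] - positions[current[-1]] < max_dist
        if near and all(passes_four_gamete(snps[i], snps[j]) for i in current):
            current.append(j)
        else:
            blocks.append(current)
            current = [j]
    blocks.append(current)
    return blocks


# Example with eight founder haplotypes at three SNPs.
s0 = list("AAAAGGGG")
s1 = list("AAAAGGGG")  # only two gametes with s0: joins the block
s2 = list("AAGGAAGG")  # all four gametes with s0: starts a new block
blocks = greedy_blocks([s0, s1, s2], [100, 200, 300])
```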
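The two selection criteria for informative HTs can be sketched as a simple filter over the haplotype strings observed at one haploblock. The function name informative_haplotypes and the use of "N" as the symbol for a missing nucleotide are assumptions made for illustration.

```python
from collections import Counter


def informative_haplotypes(line_haplotypes, min_freq=0.05, missing="N"):
    """Return haplotypes with frequency > min_freq and no missing nucleotides.

    line_haplotypes: one haplotype string per line at a single haploblock.
    """
    counts = Counter(line_haplotypes)
    n = len(line_haplotypes)
    return sorted(
        ht for ht, count in counts.items()
        if count / n > min_freq and missing not in ht
    )


# Example: 100 lines; 'ANG' contains a missing nucleotide, 'TTG' is too rare.
observed = ["ACG"] * 60 + ["ATG"] * 29 + ["ANG"] * 10 + ["TTG"] * 1
selected = informative_haplotypes(observed)
```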
Finally, the numerical genotype scores of the selected 8,498 HTs and 4,562 singular SNPs (i.e. SNPs not included in HBs) were coded as 0 (absent) or 1 (present) for each WM-800 line in Tables D5 (only HTs) and D6 (merged HTs and singular SNPs).
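The presence/absence coding can be illustrated as follows. This sketch assumes that each line carries exactly one haplotype per haploblock; how heterozygous lines are scored in Tables D5 and D6 is not covered here. The function name presence_absence is hypothetical.

```python
def presence_absence(line_haplotypes, selected):
    """Code each selected haplotype as 1 (present) or 0 (absent) per line.

    line_haplotypes: one haplotype string per line at a single haploblock.
    selected: the informative haplotypes retained for that haploblock.
    """
    return [[1 if ht == s else 0 for s in selected] for ht in line_haplotypes]


# Example: the third line carries a haplotype that was not selected.
matrix = presence_absence(["ACG", "ATG", "ANG"], ["ACG", "ATG"])
```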
- Alaux M, Rogers J, Letellier T, Flores R, Alfama F, Pommier C, Mohellibi N, Durand S, Kimmel E, Michotey C, Guerche C, Loaec M, Lainé M, Steinbach D, Choulet F, Rimbert H, Leroy P, Guilhot N, Salse J, Feuillet C, Paux E, Eversole K, Adam-Blondon A-F, Quesneville H (2018) Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data. Genome Biology 19:111. https://doi.org/10.1186/s13059-018-1491-4.
- Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263-265. https://doi.org/10.1093/bioinformatics/bth457.
- Rutkoski JE, Poland J, Jannink J-L, Sorrells ME (2013) Imputation of unordered markers and the impact on genomic selection accuracy. G3: Genes, Genomes, Genetics 3:427-439. https://doi.org/10.1534/g3.112.005363.
- Sannemann W, Lisker A, Maurer A, Leon J, Kazman E, Coster H, Holzapfel J, Kempf H, Korzun V, Ebmeyer E, Pillen K (2018) Adaptive selection of founder segments and epistatic control of plant height in the MAGIC winter wheat population WM-800. BMC Genomics 19:16. https://doi.org/10.1186/s12864-018-4915-3.
- SAS (2020) SAS Enterprise Guide 8.3, SAS Institute Inc. Cary, North Carolina, USA, https://www.sas.com/de_de/software/enterprise-guide.html.
German Federal Ministry of Food and Agriculture, Award: 2814601013