Passport data and genotyping data related to the WEW-GB collection
Abstract
A collection of wild emmer wheat accessions (WEW-GB) was characterised and made available by the Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (Italy) (Mastrangelo et al., The Plant Genome, submitted). Here, passport data for the 263 wild emmer wheat accessions of the WEW-GB collection are provided, together with the Hapmap for these 263 accessions, based on the Axiom™ BreedWheat 35K Genotyping Array (11342 polymorphic SNPs).
README: Passport data and genotyping data related to the WEW-GB collection
https://doi.org/10.5061/dryad.pzgmsbctb
A collection of wild emmer wheat accessions (WEW-GB) was established and characterised by the Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (Italy), as described in the publication Mastrangelo et al., The Plant Genome, e20413. doi: 10.1002/tpg2.20413. The dataset contains passport data for the 263 wild emmer wheat accessions of the WEW-GB collection and the corresponding Hapmap of SNP data.
Description of the data and file structure
- Passport data for the 263 wild emmer wheat accessions of the WEW-GB collection (xlsx)
The following information is included for each accession of the WEW-GB collection, to identify it and its geographic origin:
ID at CREA: accession ID label assigned by CREA and referred to within the publication
Donor: acronym of the original donor genebank or research institution that provided the accession to CREA (USDA: U.S. Department of Agriculture; IPK: Leibniz Institute of Plant Genetics and Crop Plant Research; KOMUGI: National BioResource Project, wheat genetic resources database)
Gene-bank code: accession ID label assigned by the original donor genebank
Country of origin: name of the country where the accession was originally sampled in nature
Collection site: geographic name of the location where the accession was originally sampled in nature
Lat(N) and Long(E): GPS coordinates of the original sampling location
Elevation: metres above sea level of the location where the accession was originally sampled in nature
- Hapmap for the 263 accessions of the WEW-GB collection based on the Affymetrix Axiom BreedWheat 35K Genotyping Array (csv)
The hapmap is a matrix in which each row is one of the 35K Axiom SNP markers that are polymorphic in the WEW-GB collection (11342 polymorphic SNPs). The first columns give information on the SNP, and the remaining columns give the SNP call for each accession of the WEW-GB collection, in diploid 4-base code. For each SNP, the dataset reports: SNP ID, polymorphic alleles, chromosome assignment, and position of the SNP on the assigned chromosome in base pairs. NA: information not available; NN: missing data.
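The layout described above can be read with a few lines of Python. The following is a minimal sketch, not a description of the released file: the header names, marker IDs, accession names and the assumption that exactly four information columns precede the genotype columns are all hypothetical, and each call is assumed to be a two-letter code such as AA, AG or NN.

```python
import csv
import io

# Hypothetical excerpt: header names, marker IDs and accession names are
# illustrative only and are not copied from the released csv.
SAMPLE = """SNP_ID,alleles,chrom,pos,ACC_001,ACC_002,ACC_003,ACC_004
AX-0001,A/G,1A,1523344,AA,GG,AG,NN
AX-0002,C/T,NA,NA,CC,CC,TT,CC
"""

# Assumption: SNP ID, alleles, chromosome and position come first, in that order.
N_INFO_COLUMNS = 4


def read_hapmap(handle):
    """Yield (info, calls) for each SNP row.

    info  maps each information column to its value (None where the file says NA).
    calls maps each accession name to its two-letter genotype call.
    """
    reader = csv.reader(handle)
    header = next(reader)
    info_names = header[:N_INFO_COLUMNS]
    accessions = header[N_INFO_COLUMNS:]
    for row in reader:
        if not row:
            continue  # skip blank lines
        info = {
            name: (None if value == "NA" else value)
            for name, value in zip(info_names, row[:N_INFO_COLUMNS])
        }
        calls = dict(zip(accessions, row[N_INFO_COLUMNS:]))
        yield info, calls


def summarise(calls):
    """Count missing (NN), heterozygous and called genotypes for one SNP."""
    called = [c for c in calls.values() if c != "NN"]
    return {
        "missing": len(calls) - len(called),
        "heterozygous": sum(1 for c in called if c[0] != c[1]),
        "called": len(called),
    }


records = list(read_hapmap(io.StringIO(SAMPLE)))
for info, calls in records:
    print(info["SNP_ID"], info["chrom"], summarise(calls))
```

To use it on the real file, replace `io.StringIO(SAMPLE)` with an open file handle and adjust `N_INFO_COLUMNS` to match the actual header.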
Code/Software
The 283 accessions were genotyped with the Affymetrix 35K Axiom array at the Genomics Facility of the University of Bristol. Allele calling was carried out using the Affymetrix proprietary software package Axiom Analysis Suite, according to the Axiom Best Practices Genotyping workflow. The workflow includes a Bayesian-based clustering step, which was run with generic priors because this setting produced less mis-clustering than bread wheat-specific priors. Only SNPs classified by the Affymetrix software as either polymorphic at high resolution or off-target variants were selected. These were then filtered for missing data (max 10%) and for heterozygosity (max 10%).
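The two final thresholds can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names are hypothetical, calls are assumed to be two-letter codes with NN for missing data, and heterozygosity is assumed to be computed over non-missing calls only, which the text above does not specify.

```python
def missing_rate(calls):
    """Fraction of accessions with a missing call (NN) for one SNP."""
    if not calls:
        raise ValueError("a SNP needs at least one call")
    return sum(c == "NN" for c in calls) / len(calls)


def heterozygosity(calls):
    """Fraction of heterozygous calls among the non-missing calls.

    Assumption: missing calls are excluded from the denominator.
    """
    called = [c for c in calls if c != "NN"]
    if not called:
        return 0.0
    return sum(c[0] != c[1] for c in called) / len(called)


def passes_filters(calls, max_missing=0.10, max_het=0.10):
    """Keep a SNP only if both rates are at or below their thresholds."""
    return missing_rate(calls) <= max_missing and heterozygosity(calls) <= max_het


# Three illustrative SNPs, each scored on 20 hypothetical accessions.
good = ["AA"] * 18 + ["AG", "NN"]
too_many_missing = ["AA"] * 17 + ["NN"] * 3
too_heterozygous = ["AA"] * 17 + ["AG"] * 3

for name, calls in [("good", good),
                    ("too_many_missing", too_many_missing),
                    ("too_heterozygous", too_heterozygous)]:
    print(name, passes_filters(calls))
```

Both thresholds are treated as inclusive here; whether a SNP at exactly 10% was kept in the original analysis is not stated.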
The probe sequences of the polymorphic SNPs were used as queries in BLASTn similarity searches against the pseudomolecules of the wild emmer v2 reference genome (accession Zavitan, Zhu et al., 2019) to obtain the physical position of each marker. Parameters for BLASTn were as follows: word size 11, gap open 5, gap extend 2, penalty -2, reward 1. Both allele variants were individually considered for BLAST. Analogous parameters were used for BLASTn searches of the SNP marker sequences against the T. aestivum reference genome (IWGSC RefSeqv2.1, Zhu et al., 2021) and the T. durum reference genome (Svevo v1, Maccaferri et al., 2019).
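For readers who want to reproduce a similar search, the parameters above map onto command-line options of the NCBI BLAST+ `blastn` program. The sketch below only builds the argument list and does not run it. It assumes that BLAST+ is the implementation used, which the text does not state; the file names, the database name and the tabular output format (`-outfmt 6`) are hypothetical choices made for illustration.

```python
import shlex


def blastn_command(query_fasta, database, out_path):
    """Build a blastn argument list using the parameters reported in the text.

    query_fasta, database and out_path are placeholders supplied by the caller.
    """
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-word_size", "11",
        "-gapopen", "5",
        "-gapextend", "2",
        "-penalty", "-2",
        "-reward", "1",
        "-outfmt", "6",  # assumption: tabular output, not stated in the text
        "-out", out_path,
    ]


# Hypothetical file and database names.
cmd = blastn_command("snp_probes.fasta", "zavitan_v2", "probe_hits.tsv")
print(shlex.join(cmd))
```

Because both allele variants of each probe were searched individually, the query FASTA in this sketch would hold two sequences per SNP. The list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ and a formatted database are available.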
Methods
Plant material
A panel of 283 accessions of wild emmer wheat collected across countries of the Fertile Crescent was assembled and is currently maintained at the CREA Research Centre for Genomics and Bioinformatics (WEW-GB: Wild Emmer Wheat at CREA Research Centre for Genomics and Bioinformatics). Accessions were obtained either from genebanks (the United States Department of Agriculture - Agricultural Research Service National Small Grains Collection (USDA-ARS NSGC), the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), the National BioResource Project (NBRP) through KOMUGI) or from partners. The plant materials represent the entire Fertile Crescent region where wild emmer naturally occurs, from the north-eastern area, encompassing Turkey, Iraq and Iran, to the southern region (Southern Levant), which includes Israel, Lebanon and Syria. All accessions have undergone one cycle of single seed descent (SSD).
Genotyping, filtering and mapping of the polymorphic SNPs on the reference wild emmer genome
Young leaves of a single plant from each SSD line were collected and genomic DNA was extracted using the CTAB method (Doyle and Doyle, 1987). The 283 accessions were genotyped with the Affymetrix 35K Axiom array at the Genomics Facility of the University of Bristol. Allele calling was carried out using the Affymetrix proprietary software package Axiom Analysis Suite®, according to the Axiom Best Practices Genotyping workflow. The workflow includes a Bayesian-based clustering step, which was run with generic priors because this setting produced less mis-clustering than bread wheat-specific priors. Only SNPs classified by the Affymetrix software as either polymorphic at high resolution or off-target variants were selected. These were then filtered for missing data (max 10%) and for heterozygosity (max 10%).
The probe sequences of the polymorphic SNPs were used as queries in BLASTn similarity searches against the pseudomolecules of the wild emmer v2 reference genome (accession Zavitan, Zhu et al., 2019) to obtain the physical position of each marker. Parameters for BLASTn were as follows: word size 11, gap open 5, gap extend 2, penalty -2, reward 1. Both allele variants were individually considered for BLAST. Analogous parameters were used for BLASTn searches of the SNP marker sequences against the T. aestivum reference genome (IWGSC RefSeqv2.1, Zhu et al., 2021) and the T. durum reference genome (Svevo v1, Maccaferri et al., 2019).