Wild emmer wheat (Triticum turgidum subsp. dicoccoides) whole-genome sequencing data and analysis output files
Data files
Mar 31, 2026 version files (767.17 MB total)

| File | Size |
|---|---|
| data-curated.genoFile.dicoccoides.3.4.SNPs.for.pub.txt.zip | 244.97 MB |
| judaicum_Segregating.SNPs.txt.zip | 14.34 MB |
| judaicum.LD.estimation.txt | 112.49 MB |
| Line.info.dicoccoides.txt | 3.32 KB |
| lineinfo.dark.branches.rust.no.tip.xlsx | 26.32 KB |
| Northen_Population.LD.estimation.txt | 84.09 MB |
| Northern_Population_Segregating.SNPs.txt.zip | 54.74 MB |
| README.md | 4.48 KB |
| Readme.pdf | 48.98 KB |
| Southern_Levant_Segregating.SNPs.txt.zip | 136.62 MB |
| Southern.Levant_Population.LD.estimation.txt | 119.85 MB |

Abstract
Triticum turgidum L. subsp. dicoccoides, known as wild emmer wheat (WEW), is a tetraploid (2n = 4x = 28) relative and progenitor of cultivated wheat. We conducted population genetic and evolutionary analyses of WEW and dissected the genetic basis of its resistance to three rust diseases. Using whole-genome sequencing (WGS) data (14 TB) from 291 accessions at approximately 9.5x coverage, we identified 3.4 million high-quality SNP markers and used them for phylogenetic clustering, principal component analysis, and population structure assessment. We also performed diversity and pairwise FST analyses among the subgroups identified by the population structure analysis. For genome-wide association studies (GWAS), we investigated seedling-stage resistance in WEW against five races each of the stem, leaf, and stripe (yellow) rust pathogens.
Authors: Laxman Adhikari, Pablo D. Olivera, John Raupp, Hanan Sela, Assaf Distelfeld, Marco Maccaferri, Matteo Bozzoli, Elisabetta Mazzucotelli, Andrea Brandolini, Roberto Tuberosa, Hakan Özkan, Brande B. H. Wulff, Brian J. Steffenson and Jesse Poland
Description: This README file explains the dataset.
Data Availability: The raw sequence reads (paired-end fastq files) generated by whole-genome sequencing (WGS) of the 291 wild emmer wheat (WEW) accessions (Triticum turgidum L. subsp. dicoccoides) were deposited at the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under BioProject accession number PRJNA1007489. The fastq files were trimmed, aligned, and analyzed to identify single nucleotide polymorphisms (SNPs). The population genomics analysis was conducted using a 5% subset (3.4 million SNPs) of the 68 million quality-filtered SNPs.
Dataset: We provide R scripts (.Rmd files) for the population genomics analyses of wild emmer wheat used in the study: phylogenetic clustering, PCA, population structure, gene bank curation, and genome-wide association study (GWAS). All required files are provided either as attachments to the manuscript or as Dryad supplementary files. The sequence fastq files have been deposited at NCBI SRA; the 291 paired-end fastq files can be found by searching for the BioProject accession (PRJNA1007489). The fastq files retain the original names required to run the SNP calling pipeline. In this study, we called SNPs using bcftools. Phenotype data that can be used to run GWAS are provided with the supplementary documents of the manuscript.
The fastq files can be searched at NCBI SRA:
https://www.ncbi.nlm.nih.gov/bioproject/1007489
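
For scripted access, the same BioProject can also be queried through NCBI's E-utilities. The sketch below only builds an `esearch` request URL against the SRA database and does not contact NCBI. The helper name `sra_search_url` and the `retmax` default are illustrative assumptions, not part of this dataset.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities esearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(bioproject: str, retmax: int = 500) -> str:
    """Build an esearch URL that looks up SRA records matching a BioProject accession.

    `sra_search_url` is a hypothetical helper; `retmax` caps the number of
    record IDs returned per request and its default here is arbitrary.
    """
    params = {"db": "sra", "term": bioproject, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Accession taken from this README.
print(sra_search_url("PRJNA1007489"))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) returns an XML list of record IDs; the web page linked above remains the simplest route for manual browsing.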
| Files | Description |
|---|---|
| data-curated.genoFile.dicoccoides.3.4.SNPs.for.pub.txt.zip | SNP matrix of the entire population used in the analysis |
| judaicum_Segregating.SNPs.txt.zip | SNP matrix including the segregating loci of the judaicum race of the WEW |
| Northern_Population_Segregating.SNPs.txt.zip | SNP matrix including the segregating loci of the Northern Population of the WEW |
| Southern_Levant_Segregating.SNPs.txt.zip | SNP matrix including the segregating loci of the Southern Levant (SL) population of the WEW |
| Line.info.dicoccoides.txt | Information about the geographical origin of the WEW accessions |
| lineinfo.dark.branches.rust.no.tip.xlsx | Disease severity score used to plot the phylogenetic tree vs. geography vs. disease severity |
| judaicum.LD.estimation.txt | Pairwise LD (r2) estimated for the judaicum race |
| Northen_Population.LD.estimation.txt | Pairwise LD (r2) estimated for the Northern Population |
| Southern.Levant_Population.LD.estimation.txt | Pairwise LD (r2) estimated for the Southern Levant (SL) Population |
| dicoccoides.WGS.panel.phylogenetic.tree.PCA.Pop.Str.Rmd | R script for genetic clustering, PCA analysis, and population structure |
| phylogeny.vs.geography.vs.diseasease.severity.TTKSK.as.example.Rmd | R script for genetic clustering vs. disease severity and geography |
| LD.Decay.plot_Rscript.WEW.Rmd | R script to generate LD decay plot |
| colorful.manhattan.and.SNP.density.Rmd | R script for Manhattan plot and SNP density plot |
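
Several of the SNP matrices are large zipped text files, so it can help to inspect the first few rows before loading a whole file into R. The following is a minimal sketch of one way to do that with the Python standard library. It assumes each archive holds a single tab-delimited text file; the delimiter and the helper name `peek_zipped_table` are assumptions for illustration, and the actual column layout should be checked against the output.

```python
import io
import zipfile


def peek_zipped_table(zip_path, n_rows=5, sep="\t"):
    """Return the first n_rows of the first data file in a zip archive.

    Each row is returned as a list of fields split on `sep`. The tab
    delimiter is an assumption; adjust it if the file uses another one.
    """
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries and macOS metadata that some zips contain.
        members = [
            name for name in archive.namelist()
            if not name.endswith("/") and not name.startswith("__MACOSX")
        ]
        # Stream the member line by line instead of extracting it to disk.
        with io.TextIOWrapper(archive.open(members[0]), encoding="utf-8") as text:
            for line in text:
                rows.append(line.rstrip("\r\n").split(sep))
                if len(rows) >= n_rows:
                    break
    return rows


# Example call (requires the file to be downloaded first):
# peek_zipped_table("judaicum_Segregating.SNPs.txt.zip", n_rows=3)
```

Because the member is streamed rather than extracted, only the requested rows are decompressed, which keeps the check quick even for the largest matrix.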
Genotyping:
i) Whole-genome sequencing (WGS) of 291 individual accessions of wild emmer wheat (Triticum turgidum L. subsp. dicoccoides).
ii) Single nucleotide polymorphism (SNP) calling using bcftools, followed by filtering to obtain a subset of 3.4 million high-quality SNPs.
iii) Performed PCA, population structure analysis, phylogenetic tree construction, and genome-wide association study (GWAS) as described in the manuscript.
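
The population-specific SNP matrices in the file table keep only segregating loci, that is, sites at which the subpopulation carries more than one allele. As a rough illustration of that idea (not the authors' script), the sketch below filters a toy genotype matrix. The allele coding (`0` and `2` for the two homozygotes, `1` for heterozygotes, `NA` for missing calls) and all names in the code are assumptions made for this example; the coding in the published files may differ.

```python
# Values treated as missing calls (assumed coding for this example).
MISSING = {"NA", ""}


def is_segregating(calls):
    """Return True if the non-missing calls at one locus show more than one allele."""
    observed = {c for c in calls if c not in MISSING}
    # Two or more distinct genotype classes imply two alleles are present;
    # a locus with only heterozygous calls ("1") also carries both alleles.
    return len(observed) > 1 or observed == {"1"}


def segregating_loci(matrix):
    """Keep only the loci (dict keys) whose calls are segregating."""
    return {snp: calls for snp, calls in matrix.items() if is_segregating(calls)}


# Toy matrix: locus name -> genotype calls across four accessions.
toy = {
    "snp_a": ["0", "0", "0", "NA"],  # fixed for one allele
    "snp_b": ["0", "2", "0", "2"],   # both homozygotes present
    "snp_c": ["1", "1", "NA", "1"],  # heterozygotes only
    "snp_d": ["2", "2", "2", "2"],   # fixed for the other allele
}

print(sorted(segregating_loci(toy)))  # ['snp_b', 'snp_c']
```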
Phenotyping:
Screened seedlings of the WGS panel for 15 different races of pathogens: 5 races each of stem, leaf, and stripe rust pathogens.
