Dryad

Population genomics of the wild wheat Aegilops tauschii (Open wild wheat consortium phase II)

Cite this dataset

Cavalet-Giorsa, Emile et al. (2024). Population genomics of the wild wheat Aegilops tauschii (Open wild wheat consortium phase II) [Dataset]. Dryad. https://doi.org/10.5061/dryad.vmcvdnd0d

Abstract

Wild wheat relatives of bread wheat represent genetic diversity that can be used for wheat crop improvement. Here, we establish and analyse genomic resources for Tausch’s goatgrass, Aegilops tauschii, the donor of the bread wheat D genome. We identified 493 genetically non-redundant accessions from a diversity panel of over 900 sequenced accessions. We generated high-quality assemblies for 46 accessions, including annotated chromosome-scale assemblies for one accession from each of the three lineages of Ae. tauschii to serve as reference assemblies that anchor the genomic resources. This dataset was generated under the aegis of the Open Wild Wheat Consortium (www.openwildwheat.org). We also resequenced and analysed 60 wheat landraces and generated a chromosome-scale genome assembly for one of these to study the genetic composition and history of the bread wheat D genome. We determined the complexity and origin of the D genome across 17 hexaploid wheat lines by dividing the wheat genomes into 50-kb windows and assigning each window to an Ae. tauschii subpopulation based on identity-by-state.

This dataset provides:

  1. Pseudo-chromosome-level genome assemblies, Hi-C contact maps and genome annotations for the Ae. tauschii lineage reference accessions TA10171 (L1), TA1675 (L2) and TA2576 (L3);
  2. Contig-level and lineage-reference-scaffolded assemblies for 43 Ae. tauschii accessions sequenced with PacBio CCS (HiFi);
  3. Pseudo-chromosome-level genome assembly, Omni-C contact map and genome annotation for the bread wheat landrace accession CWI 86942;
  4. SNP variant call (VCF) file for the Ae. tauschii diversity panel, with SNPs called against the TA1675 (L2) reference assembly;
  5. Phylogenetic tree file (Newick format) for the non-redundant Ae. tauschii accessions;
  6. Structural variant (SV) VCF files for the Ae. tauschii accessions sequenced with PacBio CCS, with SVs called against the TA1675 (L2) reference assembly;
  7. IBSpy variations across 17 hexaploid wheat genomes, computed with Ae. tauschii k-mer sets.

README: Population genomics of the wild wheat Aegilops tauschii (Open Wild Wheat Consortium Phase II)

https://doi.org/10.5061/dryad.vmcvdnd0d

Description of the data and file structure

This dataset provides:

Pseudo-chromosome-level genome assemblies (.fasta.gz), Hi-C contact maps (.assembly.gz and .hic.gz), gene annotations (.gff3.gz) and repeat annotations (.gff.gz and .tbl.gz) for the Ae. tauschii lineage reference accessions TA10171 (Lineage 1), TA1675 (Lineage 2) and TA2576 (Lineage 3). The corresponding files are as follows:

  • AetTA10171_L1.Ref.genome.fasta.gz
  • AetTA1675_L2.Ref.genome.fasta.gz
  • AetTA2576_L3.Ref.genome.fasta.gz
  • AetTA10171_assembly.final.assembly.gz
  • AetTA10171_assembly.final.hic.gz
  • AetTA1675_assembly.final.assembly.gz
  • AetTA1675_assembly.final.hic.gz
  • AetTA2576_assembly.final.assembly.gz
  • AetTA2576_assembly.final.hic.gz
  • AetTA10171_v1.gff3.gz
  • AetTA1675_v1.gff3.gz
  • AetTA2576_v1.gff3.gz
  • AetTA10171_repeats_v1.gff.gz
  • AetTA10171_repeats_v1.tbl.gz
  • AetTA1675_repeats_v1.gff.gz
  • AetTA1675_repeats_v1.tbl.gz
  • AetTA2576_repeats_v1.gff.gz
  • AetTA2576_repeats_v1.tbl.gz
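A quick sanity check on any of the FASTA files above is to list the sequence names and lengths. The sketch below is an illustrative helper using only the Python standard library, not part of the dataset; it reports whatever sequence IDs appear in the FASTA headers, because the pseudo-chromosome naming scheme is not described in this README. The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} from FASTA-formatted text lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def fasta_lengths_from_file(path: str) -> Dict[str, int]:
    """Open a plain or gzip-compressed FASTA file and summarise sequence lengths."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return fasta_lengths(handle)
```

For example, `fasta_lengths_from_file("AetTA1675_L2.Ref.genome.fasta.gz")` would return one entry per sequence in that file. The same helper applies unchanged to the contig-level (.hifiasm.bp.p_ctg.fa.gz) and scaffolded (.scaffold.fsa.gz) assemblies listed below.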

Primary contig-level assemblies (.hifiasm.bp.p_ctg.fa.gz) for 43 Ae. tauschii accessions sequenced with PacBio CCS (HiFi). The reference-based scaffolded assemblies (.scaffold.fsa.gz) were produced with RagTag against the corresponding Lineage 1 (TA10171ref) or Lineage 2 (TA1675ref) reference assembly. The RagTag ordering and orientation of the contigs for each accession, in AGP format, are provided in the Excel file (.xlsx). The scaffolded genomes were visually inspected using dotplots from whole-genome alignments to their corresponding lineage reference (.docx). The corresponding files are:

  • TOWWC0243.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0242.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0240.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0236.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0212.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0202.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0191.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0187.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0182.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0178.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0169.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0167.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0163.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0152.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0144.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0142.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0137.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0123.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0131.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0112.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0107.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0088.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0106.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0083.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0073.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0050.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0054.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0047.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0023.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0002.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0111.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0162.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0190.hifiasm.bp.p_ctg.fa.gz
  • Tajik1040.hifiasm.bp.p_ctg.fa.gz
  • TA10097.hifiasm.bp.p_ctg.fa.gz
  • PI690713.hifiasm.bp.p_ctg.fa.gz
  • TA1618.hifiasm.bp.p_ctg.fa.gz
  • RL5271.hifiasm.bp.p_ctg.fa.gz
  • P999511.hifiasm.bp.p_ctg.fa.gz
  • AL878.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0078.hifiasm.bp.p_ctg.fa.gz
  • TOWWC0031.hifiasm.bp.p_ctg.fa.gz
  • TOWWC243.TA10171ref.scaffold.fsa.gz
  • TOWWC242.TA10171ref.scaffold.fsa.gz
  • TOWWC240.TA10171ref.scaffold.fsa.gz
  • TOWWC236.TA10171ref.scaffold.fsa.gz
  • TOWWC212.TA10171ref.scaffold.fsa.gz
  • TOWWC202.TA10171ref.scaffold.fsa.gz
  • TOWWC191.TA1675ref.scaffold.fsa.gz
  • TOWWC190.TA1675ref.scaffold.fsa.gz
  • TOWWC187.TA1675ref.scaffold.fsa.gz
  • TOWWC182.TA1675ref.scaffold.fsa.gz
  • TOWWC178.TA1675ref.scaffold.fsa.gz
  • TOWWC169.TA1675ref.scaffold.fsa.gz
  • TOWWC167.TA1675ref.scaffold.fsa.gz
  • TOWWC163.TA1675ref.scaffold.fsa.gz
  • TOWWC162.TA1675ref.scaffold.fsa.gz
  • TOWWC152.TA1675ref.scaffold.fsa.gz
  • TOWWC144.TA1675ref.scaffold.fsa.gz
  • TOWWC142.TA1675ref.scaffold.fsa.gz
  • TOWWC137.TA1675ref.scaffold.fsa.gz
  • TOWWC131.TA1675ref.scaffold.fsa.gz
  • TOWWC123.TA1675ref.scaffold.fsa.gz
  • TOWWC112.TA1675ref.scaffold.fsa.gz
  • TOWWC111.TA1675ref.scaffold.fsa.gz
  • TOWWC107.TA1675ref.scaffold.fsa.gz
  • TOWWC106.TA1675ref.scaffold.fsa.gz
  • TOWWC088.TA1675ref.scaffold.fsa.gz
  • TOWWC083.TA1675ref.scaffold.fsa.gz
  • TOWWC073.TA1675ref.scaffold.fsa.gz
  • TOWWC078.TA1675ref.scaffold.fsa.gz
  • TOWWC054.TA10171ref.scaffold.fsa.gz
  • TOWWC050.TA1675ref.scaffold.fsa.gz
  • TOWWC047.TA1675ref.scaffold.fsa.gz
  • TOWWC031.TA1675ref.scaffold.fsa.gz
  • TOWWC023.TA1675ref.scaffold.fsa.gz
  • TOWWC002.TA1675ref.scaffold.fsa.gz
  • Tajik1040.TA10171ref.scaffold.fsa.gz
  • TA10097.TA10171ref.scaffold.fsa.gz
  • TA1618.TA1675ref.scaffold.fsa.gz
  • RL5271.TA1675ref.scaffold.fsa.gz
  • PI690713.TA1675ref.scaffold.fsa.gz
  • P999511.TA10171ref.scaffold.fsa.gz
  • ENT336.TA1675ref.scaffold.fsa.gz
  • AL878.TA1675ref.scaffold.fsa.gz
  • agp_files_from_ragtag_scaffold.xlsx
  • dotplots_ragtag_scaffold.docx
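The RagTag contig ordering is distributed as AGP records inside agp_files_from_ragtag_scaffold.xlsx. Assuming a sheet has first been exported to tab-separated text with the nine standard AGP columns (the workbook layout itself is not described here), a sketch like the following groups the placed contigs and their orientations by scaffold. In the AGP specification, component types N and U denote gaps, so those rows are skipped. The `Placement` class and `parse_agp` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

GAP_TYPES = {"N", "U"}  # AGP component types that denote gaps


@dataclass
class Placement:
    contig: str
    contig_start: int
    contig_end: int
    orientation: str   # "+", "-", "?", "0" or "na" per the AGP specification
    object_start: int
    object_end: int


def parse_agp(lines: Iterable[str]) -> Dict[str, List[Placement]]:
    """Group non-gap AGP rows by scaffold (AGP 'object'), keeping file order."""
    scaffolds: Dict[str, List[Placement]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and AGP header/comment lines
        cols = line.split("\t")
        if len(cols) < 9:
            raise ValueError(f"expected 9 AGP columns, got {len(cols)}: {line!r}")
        obj, obj_beg, obj_end, _part_number, comp_type = cols[:5]
        scaffolds.setdefault(obj, [])
        if comp_type in GAP_TYPES:
            continue
        scaffolds[obj].append(
            Placement(
                contig=cols[5],
                contig_start=int(cols[6]),
                contig_end=int(cols[7]),
                orientation=cols[8],
                object_start=int(obj_beg),
                object_end=int(obj_end),
            )
        )
    return scaffolds
```

Keeping the rows in file order relies on AGP files listing components by increasing part number within each object, which the specification requires.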

Pseudo-chromosome-level genome assembly (.fasta.gz), pseudomolecule and Omni-C contact map files (.assembly.gz and .hic.gz) and genome annotation (.gff3.gz) for the bread wheat landrace accession CWI 86942. The corresponding files are:

  • TaesCWI86942_genome.fasta.gz
  • TaesCWI86942_assembly.final.assembly.gz
  • TaesCWI86942_assembly.omnic.hic.gz
  • TaesCWI86942_v1.gff3.gz
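The gene annotations (.gff3.gz) follow the nine-column GFF3 layout, so a per-chromosome gene count can be obtained without extra dependencies. The sketch below is a hypothetical helper, not part of the dataset; it counts features of one type (default `gene`) in column 3 and stops at an embedded `##FASTA` section if one is present.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_features(lines: Iterable[str], feature_type: str = "gene") -> Counter:
    """Count GFF3 features of one type per sequence (column 1)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 files may end with an embedded FASTA section
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == feature_type:
            counts[cols[0]] += 1
    return counts


def count_features_in_file(path: str, feature_type: str = "gene") -> Counter:
    """Open a plain or gzip-compressed GFF3 file and count features per sequence."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_features(handle, feature_type)
```

For example, `count_features_in_file("TaesCWI86942_v1.gff3.gz")` would tally genes per sequence; the same call works for the Ae. tauschii gene annotations such as AetTA1675_v1.gff3.gz.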

SNP variant call file for the Ae. tauschii diversity panel, distributed as a compressed tar archive (.vcf.tar.gz). SNPs were called against the TA1675 (L2) reference assembly. The corresponding file is:

  • SNP_call_Aet.vcf.tar.gz
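Because SNP_call_Aet.vcf.tar.gz is a tar archive rather than a single bgzipped VCF, it must be unpacked (or streamed) before standard VCF tools can read it. The sketch below streams every regular file in the archive with Python's `tarfile` module and counts VCF data records. The names of the archive members, and whether they are themselves gzip-compressed, are not documented here, so both plain and .gz members are handled; the function names are hypothetical.

```python
import gzip
import io
import tarfile
from typing import Dict, Iterable


def count_vcf_records(lines: Iterable[str]) -> int:
    """Count data lines (everything except '#' header lines) in VCF text."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def vcf_records_per_member(archive_path: str) -> Dict[str, int]:
    """Count VCF records in each regular file inside a (compressed) tar archive."""
    counts: Dict[str, int] = {}
    with tarfile.open(archive_path, mode="r:*") as tar:  # auto-detect compression
        for member in tar:
            if not member.isfile():
                continue
            raw = tar.extractfile(member)
            if raw is None:
                continue
            # Assumption: members ending in .gz are gzip/bgzip-compressed text.
            stream = gzip.GzipFile(fileobj=raw, mode="rb") if member.name.endswith(".gz") else raw
            counts[member.name] = count_vcf_records(io.TextIOWrapper(stream, encoding="utf-8"))
    return counts
```

Streaming line by line keeps memory use low, which matters for a panel-wide SNP file; for repeated analysis it is usually simpler to extract the archive once and index the VCF with standard tools.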

Phylogenetic tree file in Newick format (.nwk) for the 493 non-redundant Ae. tauschii accessions. The corresponding file is:

  • upgma_Aet_NonRed.nwk
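To check which accessions the tree contains, the tip labels can be extracted from the Newick string. The sketch below is a minimal, hypothetical parser, not a full Newick implementation: it assumes unquoted labels without spaces or the characters "(),:;[]", skips branch lengths, and ignores internal-node labels (such as support values) that follow a closing parenthesis.

```python
from typing import List


def newick_leaves(newick: str) -> List[str]:
    """Return the leaf (tip) labels of a Newick tree string, in file order."""
    leaves: List[str] = []
    prev = "("  # last structural character seen before the current token
    i, n = 0, len(newick)
    while i < n:
        ch = newick[i]
        if ch in "(),;":
            prev = ch
            i += 1
        elif ch == ":":
            # Skip a branch length such as ":0.0123".
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1
        elif ch.isspace():
            i += 1
        else:
            start = i
            while i < n and newick[i] not in "(),:;" and not newick[i].isspace():
                i += 1
            if prev in "(,":
                leaves.append(newick[start:i])  # a label after "(" or "," is a tip
    return leaves
```

Usage: read the file with `open("upgma_Aet_NonRed.nwk").read()` and pass the string to `newick_leaves`. If each tip corresponds to one accession, the number of labels should match the 493 non-redundant accessions stated above; for quoted labels or embedded comments, a dedicated phylogenetics library is the safer choice.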

Structural variant (SV) files (.vcf.gz) for the Ae. tauschii accessions sequenced with PacBio CCS (HiFi). SVs were called from the HiFi reads aligned to the TA1675 (L2) reference genome assembly. The corresponding files are:

  • SV_TA1675ref.AL878.vcf.gz
  • SV_TA1675ref.PI690713.vcf.gz
  • SV_TA1675ref.ENT336.vcf.gz
  • SV_TA1675ref.TA1618.vcf.gz
  • SV_TA1675ref.RL5271.vcf.gz
  • SV_TA1675ref.TOWWC002.vcf.gz
  • SV_TA1675ref.TOWWC023.vcf.gz
  • SV_TA1675ref.TOWWC047.vcf.gz
  • SV_TA1675ref.TOWWC050.vcf.gz
  • SV_TA1675ref.TOWWC073.vcf.gz
  • SV_TA1675ref.TOWWC088.vcf.gz
  • SV_TA1675ref.TOWWC083.vcf.gz
  • SV_TA1675ref.TOWWC107.vcf.gz
  • SV_TA1675ref.TOWWC106.vcf.gz
  • SV_TA1675ref.TOWWC111.vcf.gz
  • SV_TA1675ref.TOWWC112.vcf.gz
  • SV_TA1675ref.TOWWC123.vcf.gz
  • SV_TA1675ref.TOWWC131.vcf.gz
  • SV_TA1675ref.TOWWC137.vcf.gz
  • SV_TA1675ref.TOWWC142.vcf.gz
  • SV_TA1675ref.TOWWC144.vcf.gz
  • SV_TA1675ref.TOWWC152.vcf.gz
  • SV_TA1675ref.TOWWC162.vcf.gz
  • SV_TA1675ref.TOWWC163.vcf.gz
  • SV_TA1675ref.TOWWC169.vcf.gz
  • SV_TA1675ref.TOWWC167.vcf.gz
  • SV_TA1675ref.TOWWC182.vcf.gz
  • SV_TA1675ref.TOWWC187.vcf.gz
  • SV_TA1675ref.TOWWC178.vcf.gz
  • SV_TA1675ref.TOWWC190.vcf.gz
  • SV_TA1675ref.TOWWC191.vcf.gz
  • SV_TA1675ref.TA2576.vcf.gz
  • SV_TA1675ref.Tajik1040.vcf.gz
  • SV_TA1675ref.TOWWC243.vcf.gz
  • SV_TA1675ref.TOWWC242.vcf.gz
  • SV_TA1675ref.TOWWC240.vcf.gz
  • SV_TA1675ref.TOWWC236.vcf.gz
  • SV_TA1675ref.TOWWC212.vcf.gz
  • SV_TA1675ref.TOWWC202.vcf.gz
  • SV_TA1675ref.TOWWC054.vcf.gz
  • SV_TA1675ref.TA10171.vcf.gz
  • SV_TA1675ref.TA10097.vcf.gz
  • SV_TA1675ref.P999511.vcf.gz
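A common first look at per-accession SV calls is a tally of variant classes. The SV caller used is not named in this README, so the sketch below assumes the records carry the `SVTYPE` INFO key reserved by the VCF specification (for example DEL or INS); records without it are counted under the hypothetical label "NA". The function names are illustrative.

```python
import gzip
from collections import Counter
from typing import Iterable


def svtype_counts(lines: Iterable[str]) -> Counter:
    """Tally the SVTYPE INFO value across the data records of a VCF."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # malformed record without an INFO column
        svtype = "NA"  # assumption: label for records lacking SVTYPE
        for entry in cols[7].split(";"):
            key, _, value = entry.partition("=")
            if key == "SVTYPE":
                svtype = value
                break
        counts[svtype] += 1
    return counts


def svtype_counts_in_file(path: str) -> Counter:
    """Open a plain or gzip-compressed VCF and tally SVTYPE values."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return svtype_counts(handle)
```

Applied to each SV_TA1675ref.*.vcf.gz file in turn (for instance via `glob.glob("SV_TA1675ref.*.vcf.gz")`), this gives a per-accession summary of SV classes relative to the TA1675 reference.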

IBSpy variations in 50-kb windows across 17 hexaploid wheat genomes, computed with Ae. tauschii k-mer sets (.csv and .tsv.gz). The file names begin with the wheat accession ID. The corresponding files are:

  • CWI86942_50kb_windows_predictions.csv
  • CWI86942_aetauschii_combined_queries_50000w.tsv.gz
  • PI190962_50kb_windows_predictions.csv
  • PI190962_aetauschii_combined_queries_50000w.tsv.gz
  • aikang58_50kb_windows_predictions.csv
  • aikang58_aetauschii_combined_queries_50000w.tsv.gz
  • arinalrfor_50kb_windows_predictions.csv
  • arinalrfor_aetauschii_combined_queries_50000w.tsv.gz
  • attraktion_50kb_windows_predictions.csv
  • attraktion_aetauschii_combined_queries_50000w.tsv.gz
  • cdclandmark_50kb_windows_predictions.csv
  • cdclandmark_aetauschii_combined_queries_50000w.tsv.gz
  • chinese_spring_50kb_windows_predictions.csv
  • chinese_spring_aetauschii_combined_queries_50000w.tsv.gz
  • fielder_50kb_windows_predictions.csv
  • fielder_aetauschii_combined_queries_50000w.tsv.gz
  • jagger_50kb_windows_predictions.csv
  • jagger_aetauschii_combined_queries_50000w.tsv.gz
  • julius_50kb_windows_predictions.csv
  • julius_aetauschii_combined_queries_50000w.tsv.gz
  • kariega_50kb_windows_predictions.csv
  • kariega_aetauschii_combined_queries_50000w.tsv.gz
  • longreachlancer_50kb_windows_predictions.csv
  • longreachlancer_aetauschii_combined_queries_50000w.tsv.gz
  • mace_50kb_windows_predictions.csv
  • mace_aetauschii_combined_queries_50000w.tsv.gz
  • norin61_50kb_windows_predictions.csv
  • norin61_aetauschii_combined_queries_50000w.tsv.gz
  • stanley_50kb_windows_predictions.csv
  • stanley_aetauschii_combined_queries_50000w.tsv.gz
  • symattis_50kb_windows_predictions.csv
  • symattis_aetauschii_combined_queries_50000w.tsv.gz
  • zang1817_50kb_windows_predictions.csv
  • zang1817_aetauschii_combined_queries_50000w.tsv.gz
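The identity-by-state assignment described in the abstract can be illustrated with a simple rule: for each 50-kb window, pick the Ae. tauschii query with the fewest IBSpy variations and report its subpopulation. This is a minimal sketch of that idea only, not the authors' classification method; the published assignments are already provided in the *_50kb_windows_predictions.csv files. The column names `seqname`, `start`, `query` and `variations`, and the accession-to-subpopulation table, are assumptions, so check the header of the actual .tsv.gz files before use.

```python
import csv
import gzip
from typing import Dict, Iterable, Set, Tuple

Window = Tuple[str, int]  # (sequence name, window start); hypothetical key


def assign_windows(rows: Iterable[Dict[str, str]],
                   subpopulation_of: Dict[str, str]) -> Dict[Window, str]:
    """Assign each window to the subpopulation of its lowest-variation query.

    Windows whose best score is shared by queries from different
    subpopulations are labelled "ambiguous".
    """
    best: Dict[Window, Tuple[float, Set[str]]] = {}
    for row in rows:
        group = subpopulation_of.get(row["query"])
        if group is None:
            continue  # query not in the (hypothetical) subpopulation table
        window = (row["seqname"], int(row["start"]))
        score = float(row["variations"])  # fewer variations = closer by IBS
        current = best.get(window)
        if current is None or score < current[0]:
            best[window] = (score, {group})
        elif score == current[0]:
            current[1].add(group)
    return {window: next(iter(groups)) if len(groups) == 1 else "ambiguous"
            for window, (_, groups) in best.items()}


def assign_windows_from_tsv(path: str, subpopulation_of: Dict[str, str]) -> Dict[Window, str]:
    """Read a gzip-compressed, tab-separated table with the assumed columns."""
    with gzip.open(path, "rt", newline="") as handle:
        return assign_windows(csv.DictReader(handle, delimiter="\t"), subpopulation_of)
```

Taking the single closest query per window is the simplest IBS rule; it is sensitive to ties and to uneven sampling of subpopulations, which is why tied windows are flagged rather than resolved arbitrarily.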

Methods

The full methods are available in the related publication.

Funding

King Abdullah University of Science and Technology

Academy of Scientific Research and Technology

Climate Change Adaptation and Nature Conservation (GREEN FUND)

National Major Agricultural Science and Technology

National Key Research and Development Program of China

German Federal Ministry of Education and Research

Biotechnology and Biological Sciences Research Council, Designing Future Wheat Institute Strategic Programme

European Research Council

Department of Biotechnology

United States Department of Agriculture, Capacity Grant

National Institute of Food and Agriculture, Capacity Grant

Bayer, Beachell Borlaug International Scholars Program

National Science Foundation

Agricultural Research Service

Natural Environment Research Council, Independent Research Fellowship

Australian Government, Research Training Program

University of Queensland, Centennial Scholarships