
Long-read genome sequencing of bread wheat facilitates disease resistance gene cloning

Cite this dataset

Athiyannan, Naveenkumar et al. (2021). Long-read genome sequencing of bread wheat facilitates disease resistance gene cloning [Dataset]. Dryad.


Cloning agronomically important genes from large, complex crop genomes remains challenging. Here, we generate a 14.7-gigabase chromosome-scale assembly of the South African bread wheat (Triticum aestivum) cultivar Kariega by combining high-fidelity long reads, optical mapping, and chromosome conformation capture. The resulting assembly is an order of magnitude more contiguous than previous wheat assemblies. Kariega shows durable resistance against the devastating fungal stripe rust disease. We identified the race-specific disease resistance gene Yr27, encoding an intracellular immune receptor, as a major contributor to this resistance. Yr27 is allelic to the leaf rust resistance gene Lr13, with the Yr27 and Lr13 proteins sharing 97% sequence identity. Our results thus demonstrate the feasibility of generating chromosome-scale wheat assemblies to clone genes and also exemplify that highly similar alleles of a single-copy gene can confer resistance to different pathogens, which might provide a basis for engineering Yr27 alleles with multiple recognition specificities in future.

Usage notes

The folder 'Kariega_v1_pseudomolecules.tar.gz' contains the unmasked and masked DNA sequences, in FASTA format, of the chromosomal pseudomolecules of bread wheat (Triticum aestivum) cv. Kariega. The primary contig assembly was built from PacBio HiFi reads with hifiasm. Contigs were scaffolded with Bionano data and arranged into chromosomal pseudomolecules with Omni-C data using the Juicer / 3D-DNA / Juicebox pipeline. An AGP file specifying the placement of sequence scaffolds in the pseudomolecules is provided.
The folder 'Kariega_v1_annotations.tar.gz' holds the structural gene annotation, which is based on evidence from protein homology, RNA-seq and Iso-Seq datasets, as well as ab initio gene prediction incorporating TE annotations. It provides gene models in GFF3 format, their functional descriptions, and the coding and protein sequences of high- and low-confidence genes. De novo annotations were subjected to a confidence classification step that considers homology to existing proteins and the robustness of the functional assessment, and that sorts genes into high-confidence (HC) and low-confidence (LC) classes. A fixed cut-off threshold was applied for protein homology. Functional assessment was carried out using an InterProScan pipeline and was not manually curated.
Also included are a GFF file with the NLR genes predicted by the NLR-Annotator pipeline, GFF files specifying the positions of transposable elements, and a FASTA file with a transposable element library annotated with the EDTA software.
The Yr27 CDS and genomic sequences are provided in FASTA format.
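
As an illustration of how the AGP file can be used, the sketch below lists the scaffolds placed in each pseudomolecule, in order. It assumes that the file follows the standard tab-separated AGP layout (object, start, end, part number, component type, then component or gap columns), which has not been verified against the archive itself. The function name and the example rows are hypothetical.

```python
from collections import defaultdict

# AGP component types that denote gaps rather than sequence
GAP_TYPES = {"N", "U"}


def read_agp_placements(lines):
    """Return {object: [(component_id, object_start, object_end, orientation), ...]}.

    Assumes standard tab-separated AGP rows. Gap rows and comment lines
    are skipped, and row order is preserved.
    """
    placements = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank line or comment
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a well-formed AGP row
        obj, obj_beg, obj_end, _part, comp_type = cols[:5]
        if comp_type in GAP_TYPES:
            continue  # gap between scaffolds
        orientation = cols[8] if len(cols) > 8 else "?"
        placements[obj].append((cols[5], int(obj_beg), int(obj_end), orientation))
    return dict(placements)


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from Kariega_v1.agp
    example = [
        "chrX\t1\t1000\t1\tW\tscaf_a\t1\t1000\t+",
        "chrX\t1001\t1100\t2\tN\t100\tscaffold\tyes\tmap",
        "chrX\t1101\t1600\t3\tW\tscaf_b\t1\t500\t-",
    ]
    for obj, parts in read_agp_placements(example).items():
        print(obj, parts)
```

To run it on the real data, pass an open file handle for the extracted Kariega_v1.agp in place of the example list.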

Provided in these folders:
- Unmasked and masked DNA sequences
- Annotation in GFF3 format including isoforms, UTRs and description lines with putative functional assignments
- Separate GFF3 files for high-confidence (HC) and low-confidence (LC) genes and for transposable elements (TE)
- CDS and protein sequences
- Transposable elements library
- NLR annotation
- Yr27 CDS and genomic sequences

Detailed list of files in '.tar.gz' folders:
--> Kariega_v1.fasta (Unmasked pseudomolecules)
--> Kariega_v1-masked.fasta (Masked pseudomolecules)
--> Kariega_v1.agp (scaffold order in each pseudomolecule)
--> Kariega_v1.length (pseudomolecule size)
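
A quick sanity check after extraction is to recompute the pseudomolecule sizes from the FASTA file. The sketch below is a minimal, standard-library way to do that. The function name and the example records are hypothetical, and because the layout of Kariega_v1.length is not described here, the comparison against that file is left to the reader.

```python
import io


def fasta_lengths(handle):
    """Return {sequence_name: length} for a FASTA stream.

    The name is the first whitespace-delimited token of the header line.
    Sequence is read line by line, so a whole chromosome is never held
    in memory at once.
    """
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Made-up records for illustration only
    demo = io.StringIO(">seqA example\nACGTAC\nGT\n>seqB\nNNNN\n")
    for seq_name, size in fasta_lengths(demo).items():
        print(seq_name, size)
```

For the real data, open the extracted Kariega_v1.fasta in text mode and pass the handle to fasta_lengths.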

(High- and Low-confidence gene models)
--> Kariega_v1.gff3
--> Kariega_v1-cds.fasta
--> Kariega_v1-prot.fasta

(High-confidence gene models)
--> Kariega_v1_HC.gff3
--> Kariega_v1_HC-cds.fasta
--> Kariega_v1_HC-prot.fasta

(Low-confidence gene models)
--> Kariega_v1_LC.gff3
--> Kariega_v1_LC-cds.fasta
--> Kariega_v1_LC-prot.fasta
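
To compare the high- and low-confidence sets, one simple approach is to tally the feature types in each GFF3 file. The sketch below assumes only the standard nine-column GFF3 layout. Which feature type names the Kariega files actually use (for example "gene" or "mRNA") is not stated here, so the code counts every type it finds. The function name and the example lines are hypothetical.

```python
from collections import Counter


def count_feature_types(handle):
    """Count GFF3 features by the value of column 3 (feature type)."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # the rest of the file is embedded sequence
        if line.startswith("#") or not line.strip():
            continue  # directive, comment, or blank line
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        counts[cols[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up lines for illustration only
    demo = [
        "##gff-version 3\n",
        "chrX\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
        "chrX\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=g1.1;Parent=g1\n",
        "chrX\tsrc\tgene\t200\t300\t.\t-\t.\tID=g2\n",
    ]
    for feature_type, n in sorted(count_feature_types(demo).items()):
        print(feature_type, n)
```

Running it separately on Kariega_v1_HC.gff3 and Kariega_v1_LC.gff3 gives per-class tallies that can be checked against the combined Kariega_v1.gff3.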

(NLR gene prediction)
--> Kariega_v1_NLR.gff3

(Transposable elements)
--> Kariega_v1_TE-intact.gff3
--> Kariega_v1_TE-lib.fasta