High‑fidelity genome assembly of wheat introgression line Zahir‑1644
Data files
Mar 11, 2025 version; total size 14.61 GB
README.md (3.34 KB)
zahir_1644_HiFi_assembly.fasta (14.61 GB)
Abstract
This dataset provides a high‑fidelity genome assembly of the wheat introgression line Zahir‑1644, generated from PacBio Sequel II HiFi reads and assembled with hifiasm (v0.19). The assembly has a contig N50 of approximately 21 Mb. Contiguity was assessed with QUAST (v5.2) and completeness with BUSCO (v5.1) against the poales lineage dataset; both indicate a highly contiguous and complete assembly. These data support studies on wheat disease resistance mechanisms and genetic improvement. Included files comprise the raw sequencing reads, the assembled genome in FASTA format, and associated assembly metrics.
https://doi.org/10.1101/2024.08.30.610287
Submitter Contact Information
Name: Karthick Gajendiran
Institution: King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Email: karthick.gajendiran@kaust.edu.sa
Alternate Contact Information
Name: Brande Wulff
Institution: King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Email: brande.wulff@kaust.edu.sa
Dataset Overview
This dataset comprises the high‑fidelity genome assembly of the wheat introgression line Zahir‑1644, as described in the preprint "A wheat tandem kinase sensor activates an NLR helper to trigger immunity". The genome assembly is intended to support research in wheat genetics, particularly studies on disease resistance mechanisms.
Key highlights include:
Sequencing Platform: PacBio Sequel II (HiFi reads).
Assembly Software: hifiasm (v0.19) used with default parameters.
Quality Metrics:
Contig N50 of approximately 21 Mb.
Quality assessment performed using QUAST (v5.2) and BUSCO (v5.1) with the poales dataset.
Potential Applications: The dataset is a resource for genomics, genetic improvement, and comparative studies in wheat and related species.
Experimental Methods
Plant Material and DNA Extraction
Sample Source: Wheat introgression line Zahir‑1644.
Collection: Tissue samples were collected from young plants and flash-frozen in liquid nitrogen.
DNA Extraction: High‑quality genomic DNA was extracted using standardized protocols suitable for long‑read sequencing.
Genome Sequencing and Assembly
Sequencing:
Genomic DNA was sequenced on the PacBio Sequel II platform to generate HiFi reads.
Raw reads were processed to obtain high‑quality circular consensus sequencing (CCS) reads.
Assembly:
Genome assembly was performed using hifiasm (v0.19) without prior error correction.
Assembly was executed on a high‑performance computing cluster (e.g., 1 node, 45 CPUs per task, 500 GB memory).
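hifiasm writes its assemblies as GFA graphs rather than FASTA, so a conversion step is normally needed to arrive at a FASTA file like the one distributed here. The following is a minimal sketch of that workflow, not the authors' actual script. The read file name, output prefix, and thread count used in the usage note are hypothetical, and whether the primary-contig graph was the one exported is an assumption. `-o` (output prefix) and `-t` (threads) are standard hifiasm options.

```python
from typing import Iterable, Iterator, List, Tuple


def hifiasm_command(reads: str, prefix: str, threads: int) -> List[str]:
    """Build a default-parameter hifiasm call (-o output prefix, -t threads).

    The returned list is suitable for subprocess.run(cmd, check=True).
    """
    return ["hifiasm", "-o", prefix, "-t", str(threads), reads]


def gfa_segments(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for every segment ('S') record in a GFA stream."""
    for line in lines:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            yield fields[1], fields[2]


def gfa_to_fasta(gfa_path: str, fasta_path: str, width: int = 80) -> int:
    """Write all GFA segments as FASTA records; return the number written.

    Both files are streamed, so memory use is bounded by the longest contig.
    """
    count = 0
    with open(gfa_path) as gfa, open(fasta_path, "w") as fasta:
        for name, seq in gfa_segments(gfa):
            fasta.write(f">{name}\n")
            # Wrap sequence lines at a fixed width for readability.
            for i in range(0, len(seq), width):
                fasta.write(seq[i:i + width] + "\n")
            count += 1
    return count
```

As a usage example under the same assumptions, `hifiasm_command("zahir_1644.hifi.fastq.gz", "zahir_1644", 45)` builds the assembly call, and `gfa_to_fasta("zahir_1644.bp.p_ctg.gfa", "zahir_1644_primary.fasta")` converts the primary-contig graph that hifiasm names `<prefix>.bp.p_ctg.gfa`.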
Quality Assessment:
Assembly metrics were generated using QUAST (v5.2).
Genome completeness was evaluated with BUSCO (v5.1) using the poales lineage dataset.
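The contig N50 reported above can be checked independently of QUAST. The sketch below is a simple illustration, assuming a plain or gzip-compressed FASTA file; the function names are hypothetical and this is not QUAST's implementation. It streams the file line by line, which matters for a 14.61 GB assembly.

```python
import gzip
from typing import Iterable, Iterator


def contig_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each FASTA record, reading one line at a time."""
    length = None  # None until the first header line is seen
    for line in lines:
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line.strip())
    if length is not None:
        yield length


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def fasta_n50(path: str) -> int:
    """Compute the N50 of a FASTA file, transparently handling .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return n50(contig_lengths(handle))
```

Calling `fasta_n50("zahir_1644_HiFi_assembly.fasta")` should give a value close to the reported 21 Mb. Small differences are possible because QUAST by default ignores contigs shorter than 500 bp, whereas this sketch counts every record.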
File Structure and Contents
The dataset contains a single data file:
zahir_1644_HiFi_assembly.fasta: The primary genome assembly in FASTA format.
Software and Environment
hifiasm v0.19 – Genome assembly.
Additional Tools:
Standard bioinformatics tools were used for data processing and analysis on a high-performance computing cluster.
Data Usage and Reuse
This dataset is made available under a CC0 1.0 Universal Public Domain Dedication license. Researchers are encouraged to use the data for further genomic analyses, comparative studies, and wheat breeding research. Please cite the associated preprint when utilizing this dataset in your research.
Recommended Citation:
Chen et al. (2024). A wheat tandem kinase sensor activates an NLR helper to trigger immunity. bioRxiv. https://doi.org/10.1101/2024.08.30.610287
Acknowledgments
The authors thank all collaborators and technical staff for their contributions.
Genomic DNA was extracted from leaf tissue of the Zahir-1644 introgression line and sequenced on the PacBio HiFi platform. The raw reads were processed and then assembled with the hifiasm genome assembler.