Data from: Genome assembly of a diversity panel of Chenopodium quinoa
Data files
Oct 23, 2024 version files 3.49 GB
- Final_0321072RM.tar.gz (436.04 MB)
- Final_CHEN199.tar.gz (438.75 MB)
- Final_CHEN90.tar.gz (431.58 MB)
- Final_D10126.tar.gz (435.49 MB)
- Final_D12282.tar.gz (440.48 MB)
- Final_Javi.tar.gz (442.60 MB)
- Final_PI614919.tar.gz (428.17 MB)
- Final_Regalona.tar.gz (430.59 MB)
- Quinoa_panEDTA.TElib.fa.tar.gz (3.53 MB)
- README.md (6.03 KB)
- tidk-plots.zip (1.31 MB)
Abstract
Quinoa is an important crop for meeting the future challenges of food and nutrient security in developing countries under climate change. Deep characterization of the diversity of quinoa germplasm at both the genetic and genomic levels is needed to support the agronomic improvement and adaptation of quinoa as its cultivation expands worldwide. In this study, we report the construction of chromosome-scale genome assemblies of eight C. quinoa accessions covering the spread of phenotypic and genetic diversity of both Lowland and Highland quinoas. The assemblies were produced from a combination of PacBio HiFi reads and Bionano Saphyr optical maps, with total assembly sizes averaging 1.28 Gb and an average N50 of 71.1 Mb. Between 43,733 and 48,564 gene models were predicted for the eight new quinoa genomes, and on average about 66% of each genome was classified as repetitive sequence. Aligning the eight genome assemblies allowed the identification of structural rearrangements, including inversions, translocations, and duplications. In summary, these eight novel C. quinoa genome assemblies provide a resource for association genetics, comparative genomics, and pan-genome analyses for the discovery of genetic components and variations underlying agriculturally important traits.
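The N50 values quoted in the abstract can, in principle, be recomputed from the assembly FASTA files. The following is a minimal sketch, assuming the usual definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length). The function names `n50` and `fasta_lengths` are illustrative and not part of the dataset, and the source does not state whether the reported figure was computed on contigs or on pseudomolecules.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence from an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Tiny invented example (not real quinoa data).
toy_fasta = [">seq1", "ACGTACGT", "AC", ">seq2", "GGGG", ">seq3", "TT"]
toy_lengths = list(fasta_lengths(toy_fasta))
toy_n50 = n50(toy_lengths)
```

On a real file, the same functions could be applied to an open handle, for example `n50(fasta_lengths(open("Regalona_v1.fasta")))`, once the archive has been unpacked.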
README: Genome assembly of a diversity panel of Chenopodium quinoa
https://doi.org/10.5061/dryad.zkh1893jj
Brief summary
Here we present the genome assembly and annotation of a panel of eight quinoa accessions of diverse geographical origins. They were selected to represent the diversity of phenotypes (growth habit, panicle architecture, leaf shape, stem and seed colors) and for performing well in hot, dry and short-day environments.
The eight quinoa accessions include three Lowland (REGALONA acc. KAUST-09303 and JAVI acc. KAUST-09307 from Chile; D-12282 acc. KAUST-09300 from Argentina) and five Highland genotypes (CHEN-199 acc. KAUST-09372 and PI-614919 acc. KAUST-09363 from Bolivia; 03-21-072RM acc. KAUST-09370, D-10126 acc. KAUST-09362 and CHEN-90 acc. KAUST-09367 from Peru).
All eight genomes are high-quality chromosome-scale reference sequences assembled from over 30x genome coverage of PacBio HiFi long reads and further validated by Bionano Saphyr optical maps.
The annotation of the repeat elements across the 8 quinoa genomes was performed using panEDTA with the REPET curated library provided for QQ74-V2 as input.
For the Regalona and 0321072RM accessions, gene models were predicted by combining annotation lift-over with a genome-guided approach supported by an IsoSeq dataset. For the other 6 accessions, gene models were lifted over from a merged and curated annotation of Regalona, 0321072RM and QQ74-V2.
Description of the data and file structure
Detailed list of files in the '.tar.gz' archives:
Final_Regalona.tar.gz
-->Regalona.fasta.mod.EDTA.TEanno.gff3 (Repeat annotation performed with EDTA using the REPET curated library from QQ74-V2 reference genome)
-->Regalona_v1.gff3 (Gene annotation: gene, mRNA, exon and CDS coordinates)
-->Regalona_v1.fasta (Genome assembly: pseudomolecules and unanchored (UA) contigs)
-->Regalona_v1-prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->Regalona_v1-cds.fasta (CDS sequences: transcript sequences without introns or UTRs)
Final_PI614919.tar.gz
-->PI614919.fasta.mod.EDTA.TEanno.gff3 (Repeat annotation performed with EDTA using the REPET curated library from QQ74-V2 reference genome)
-->PI614919_v1.gff3 (Gene annotation: gene, mRNA, exon and CDS coordinates)
-->PI614919_v1.fasta (Genome assembly: pseudomolecules and unanchored (UA) contigs)
-->PI614919_v1-prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->PI614919_v1-cds.fasta (CDS sequences: transcript sequences without introns or UTRs)
Final_Javi.tar.gz
-->Javi.fasta.mod.EDTA.TEanno.gff3 (Repeat annotation performed with EDTA using the REPET curated library from QQ74-V2 reference genome)
-->Javi_v1.gff3 (Gene annotation: gene, mRNA, exon and CDS coordinates)
-->Javi_v1.fasta (Genome assembly: pseudomolecules and unanchored (UA) contigs)
-->Javi_v1-prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->Javi_v1-cds.fasta (CDS sequences: transcript sequences without introns or UTRs)
Final_D12282.tar.gz
-->D12282.fasta.mod.EDTA.TEanno.gff3 (Repeat annotation performed with EDTA using the REPET curated library from QQ74-V2 reference genome)
-->D12282_v1.gff3 (Gene annotation: gene, mRNA, exon and CDS coordinates)
-->D12282_v1.fasta (Genome assembly: pseudomolecules and unanchored (UA) contigs)
-->D12282_v1-prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->D12282_v1-cds.fasta (CDS sequences: transcript sequences without introns or UTRs)
Final_D10126.tar.gz
-->D10126.fasta.mod.EDTA.TEanno.gff3 (Repeat annotation performed with EDTA using the REPET curated library from QQ74-V2 reference genome)
-->D10126_v1.gff3 (Gene annotation: gene, mRNA, exon and CDS coordinates)
-->D10126_v1.fasta (Genome assembly: pseudomolecules and unanchored (UA) contigs)
-->D10126_v1-prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->D10126_v1-cds.fasta (CDS sequences: transcript sequences without introns or UTRs)
Final_CHEN199.tar.gz
-->CHEN199.fasta.mod.EDTA.TEanno.gff3 (Repeat annotation performed with EDTA using the REPET curated library from QQ74-V2 reference genome)
-->CHEN199_v1.gff3 (Gene annotation: gene, mRNA, exon and CDS coordinates)
-->CHEN199_v1.fasta (Genome assembly: pseudomolecules and unanchored (UA) contigs)
-->CHEN199_v1-prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->CHEN199_v1-cds.fasta (CDS sequences: transcript sequences without introns or UTRs)
Final_CHEN90.tar.gz
-->CHEN90.fasta.mod.EDTA.TEanno.gff3 (Repeat annotation performed with EDTA using the REPET curated library from QQ74-V2 reference genome)
-->CHEN90_v1.gff3 (Gene annotation: gene, mRNA, exon and CDS coordinates)
-->CHEN90_v1.fasta (Genome assembly: pseudomolecules and unanchored (UA) contigs)
-->CHEN90_v1-prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->CHEN90_v1-cds.fasta (CDS sequences: transcript sequences without introns or UTRs)
Final_0321072RM.tar.gz
-->0321072RM.fasta.mod.EDTA.TEanno.gff3 (Repeat annotation performed with EDTA using the REPET curated library from QQ74-V2 reference genome)
-->0321072RM_v1.gff3 (Gene annotation: gene, mRNA, exon and CDS coordinates)
-->0321072RM_v1.fasta (Genome assembly: pseudomolecules and unanchored (UA) contigs)
-->0321072RM_v1-prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->0321072RM_v1-cds.fasta (CDS sequences: transcript sequences without introns or UTRs)
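As a quick sanity check on a gene annotation file, the feature types it contains can be tallied. The sketch below assumes the `*_v1.gff3` files follow the standard tab-separated, nine-column GFF3 layout with the feature type in the third column. The function name `count_feature_types` and the sample records are invented for illustration and do not come from the dataset.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 records per feature type (column 3)."""
    counts = Counter()
    for line in gff3_lines:
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 record
        counts[fields[2]] += 1
    return counts


def record(feature_type, start, end, attributes):
    """Build one invented GFF3 line for the toy example below."""
    return "\t".join(
        ["Chr1", "example", feature_type, str(start), str(end), ".", "+", ".", attributes]
    )


# Invented records, only to exercise the parser.
sample = [
    "##gff-version 3",
    record("gene", 100, 900, "ID=g1"),
    record("mRNA", 100, 900, "ID=g1.t1;Parent=g1"),
    record("exon", 100, 400, "Parent=g1.t1"),
    record("exon", 600, 900, "Parent=g1.t1"),
    record("CDS", 150, 400, "Parent=g1.t1"),
    record("CDS", 600, 850, "Parent=g1.t1"),
]
sample_counts = count_feature_types(sample)
```

On an unpacked archive, the same function could be called on an open file handle, for example `count_feature_types(open("Regalona_v1.gff3"))`.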
Additional files:
-->Quinoa_panEDTA.TElib.fa.tar.gz (Consolidated repeat elements library produced by panEDTA from all 8 quinoa accessions + QQ74-V2)
-->tidk-plots.zip (Output plots produced by tidk (v.0.2.31) (Brown et al. 2023) to detect peaks of telomeric repeats in the assemblies)
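The per-accession naming pattern listed above lends itself to a simple completeness check after download. This is a minimal sketch using Python's standard `tarfile` module. The helper names `expected_files` and `missing_files` are hypothetical, and because the internal directory layout of the archives is not described here, the check compares file basenames only.

```python
import tarfile

# Accession labels as they appear in the archive names (Final_<accession>.tar.gz).
ACCESSIONS = (
    "Regalona", "PI614919", "Javi", "D12282",
    "D10126", "CHEN199", "CHEN90", "0321072RM",
)

# File-name suffixes listed in the README for each accession.
SUFFIXES = (
    ".fasta.mod.EDTA.TEanno.gff3",  # repeat annotation
    "_v1.gff3",                     # gene annotation
    "_v1.fasta",                    # genome assembly
    "_v1-prot.fasta",               # peptide sequences
    "_v1-cds.fasta",                # CDS sequences
)


def expected_files(accession):
    """Return the five file names the README lists for one accession."""
    return [accession + suffix for suffix in SUFFIXES]


def missing_files(archive_path, accession):
    """Return expected file names that are absent from a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Compare basenames only: the directory layout is an unknown here.
        present = {name.rsplit("/", 1)[-1] for name in archive.getnames()}
    return [name for name in expected_files(accession) if name not in present]
```

For example, `missing_files("Final_Javi.tar.gz", "Javi")` would return an empty list if all five listed files are found in the archive.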
Methods
Here we present the genome assembly and annotation of a panel of eight quinoa accessions of diverse geographical origins, selected to represent the diversity of quinoa phenotypes and to support the genetic improvement and adaptation of quinoa to warm and arid environments. All eight genomes are high-quality chromosome-scale reference sequences assembled from over 30x genome coverage of PacBio HiFi long reads and further validated by Bionano Saphyr optical maps. We further provide a map of structural rearrangements (inversions, duplications, and translocations) between the eight genomes. Altogether, these datasets represent important resources for investigating the genetic components underlying important agronomic traits, in support of quinoa improvement to meet the challenges of domestication and adaptation to its novel environments of cultivation.