A chromosome-scale assembly of the quinoa genome provides insights into the structure and dynamics of its subgenomes
Data files
Oct 31, 2023 version files 1.27 GB
- Cpallidicaule_v2_pseudomoleculesANDannotations.tar.gz
- Cq_allpseudo_biallelic_minDP5_MaxMissing0.8_MAF0.01.Cq3Bsamples.variantsonly.vcf.tar.gz
- Cquinoa_QQ74_v2_pseudomoleculesANDannotations.tar.gz
- Csuecicum_v2_pseudomoleculesANDannotations.tar.gz
- README.txt
Abstract
Quinoa (Chenopodium quinoa Willd.) is an allotetraploid seed crop with the potential to help address global food security concerns. Genomes have been assembled for three accessions of quinoa; however, all assemblies are fragmented and do not reflect known chromosome biology. Here, we used in vitro and in vivo Hi-C data to produce a chromosome-scale assembly of the Chilean quinoa accession PI 614886 (QQ74). The final assembly spanned 1.326 Gb, of which 90.5% was assembled into 18 chromosome-scale scaffolds. The genome was annotated with 54,499 protein-coding genes, 97% of which were located on the 18 largest scaffolds. We also produced an updated genome assembly for the B-genome diploid C. suecicum and used it, together with the A-genome diploid C. pallidicaule, to identify genomic rearrangements within the quinoa genome, including a large pericentromeric inversion representing 71.7% of chromosome Cq3B. Repetitive sequences comprise 65.20%, 48.61%, and 57.91% of the quinoa, C. pallidicaule, and C. suecicum genomes, respectively. Evidence suggests that the B subgenome is more dynamic and has expanded more than the A subgenome. These genomic resources will enable more accurate assessments of genome evolution within the Amaranthaceae and will facilitate future efforts to identify variation in genes underlying important agronomic traits in quinoa.
Usage notes
Detailed list of files in the '.tar.gz' archives:
Cquinoa_QQ74_v2_pseudomoleculesANDannotations.tar.gz
-->Cquinoa_QQ74_v2_CDS.fasta (CDS sequences: transcribed sequence, devoid of introns, and devoid of UTRs)
-->Cquinoa_QQ74_v2.fasta (Pseudomolecules and unanchored contigs)
-->Cquinoa_QQ74_v2.gff3 (Gene annotation, including gene, mRNA, CDS, 3' and 5' UTRs)
-->Cquinoa_QQ74_v2_mRNA.fasta (mRNA sequences: transcribed sequence, devoid of introns, but containing UTRs)
-->Cquinoa_QQ74_v2_prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->Cquinoa_QQ74_v2_REPET_classification.txt (TE classification: produced with REPET annotation software)
-->Cquinoa_QQ74_v2_REPET_consensus.fasta (TE consensus sequences: produced with REPET annotation software)
-->Cquinoa_QQ74_v2_REPET.gff3 (TE annotation: performed with REPET software)
Csuecicum_v2_pseudomoleculesANDannotations.tar.gz
-->Csuecicum_v2.fasta (Pseudomolecules and unanchored contigs)
-->Csuecicum_v2.gff3 (Gene annotation, including gene, mRNA, CDS, 3' and 5' UTRs and tRNA)
-->Csuecicum_v2_mRNA.fasta (mRNA sequences: transcribed sequence, devoid of introns, but containing UTRs)
-->Csuecicum_v2_prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->Csuecicum_v2_REPET_classification.txt (TE classification: produced with REPET annotation software)
-->Csuecicum_v2_REPET_consensus.fasta (TE consensus sequences: produced with REPET annotation software)
-->Csuecicum_v2_REPET.gff3 (TE annotation: performed with REPET software)
Cpallidicaule_v2_pseudomoleculesANDannotations.tar.gz
-->Cpallidicaule_v2.fasta (Pseudomolecules and unanchored contigs)
-->Cpallidicaule_v2.gff3 (Gene annotation, including gene, mRNA, CDS, 3' and 5' UTRs and tRNA)
-->Cpallidicaule_v2_mRNA.fasta (mRNA sequences: transcribed sequence, devoid of introns, but containing UTRs)
-->Cpallidicaule_v2_prot.fasta (Peptide sequences: CDS sequences translated into amino acids)
-->Cpallidicaule_v2_REPET_classification.txt (TE classification: produced with REPET annotation software)
-->Cpallidicaule_v2_REPET_consensus.fasta (TE consensus sequences: produced with REPET annotation software)
-->Cpallidicaule_v2_REPET.gff3 (TE annotation: performed with REPET software)
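The pseudomolecule FASTA files listed above can be summarized without unpacking the archives to disk. The following is a minimal Python sketch, not part of the deposited data: it tallies sequence lengths and reports the fraction of the assembly held by the n longest sequences (for example, n = 18 for the quinoa pseudomolecules). The function names are hypothetical, and the sketch assumes that each archive stores its files under the base names given in the list; the actual paths inside the archives should be checked first.

```python
import io
import posixpath
import tarfile


def fasta_lengths(lines):
    """Return {sequence_id: length} for an iterable of FASTA text lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def fraction_in_largest(lengths, n):
    """Fraction of all bases contained in the n longest sequences."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    largest = sorted(lengths.values(), reverse=True)[:n]
    return sum(largest) / total


def fasta_lengths_from_archive(archive_path, member_name):
    """Read one FASTA member of a .tar.gz archive and tally its sequences.

    Assumption: the member's base name equals member_name; any directory
    prefix inside the archive is ignored.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile() and posixpath.basename(member.name) == member_name:
                handle = io.TextIOWrapper(tar.extractfile(member), encoding="ascii")
                return fasta_lengths(handle)
    raise FileNotFoundError(member_name)
```

A possible call, assuming the archive sits in the working directory: lengths = fasta_lengths_from_archive("Cquinoa_QQ74_v2_pseudomoleculesANDannotations.tar.gz", "Cquinoa_QQ74_v2.fasta"), followed by fraction_in_largest(lengths, 18).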
Cq_allpseudo_biallelic_minDP5_MaxMissing0.8_MAF0.01.Cq3Bsamples.variantsonly.vcf.tar.gz (SNP variants among 209 quinoa and wild accessions, called against the Cquinoa_QQ74_v2.fasta pseudomolecules; only biallelic variants with a minimum read depth of 5, at most 20% missing data, and a minor allele frequency (MAF) of at least 0.01, as encoded in the file name, were retained)
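To make the retention criteria for the VCF concrete, the following Python sketch checks a single VCF record against them. It is an illustration under stated assumptions, not the pipeline used to produce the file: it assumes that each record carries GT and DP in its FORMAT column, that genotypes with DP below the minimum are treated as missing, and that the MAF threshold is the fraction 0.01 given in the file name. The function name passes_filters and its parameters are hypothetical.

```python
def passes_filters(vcf_line, min_dp=5, max_missing_frac=0.2, min_maf=0.01):
    """Return True if one tab-delimited VCF record meets all four criteria."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]
    # Biallelic: exactly one alternate allele.
    if alt in (".", "") or "," in alt:
        return False

    fmt = fields[8].split(":")
    gt_i = fmt.index("GT")
    dp_i = fmt.index("DP") if "DP" in fmt else None
    samples = fields[9:]
    if not samples:
        return False

    called = ref_count = alt_count = 0
    for sample in samples:
        parts = sample.split(":")
        alleles = parts[gt_i].replace("|", "/").split("/")
        if "." in alleles:
            continue  # genotype not called
        if dp_i is not None:
            dp = parts[dp_i] if dp_i < len(parts) else "."
            # Assumption: low-depth genotypes count as missing.
            if dp == "." or int(dp) < min_dp:
                continue
        called += 1
        ref_count += alleles.count("0")
        alt_count += alleles.count("1")

    # Maximum missing data across samples.
    if (len(samples) - called) > max_missing_frac * len(samples):
        return False

    # Minor allele frequency among the called alleles.
    total = ref_count + alt_count
    if total == 0:
        return False
    return min(ref_count, alt_count) / total >= min_maf
```

Applied line by line to the non-header records of the decompressed VCF, every record would be expected to pass if these assumptions match how the file was filtered; records that fail would indicate that one of the assumptions (for example, how depth was applied) differs from the original procedure.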