Construction of a chromosome-scale long-read reference genome assembly for potato
Data files
Aug 19, 2020 version files 2.88 GB
- DM_1-3_516_R44_potato_genome_assembly.v6.1.embryophyta_odb10.busco.txt (547 B)
- DM_1-3_516_R44_potato_genome_assembly.v6.1.fa (753.95 MB)
- DM_1-3_516_R44_potato_genome_assembly.v6.1.hm.fa (756.42 MB)
- DM_1-3_516_R44_potato_genome_assembly.v6.1.hm.out.gff (106.09 MB)
- DM_1-3_516_R44_potato_genome_assembly.v6.1.hm.tbl (2.44 KB)
- DM_1-3_516_R44_potato_genome_assembly.v6.1.sm.fa (756.42 MB)
- DM_1-3_516_R44_potato.v6.1.final_crl.fa (37.33 MB)
- DM_1-3_516_R44_potato.v6.1.hc_gene_models.cdna.fa (74 MB)
- DM_1-3_516_R44_potato.v6.1.hc_gene_models.cds.fa (55.90 MB)
- DM_1-3_516_R44_potato.v6.1.hc_gene_models.embryophyta_odb10.busco.txt (560 B)
- DM_1-3_516_R44_potato.v6.1.hc_gene_models.gff3 (48.90 MB)
- DM_1-3_516_R44_potato.v6.1.hc_gene_models.pep.fa (19.31 MB)
- DM_1-3_516_R44_potato.v6.1.README.pdf (45.46 KB)
- DM_1-3_516_R44_potato.v6.1.repr_hc_gene_models.gff3 (32.29 MB)
- DM_1-3_516_R44_potato.v6.1.repr_hc_gene_models.list.txt (1.45 MB)
- DM_1-3_516_R44_potato.v6.1.repr_hc_gene_models.pep.fa (13.84 MB)
- DM_1-3_516_R44_potato.v6.1.working_models.cdna.fa (79.57 MB)
- DM_1-3_516_R44_potato.v6.1.working_models.cds.fa (61.18 MB)
- DM_1-3_516_R44_potato.v6.1.working_models.embryophyta_odb10.busco.txt (560 B)
- DM_1-3_516_R44_potato.v6.1.working_models.func_anno.txt (3.01 MB)
- DM_1-3_516_R44_potato.v6.1.working_models.gff3 (56.76 MB)
- DM_1-3_516_R44_potato.v6.1.working_models.pep.fa (21.19 MB)
Abstract
Background: Worldwide, the cultivated potato, Solanum tuberosum L., is the number one vegetable crop and a critical food security crop. The genome sequence of DM1-3 516 R44, a doubled monoploid clone of S. tuberosum Group Phureja, was published in 2011 using a whole-genome shotgun sequencing approach with short read sequence data. Current advanced sequencing technologies now permit generation of near-complete, high-quality chromosome-scale genome assemblies at a minimal cost.
Findings: Here, we present an updated version of the DM1-3 516 R44 genome sequence (v6.1), generated using Oxford Nanopore Technologies long reads coupled with proximity-by-ligation scaffolding (Hi-C) to yield a chromosome-scale assembly. The new (v6.1) assembly represents 741.6 Mb of sequence (87.8%) of the estimated 844 Mb genome, of which 741.5 Mb is non-gapped and 731.2 Mb is anchored to the 12 chromosomes. Use of Oxford Nanopore Technologies full-length cDNA sequencing enabled annotation of 32,917 high-confidence protein-coding genes encoding 44,851 gene models that had a significantly improved representation of conserved orthologs compared to the previous annotation. The new assembly has improved contiguity, with a 595-fold increase in N50 contig size, a 99% reduction in the number of contigs, a 44-fold increase in N50 scaffold size, and an LTR Assembly Index score of 13.56, placing it in the category of reference genome quality. The improved assembly also permitted annotation of the centromeres via alignment to sequencing reads derived from CENH3 nucleosomes.
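For readers unfamiliar with the contiguity metric cited above, the following is a minimal Python sketch of how an N50 value can be computed from sequence lengths. It is not the authors' pipeline; the function names `n50` and `fasta_lengths` are hypothetical and introduced only for illustration. N50 is taken here in its usual sense: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to >= 50%.
        if 2 * running >= total:
            return length
    return 0


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Collect per-record sequence lengths from FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths
```

Under these assumptions, a scaffold N50 for the assembly could be obtained by opening `DM_1-3_516_R44_potato_genome_assembly.v6.1.fa` and passing the file handle to `fasta_lengths`, then the result to `n50`. A contig N50 would additionally require splitting scaffolds at runs of gap characters, which this sketch does not do.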
Conclusions: Access to advanced sequencing technologies and improved software permitted generation of a high-quality, long-read, chromosome-scale assembly and improved annotation dataset for the reference genotype of potato that will facilitate research aimed at improving agronomic traits and understanding genome evolution.
See attached DM_1-3_516_R44_potato.v6.1.README.pdf for additional information.
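As a rough illustration of how the GFF3 annotation files listed under Data files might be inspected, the sketch below tallies feature types from the third column of a GFF3 file. It relies only on the general GFF3 layout (nine tab-separated columns, `#` comment lines, optional `##FASTA` section). The function name `count_feature_types` is hypothetical, and which feature type labels (for example `gene` or `mRNA`) appear in these particular files is an assumption to check against the README.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 records by feature type (column 3)."""
    counts: Counter = Counter()
    for line in lines:
        # An embedded FASTA section, if present, ends the annotation records.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a well-formed GFF3 record
        counts[columns[2]] += 1
    return counts
```

For example, passing an open handle for `DM_1-3_516_R44_potato.v6.1.hc_gene_models.gff3` to `count_feature_types` would return a tally that can be compared with the gene and gene-model counts reported in the abstract, assuming the file labels those features in the conventional way.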