A chromosome-scale de novo genome assembly of the dwarf tomato variety Micro-Tom
Data files
Oct 15, 2023 version files 940.67 MB
- MT_denovo_genome.fasta
- README.md
Abstract
The cultivated tomato (Solanum lycopersicum) is an important crop and a model species for genetics and plant molecular biology research. The dwarf tomato variety Micro-Tom is used extensively in research because it flowers rapidly, is easy to grow in high volumes in minimal space, and is amenable to genetic transformation. Here we provide a de novo chromosome-scale genome assembly of Micro-Tom that was generated using PacBio HiFi reads and scaffolded using chromosome conformation capture data. The HiFi data were assembled using the hifiasm assembler, and Omni-C data were used for scaffolding with Salsa2, followed by several rounds of manual curation and validation.
README: A chromosome-scale de novo genome assembly of the dwarf tomato variety Micro-Tom
https://doi.org/10.5061/dryad.h9w0vt4qd
We sequenced DNA from the dwarf tomato variety Micro-Tom using PacBio HiFi technology and generated an initial assembly by running hifiasm. We then generated chromosome conformation capture data (Omni-C, Dovetail) and used it for automated scaffolding of the hifiasm assembly output with the Salsa2 pipeline. Multiple rounds of manual scaffolding correction were performed to further improve the assembly, resulting in a chromosome-scale Micro-Tom genome assembly. This chromosome-scale assembly was validated in three ways: assembly completeness and quality value (QV) were checked through k-mer analysis, repeat content was assessed by computing the LTR assembly index (LAI), and gene completeness was estimated with BUSCO (Benchmarking Universal Single-Copy Orthologs).
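
As an illustration of the k-mer based QV mentioned above, the sketch below uses a Merqury-style estimate: the fraction of assembly k-mers that are absent from the read set is converted into a per-base error rate and then into a Phred-scaled score. This README does not state which tool or k-mer size was used for this assembly, so the function name, its parameters, and the k = 21 in the usage comment are assumptions for illustration only.

```python
import math


def kmer_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Phred-scaled consensus quality from k-mer counts (Merqury-style estimate).

    asm_only_kmers  -- k-mers found in the assembly but not in the reads
    total_asm_kmers -- all k-mers found in the assembly
    k               -- k-mer size used for counting (hypothetical, e.g. 21)
    """
    if total_asm_kmers <= 0 or k <= 0:
        raise ValueError("total_asm_kmers and k must be positive")
    if not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("asm_only_kmers must lie between 0 and total_asm_kmers")
    if asm_only_kmers == 0:
        # No unsupported k-mers: the estimated error rate is zero.
        return math.inf
    # Probability that a single base is correct, assuming a k-mer is
    # supported by the reads only if all k of its bases are correct.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)


# Usage with made-up counts (not values from this assembly):
# qv = kmer_qv(asm_only_kmers=1_000, total_asm_kmers=800_000_000, k=21)
```

Higher values indicate fewer unsupported k-mers; the counts themselves would come from a k-mer counter run on both the reads and the assembly.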
## Description of the data and file structure
The S. lycopersicum cv. Micro-Tom genome assembly (a multi-FASTA file) contains 12 chromosome-level sequences and 2688 unplaced sequences.
The 12 chromosomes are named: MT-ch01, MT-ch02, MT-ch03, MT-ch04, MT-ch05, MT-ch06, MT-ch07, MT-ch08, MT-ch09, MT-ch10, MT-ch11, MT-ch12.
The unplaced sequences, named MT_unplaced-01 to MT_unplaced_2688, largely represent sequences derived from chloroplast, mitochondrial, rDNA and satellite repeats.
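
A quick way to check a downloaded copy against the structure described above is to count the FASTA headers by name prefix. This is a minimal sketch: the prefixes `MT-ch` and `MT_unplaced` are taken from the names listed above, while the function names and the assumption that the sequence name is the first whitespace-delimited token of each header line are illustrative.

```python
from collections import Counter
from typing import Iterable


def classify_headers(lines: Iterable[str]) -> Counter:
    """Count FASTA records by category, based on the header name prefix."""
    counts: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split()
        name = parts[0] if parts else ""
        if name.startswith("MT-ch"):
            counts["chromosome"] += 1
        elif name.startswith("MT_unplaced"):
            counts["unplaced"] += 1
        else:
            counts["other"] += 1
    return counts


def summarize_fasta(path: str) -> Counter:
    """Stream the FASTA file line by line so the full file is never held in memory."""
    with open(path) as handle:
        return classify_headers(handle)


# Usage (path assumes the file is in the working directory):
# counts = summarize_fasta("MT_denovo_genome.fasta")
# print(counts["chromosome"], counts["unplaced"], counts["other"])
```

If the file matches the description, the chromosome and unplaced counts should equal the numbers stated above and the "other" count should be zero.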
Organism name: Solanum lycopersicum
Cultivar name: Micro-Tom
Assembly level: Chromosome
Assembler: Hifiasm
Scaffolding: Salsa2 + manual curation
Sequencing technology: PacBio HiFi and Dovetail Omni-C
Submitter: Max Planck Institute for Plant Breeding Research (Cologne, Germany)
## Sharing/Access information
The raw sequencing data (PacBio HiFi reads and Omni-C reads) are deposited at the European Nucleotide Archive (https://www.ebi.ac.uk/ena/) under project number PRJEB62441.
## Code/Software
Hifiasm v0.16.1 was used to assemble Micro-Tom PacBio HiFi data and generate a contig-level assembly.
The contig-level assembly was scaffolded automatically using Salsa2 v2.2 and the esrice/hic-pipeline (https://github.com/esrice/hic-pipeline).
Multiple rounds of manual fine-tuning and scaffolding correction, based on Hi-C interaction data, were then performed and resulted in a chromosome-scale assembly.
MD5sum of MT_denovo_genome.fasta:
43ead624f5532373679ec62c8cd185cf
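
The checksum above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the function names and the 1 MiB chunk size are arbitrary choices, and the file is read in chunks because it is several hundred megabytes in size.

```python
import hashlib

# MD5 checksum published in this README for MT_denovo_genome.fasta
EXPECTED_MD5 = "43ead624f5532373679ec62c8cd185cf"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str) -> bool:
    """Compare the file's digest with the checksum listed in this README."""
    return md5_of_file(path) == EXPECTED_MD5


# Usage (path assumes the file is in the working directory):
# print(matches_published_md5("MT_denovo_genome.fasta"))
```

A mismatch usually points to an incomplete or corrupted download, in which case the file should be fetched again.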
Methods
The raw PacBio HiFi sequencing reads and Omni-C reads of the Micro-Tom variety are available from the European Nucleotide Archive under project number PRJEB62441. A gene annotation file is available upon request.
For scientific correspondence please contact Charles Underwood (cunderwood[@]mpipz.mpg.de).