Chromosome-level genome assembly, annotation and evolutionary analysis of the ornamental plant Asparagus setaceus
Data files
Nov 26, 2019 version files: 24.87 MB
Asparagus_setaceus.gene.gff, 24.87 MB
Feb 05, 2020 version files: 68.77 MB
Asparagus_setaceus.cds, 32.06 MB
Asparagus_setaceus.gene.gff, 24.87 MB
Asparagus_setaceus.pep, 11.84 MB
Abstract
Asparagus setaceus is a popular ornamental plant cultivated in tropical and subtropical regions worldwide. In this study, a chromosome-level reference genome of A. setaceus was constructed to support studies of its genome structure and evolution. A total of 112.52 Gb of long reads was generated on the Nanopore platform, corresponding to 156.28× coverage of an estimated genome size of 720 Mb. Illumina short reads, 10x Genomics linked reads, and Hi-C data were then combined to produce the final chromosome-level assembly of 710.15 Mb, which accounts for 98.63% of the estimated genome size. Furthermore, 96.85% of the sequences were anchored to 10 super-scaffolds, corresponding to the 10 chromosomes. The genome was predicted to contain 28,410 genes, of which 25,649 (90.28%) were functionally annotated. Genome annotation revealed that repetitive sequences make up 65.59% of the genome, with long terminal repeats predominant (42.51% of the whole genome). The divergence between A. setaceus and its close relative A. officinalis is estimated to have occurred ~9.66 million years ago. Genome evolution analysis indicated that A. setaceus underwent two rounds of whole-genome duplication. In addition, 762 species-specific gene families, 898 expanded gene families, 96 positively selected genes, and 76 resistance (R) genes were identified in A. setaceus and functionally annotated. These findings provide insights into the structure and evolution of the A. setaceus genome and will facilitate comparative genetic and genomic research on the genus Asparagus.
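The percentages and the coverage figure quoted in the abstract follow from simple ratios of the reported totals. The short sketch below recomputes them as a consistency check; the function names are illustrative and are not part of the dataset, and 1 Gb is assumed to equal 1,000 Mb.

```python
def depth_of_coverage(total_bases_gb, genome_size_mb):
    """Sequencing depth as total bases divided by genome size (assumes 1 Gb = 1000 Mb)."""
    return total_bases_gb * 1000 / genome_size_mb


def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


# Values taken from the abstract.
coverage = depth_of_coverage(112.52, 720)      # long-read depth
assembled_share = percent(710.15, 720)         # assembly vs. estimated genome size
annotated_share = percent(25649, 28410)        # functionally annotated genes

print(f"coverage: {coverage:.2f}x")
print(f"assembled share: {assembled_share:.2f}%")
print(f"annotated share: {annotated_share:.2f}%")
```

Rounded to two decimals, these values match the 156.28×, 98.63%, and 90.28% reported above.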
Methods
Protein-coding genes were predicted using a strategy that integrated de novo gene prediction, transcriptomic evidence, and homology-based gene models. For homology-based prediction, GeMoMa was used with protein sequences from A. officinalis, a close relative of A. setaceus. For RNA-seq-based prediction, PASA was run on the assembled RNA-seq unigenes. Augustus was used for de novo prediction. The genes predicted by these methods were then integrated into a non-redundant gene set using EVidenceModeler (EVM). To annotate the protein-coding genes, the sequences of the predicted genes were searched against the commonly used databases SwissProt, GO, KEGG, KOG, Nr, and InterPro.
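The integrated gene set is distributed as Asparagus_setaceus.gene.gff. As a minimal sketch of how such a file could be summarized, the function below tallies feature types from tab-delimited annotation lines. It assumes the file follows the standard nine-column GFF layout with the feature type in the third column, which has not been verified against the deposited file; the example records and the function name are hypothetical.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count feature types (column 3) in nine-column, tab-delimited GFF lines.

    Assumes the standard GFF layout; comment lines, blank lines, and lines
    with fewer than nine columns are skipped.
    """
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration only; not taken from the dataset.
example = [
    "##gff-version 3",
    "chr1\tEVM\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tEVM\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tEVM\tCDS\t150\t800\t.\t+\t0\tParent=mrna1",
    "chr2\tEVM\tgene\t50\t400\t.\t-\t.\tID=gene2",
]

counts = count_feature_types(example)
print(counts["gene"], counts["mRNA"], counts["CDS"])

# With the real file (assuming the layout above):
# with open("Asparagus_setaceus.gene.gff") as handle:
#     counts = count_feature_types(handle)
```

If the deposited file uses this layout, the tally of gene features can be compared with the 28,410 predicted genes reported in the abstract.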