Genome annotation of humpback grouper (Cromileptes altivelis)
Cite this dataset
Yang, Yang (2021). Genome annotation of humpback grouper (Cromileptes altivelis) [Dataset]. Dryad. https://doi.org/10.5061/dryad.wstqjq2hq
Abstract
Humpback grouper (Cromileptes altivelis), an Epinephelidae species, is patchily distributed in reef habitats of the Western Pacific. It has a markedly different body shape and a notably lower growth rate than closely related grouper species. To date, the evolutionary status of humpback grouper among groupers remains ambiguous. To support further research on this species, a high-quality chromosome-level genome of humpback grouper was assembled in the present study using PacBio sequencing and high-throughput chromatin conformation capture (Hi-C) technology.
Methods
Repetitive regions of the wild humpback grouper genome were identified by de novo and homology-based prediction. Transposable elements (TEs) were identified using LTR Finder version 1.05 (http://tlife.fudan.edu.cn/tlife/ltr_finder/), RepeatScout v1.0.5 (http://www.repeatmasker.org), and PILER v1.0. The TEs were classified and annotated with PASTEClassifier version 1.0 as part of the TEdenovo pipeline. Protein-coding genes were predicted using three approaches: de novo prediction, homology-based prediction, and RNA-seq-assisted prediction. For de novo prediction, the genome with repeat regions excluded was used to generate gene structures with Genscan, Augustus version 2.4, GlimmerHMM version 3.0.4, GeneID version 1.4, and SNAP. A total of 1,588,939 repeat sequences spanning 374.83 Mb were predicted, accounting for 36.99% of the humpback grouper genome. In addition, a total of 26,037 protein-coding genes were predicted, of which 25,243 (96.95%) could be functionally annotated.
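The summary figures above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, not part of the dataset or the authors' pipeline; the function and variable names are hypothetical. The genome size it derives is only what the reported repeat span and percentage imply, since the assembly size is not stated in this record.

```python
# Figures quoted in the Methods text.
REPEAT_SPAN_MB = 374.83      # total length of predicted repeats, in Mb
REPEAT_FRACTION = 0.3699     # repeats as a fraction of the genome (36.99%)
TOTAL_GENES = 26_037         # predicted protein-coding genes
ANNOTATED_GENES = 25_243     # genes with a functional annotation


def implied_genome_size_mb(span_mb: float, fraction: float) -> float:
    """Genome size implied by a repeat span and its share of the genome."""
    return span_mb / fraction


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


# Derived values (hypothetical names, for illustration only).
genome_size_mb = implied_genome_size_mb(REPEAT_SPAN_MB, REPEAT_FRACTION)
annotated_pct = percent(ANNOTATED_GENES, TOTAL_GENES)

print(f"Implied genome size: {genome_size_mb:.2f} Mb")
print(f"Functionally annotated genes: {annotated_pct:.2f}%")
```

The annotated share computed this way matches the 96.95% reported above, and the implied genome size comes out at roughly 1.01 Gb.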