Genome assembly and annotations of Yoshiicerus persimilis

Cite this dataset

Zhang, Feng; Jin, Jianfeng (2021). Genome assembly and annotations of Yoshiicerus persimilis [Dataset]. Dryad.


Gene duplication is an important mechanism of organismal evolution and adaptation, and whole-genome duplication (WGD) is considered a major driver of species diversification. Ancient WGD events have been reported during the evolution of basal hexapods; however, inferring these events from gene-tree and molecular-clock analyses remains challenging. In addition, the lack of chromosome-scale genomes has limited the direct characterization of WGD from synteny evidence. To obtain direct evidence from intra-chromosome synteny, we constructed the first chromosome-level genome of a basal hexapod (Yoshiicerus persimilis, Collembola). We observed large-scale gene duplications only in chromosome 1, and not within or among the other five chromosomes. The duplicated genes are involved in proliferation and growth.


The chromosome-level genome was assembled using PacBio long reads and Hi-C data. Protein-coding genes (PCGs) were predicted from three lines of evidence (ab initio prediction, RNA-seq and protein homology), and this evidence was integrated using the MAKER genome annotation pipeline. The repetitive sequences of the genome were identified using RepeatModeler, Repbase and RepeatMasker. Non-coding RNAs, including rRNAs, snRNAs and miRNAs, were identified by aligning the genome of Yoshiicerus persimilis to the Rfam database. The tRNAs were predicted using tRNAscan-SE. Motifs and domains in the PCGs were identified using InterProScan. eggNOG-mapper was also used to assign functional categories to the predicted PCGs based on the eggNOG database.

Usage notes

The readme file contains an explanation of each of the genome annotation files in the dataset. Information on how the results were obtained can be found in the associated manuscript referenced above.
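
As a rough illustration of how an annotation file might be inspected, the sketch below counts gene records per sequence in a GFF3-formatted file. It assumes the gene annotation is distributed as standard nine-column GFF3, which this page does not confirm (the readme file is the authority on file formats). The function name, the sequence names (chr1, chr2) and the example records are all made up for illustration and are not taken from the dataset.

```python
from collections import Counter
import io


def count_features_per_sequence(lines, feature_type="gene"):
    """Count GFF3 records of one feature type per sequence ID (column 1)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines (they start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GFF3 record has exactly nine tab-separated columns;
        # anything else (e.g. an embedded FASTA section) is ignored.
        if len(fields) != 9:
            continue
        seqid, feature = fields[0], fields[2]
        if feature == feature_type:
            counts[seqid] += 1
    return counts


# Made-up records for illustration only; not data from this dataset.
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "chr1\tmaker\tgene\t1500\t2400\t.\t-\t.\tID=gene2\n"
    "chr2\tmaker\tgene\t50\t700\t.\t+\t.\tID=gene3\n"
)

gene_counts = count_features_per_sequence(example)
for seqid, n in sorted(gene_counts.items()):
    print(seqid, n)
```

To apply the same function to a real file, pass an open file handle instead of the in-memory example, using whatever file name the readme gives for the gene annotation.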