Data from: Chromosome-level genome assembly of a cyprinid fish Onychostoma macrolepis by integration of Nanopore Sequencing, Bionano and Hi-C technology
Sun, Lina et al. (2020), Data from: Chromosome-level genome assembly of a cyprinid fish Onychostoma macrolepis by integration of Nanopore Sequencing, Bionano and Hi-C technology, Dryad, Dataset, https://doi.org/10.5061/dryad.r7sqv9s8g
Onychostoma macrolepis is an emerging commercial cyprinid fish species and a model system for studies of sexual dimorphism and genome evolution. Here, we report a chromosome-level assembly of the O. macrolepis genome, obtained by integrating Nanopore long-read sequencing with physical maps produced using Bionano and Hi-C technology. A total of 87.9 Gb of Nanopore sequence provided approximately 100-fold coverage of the genome. The preliminary genome assembly was 883.2 Mb in size, with a contig N50 of 11.2 Mb. The 969 corrected contigs obtained from Bionano optical mapping were assembled into 853 scaffolds, producing an assembly of 886.5 Mb with a scaffold N50 of 16.5 Mb. Finally, using the Hi-C data, 881.3 Mb (99.4% of the genome) in 526 scaffolds were anchored and oriented onto 25 chromosomes ranging in size from 25.27 to 56.49 Mb. In total, 24,770 protein-coding genes were predicted in the genome, and ~96.85% of them were functionally annotated. The annotated assembly contains 93.3% of the complete genes in the BUSCO reference set. In addition, we identified 409 Mb of repetitive sequence (46.23% of the genome) and 11,213 non-coding RNAs. Evolutionary analysis revealed that O. macrolepis diverged from common carp approximately 24.25 million years ago. The chromosomes of O. macrolepis showed an unambiguous correspondence to those of zebrafish. The high-quality genome assembled in this work provides a valuable genomic resource for further biological and evolutionary studies of O. macrolepis.
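The coverage, anchoring and N50 figures quoted above follow from simple arithmetic. The Python sketch below illustrates the standard definitions using the abstract's numbers; the function names are illustrative and are not taken from the authors' pipeline, and the contig lengths in the last line are a toy example, not data from this assembly.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty length list")


def fold_coverage(sequenced_bp, genome_bp):
    """Mean sequencing depth: total sequenced bases / genome size."""
    return sequenced_bp / genome_bp


# Figures quoted in the abstract
nanopore_bp = 87.9e9   # 87.9 Gb of Nanopore reads
assembly_bp = 883.2e6  # 883.2 Mb preliminary assembly

print(f"Coverage: {fold_coverage(nanopore_bp, assembly_bp):.1f}x")  # Coverage: 99.5x
print(f"Anchored: {881.3 / 886.5:.1%}")  # Anchored: 99.4%
print(n50([10, 8, 6, 4, 2]))  # 8 (toy contig lengths)
```

The ~99.5-fold result is consistent with the "approximately 100-fold" coverage reported, and 881.3 of 886.5 Mb gives the stated 99.4% anchoring rate.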
Protein-coding genes were predicted using a combination of homology-based, de novo, and RNA-seq-based gene prediction methods. Repetitive sequences were annotated using RepeatModeler, Repbase and RepeatMasker. Non-coding RNAs, including rRNAs, snRNAs and miRNAs, were identified by aligning the Onychostoma macrolepis genome against the Rfam database. tRNAs were predicted using tRNAscan-SE, and rRNAs and their subunits were predicted using RNAmmer.
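One way to sanity-check a repeat-content figure like the 46.23% reported above is to count soft-masked bases in the masked assembly. The sketch below assumes the genome was soft-masked (repeats written in lowercase), as RepeatMasker produces with its -xsmall option, and that N gap characters are excluded from the denominator. Both are assumptions rather than details from the authors' methods, and `masked_fraction` is a hypothetical helper.

```python
def masked_fraction(fasta_lines):
    """Fraction of non-gap bases that are soft-masked (lowercase).

    Assumes repeats were soft-masked (e.g. RepeatMasker -xsmall);
    N/n gap characters are skipped and do not count toward the total.
    """
    masked = total = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines
        if not line or line.startswith(">"):
            continue
        for base in line:
            if base in "Nn":
                continue
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


# Toy soft-masked record: 6 of 12 non-gap bases are lowercase
toy = [">chr1", "ACGTacgtNN", "acGT"]
print(f"{masked_fraction(toy):.1%}")  # 50.0%
```

Because iterating over an open file yields its lines, the same function can be applied directly to a file handle for a real soft-masked FASTA.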
National Natural Science Foundation of China, Award: 316,300,823,157,260,000,000,000,000