Genomic features for adaptation and evolutionary dynamics of four Asian domestic carps
Data files
Oct 06, 2023 version files 3.89 GB
- bighead_carp.genome.fasta
- bighead_carp.gff
- bighead_carp.pep.fasta
- black_carp.genome.fasta
- black_carp.gff
- black_carp.pep.fasta
- grass_carp.genome.fasta
- grass_carp.gff
- grass_carp.pep.fasta
- README.md
- silver_carp.genome.fasta
- silver_carp.gff
- silver_carp.pep.fasta
Abstract
The four major Asian domestic carps, namely grass carp, black carp, bighead carp, and silver carp, belong to Cypriniformes and are among the most important aquaculture species and sources of animal protein in China. They have similar habitats, close phylogenetic relationships, and large body sizes. However, they differ in their diet preferences, behavior, and physical traits. Here, to better understand their evolution, we generated chromosome-level genomes of the four domestic carps. We uploaded the assembled genomes, the gene annotation files in GFF format, and the protein sequence files for the four carps, together with the analysis code or pipeline used in the article and the README file. This study sheds light on the genomic bases driving species divergence and adaptation, providing valuable insights for future research in this field.
README: Genomic features for adaptation and evolutionary dynamics of four Asian domestic carps
https://doi.org/10.5061/dryad.4qrfj6qgm
Description of the data and file structure
genomes:
silver_carp.genome.fasta
bighead_carp.genome.fasta
grass_carp.genome.fasta
black_carp.genome.fasta
proteins:
silver_carp.pep.fasta
bighead_carp.pep.fasta
grass_carp.pep.fasta
black_carp.pep.fasta
gffs:
silver_carp.gff
bighead_carp.gff
grass_carp.gff
black_carp.gff
Methods
The four major Asian domestic carps, Mylopharyngodon piceus, Ctenopharyngodon idella, Hypophthalmichthys molitrix, and Aristichthys nobilis, were obtained from the Hunan Fisheries Science Institute (Changsha City, Hunan Province, China), and fresh tissues were stored in liquid nitrogen for high-quality sample preparation and sequencing. HiFi and Hi-C sequencing were performed on genomic DNA extracted from fish muscle and liver, respectively.
To obtain high-quality reads for subsequent analyses, Illumina raw reads were filtered to remove unknown and low-quality bases, as well as adaptor and primer sequences, using Trimmomatic (V0.39) with the parameters LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, and MINLEN:50. HiFi raw reads were processed using SMRTLink to generate subread BAM files, with the following criteria: --chunk 12/16 --min-passes 3 --min-snr 2.5 --min-length 50 --max-length 50000 --min-rq 0.99. The resulting BAM files were converted to FASTA files using the bam2fasta program. The resulting HiFi reads had an average length of 15 to 17 Kbp and a maximum length of 47 to 50 Kbp.
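The read-length figures reported above (average and maximum length) can be checked on any FASTA file of reads. The following is a minimal sketch, not the pipeline used in the study; the function names and the toy records are hypothetical and exist only for illustration.

```python
from typing import Iterable, Iterator, Tuple


def iter_fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None  # stays None until the first header line is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length  # finish the last record


def summarize_lengths(lengths: Iterable[int]) -> Tuple[int, float, int]:
    """Return (read count, mean length, maximum length)."""
    count = total = longest = 0
    for n in lengths:
        count += 1
        total += n
        longest = max(longest, n)
    mean = total / count if count else 0.0
    return count, mean, longest


# Toy input (made up): two reads of 6 and 8 bases.
toy_reads = [">read1", "ACGT", "AC", ">read2", "ACGTACGT"]
count, mean, longest = summarize_lengths(iter_fasta_lengths(toy_reads))
print(count, mean, longest)
```

For a real file, an open file handle can be passed in place of the toy list, since the parser only needs an iterable of lines.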
Genome size estimation
After filtering, all clean Illumina reads were used for k-mer analysis to estimate genome size with GCE (V1.0.2). The k-mer length was 17: kmerfreq calculated the 17-mer depth frequency distribution, and gce calculated the genome size, repeat content, homozygosity, and heterozygosity. Genome size was derived with the following formula: genome size = total k-mer number / k-mer coverage depth. For black carp, grass carp, silver carp, and bighead carp, the predicted genome sizes were 833, 840, 804, and 846 Mb, respectively.
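The formula above can be illustrated with a short sketch that works directly on a k-mer depth histogram. This is not how GCE is implemented (GCE fits a statistical model that also accounts for sequencing errors, repeats, and heterozygosity). The function name, the `min_depth` cutoff, and the toy histogram are assumptions made for illustration, and taking the histogram mode as the coverage depth is a simplification that can fail for highly heterozygous genomes.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 3) -> float:
    """Estimate genome size as total k-mer number / k-mer coverage depth.

    histogram maps a k-mer depth to the number of distinct k-mers
    observed at that depth. min_depth is a hypothetical cutoff that
    drops low-depth k-mers, which mostly arise from sequencing errors.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Assumption: the coverage depth is the depth of the tallest peak.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer number: each distinct k-mer counted once per occurrence.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Toy histogram (made up): an error peak at low depth, main peak at depth 20.
toy_histogram = {1: 1000, 2: 200, 19: 50, 20: 100, 21: 50}
print(estimate_genome_size(toy_histogram))
```

In the toy case the kept k-mers sum to 4000 occurrences with a peak at depth 20, giving an estimate of 200 bases.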
De novo genome assembly
HiFi reads were assembled into contigs using hifiasm (V0.15.4-r343), producing a primary contig assembly in GFA format that included contigs of the two haplotypes. The GFA files were then converted into FASTA files using a custom Python script. Because HiFi reads are long (10–20 kb) and highly accurate (>99.9%), the contig assembly did not require polishing. The primary contig assembly was then anchored into a chromosome-level assembly using Juicer (V1.9.9) and 3D-DNA (V180114). Where chromatin interaction patterns revealed placement or orientation errors, the assembled genomes were manually adjusted using Juicebox (V1.9.8) to improve assembly quality.
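The custom Python script for GFA-to-FASTA conversion is not reproduced here. A minimal sketch of such a conversion, assuming the standard GFA layout in which each segment record is a tab-separated line of the form `S <name> <sequence> [tags]`, might look like the following; the function names, the line width, and the toy records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def gfa_segments(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each segment (S) record in GFA lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip non-segment records and segments whose sequence is
        # omitted (written as "*" in GFA).
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            yield fields[1], fields[2]


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text wrapped at `width`."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


# Toy GFA content (made up): a header, two segments, and a non-segment line.
toy_gfa = [
    "H\tVN:Z:1.0\n",
    "S\tcontig1\tACGTACGT\tLN:i:8\n",
    "A\tcontig1\t0\t+\tread1\t0\t8\n",
    "S\tcontig2\tGG\n",
]
print(to_fasta(gfa_segments(toy_gfa), width=4), end="")
```

Streaming the GFA line by line avoids loading the whole graph into memory, which matters for assemblies of this size.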