Chromosome-level genome assembly and annotation of Pterygoplichthys pardalis
Data files
Nov 16, 2022 version files (1.64 GB)
- Pterygoplichthys_pardalis_genome.fasta (1.53 GB)
- Pterygoplichthys_pardalis.cds (41.17 MB)
- Pterygoplichthys_pardalis.gff3 (53.21 MB)
- Pterygoplichthys_pardalis.pep (14.71 MB)
- README.md (683 B)
May 14, 2025 version files (1.89 GB)
- Pterygoplichthys_pardalis_denovo.gff3 (52.12 MB)
- Pterygoplichthys_pardalis_genome.fasta (1.53 GB)
- Pterygoplichthys_pardalis_homology.gff3 (194.14 MB)
- Pterygoplichthys_pardalis_transcripts.gff3 (9.86 MB)
- Pterygoplichthys_pardalis.cds (41.17 MB)
- Pterygoplichthys_pardalis.gff3 (53.21 MB)
- Pterygoplichthys_pardalis.pep (14.71 MB)
- README.md (3.85 KB)
Abstract
Suckermouth catfishes, with their powerful evolved traits, have become notorious invasive species that cause significant damage to aquatic ecosystems. However, the lack of high-quality genomes severely restricts research on this group. In this study, we assembled a chromosome-level genome of Pterygoplichthys pardalis de novo using sequencing data from multiple platforms, including Illumina short reads, Nanopore long reads, and Hi-C reads, resulting in a 1.51 Gb genome assembly. Multiple evaluations, including the read mapping ratio (98.52%), the transcript mapping ratio (99.61%), recovery of the conserved BUSCO gene set (98.8%), and the N50 (49.47 Mb), indicated the high continuity and accuracy of the assembly. Genome annotation identified 0.97 Gb of repetitive sequences, accounting for 64.47% of the genome assembly. Further, 23,859 protein-coding genes were predicted, 92.92% of which could be annotated in functional databases. This high-quality genome assembly provides a valuable resource for understanding the genetic underpinnings of the invasive success of P. pardalis and offers critical data for future fisheries research and management.
Dataset DOI: 10.5061/dryad.bk3j9kdgh
Description of the data and file structure
The catfish samples used in this study were purchased from an ornamental fish wholesale market in Xi’an, China. The remaining samples of the sequenced specimen (Catfish_01) have been cryopreserved at -80°C in the Biodiversity Repository of the Institute of Basic and Translational Medicine at Xi’an Medical University. All animal specimens were collected legally and in accordance with the institution's animal care and use ethics policy. Genomic DNA was extracted from the muscle tissue of one suckermouth catfish (P. pardalis) using the Blood & Cell Culture DNA Mini Kit (Qiagen).
Files and variables
File: Pterygoplichthys\_pardalis.cds
Description: Coding sequences (CDS) of the predicted genes
File: Pterygoplichthys\_pardalis.pep
Description: Protein sequences of the predicted genes
File: Pterygoplichthys\_pardalis.gff3
Description: Final gene annotation, integrating the results of three prediction methods (de novo, transcript-based, and homology-based)
File: Pterygoplichthys\_pardalis\_genome.fasta
Description: Genome assembly file
The following files were added in the new version:
File: Pterygoplichthys\_pardalis\_denovo.gff3
Description: The de novo-based prediction of protein-coding genes
File: Pterygoplichthys\_pardalis\_transcripts.gff3
Description: The transcript-based prediction of protein-coding genes
File: Pterygoplichthys\_pardalis\_homology.gff3
Description: The homology-based annotation of protein-coding genes
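As a quick sanity check on any of the GFF3 files listed above, the feature types they contain can be tallied with a few lines of Python. The following is a minimal sketch, not part of the original pipeline: the function name `count_feature_types` is hypothetical, and it assumes the files follow the standard nine-column, tab-separated GFF3 layout with the feature type in column 3. Whether genes are labeled `gene` in these particular files is also an assumption.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature types (column 3), skipping comments and blank lines.

    Hypothetical helper: assumes standard tab-separated, nine-column GFF3.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore anything that is not a nine-column feature record.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts
```

To use it, open a file and pass the handle, for example `count_feature_types(open("Pterygoplichthys_pardalis.gff3"))`. If genes are recorded with the type `gene`, that entry of the returned counter can be compared with the gene count reported in the abstract.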
Code/software
Genome assembly
The genome assembly was performed with the following steps:
1) Long reads from the Nanopore platform were used for the contig-level assembly with NextDenovo (v2.2). Key parameters included a read cutoff of 1k, a seed cutoff of 59754, and a blocksize of 5g.
2) Cleaned short reads generated from the Illumina short-insert library were mapped onto the assembled contigs using BWA (v0.7.17). To further improve single-base accuracy, we performed two iterations of correction using Pilon (v1.22).
3) We mapped the Hi-C sequencing reads to the corrected contigs and then used Juicer (v1.5.7) and 3D de novo assembly (3D-DNA, v180922) to perform the chromosome-level assembly.
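The N50 value quoted in the abstract can be recomputed from the genome FASTA. The sketch below is illustrative only and is not the authors' evaluation script: the function names `sequence_lengths` and `n50` are hypothetical, and the result describes whatever records the FASTA file contains (contigs or scaffolds), which may or may not match the level at which the published figure was computed.

```python
from typing import Iterable, List


def sequence_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in a FASTA stream (hypothetical helper)."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that sequences of length >= L cover half the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input
```

For example, `n50(sequence_lengths(open("Pterygoplichthys_pardalis_genome.fasta")))` streams the file line by line, so the 1.53 GB assembly does not need to be held in memory at once.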
Genome annotation
Tandem repetitive sequences within the genome were identified using Tandem Repeats Finder (v4.07). Non-interspersed repeats in the genome were annotated using RepeatMasker (v4.1.0). Transposable elements (TEs) in the genome were annotated at both the DNA and protein levels. A *de novo* repeat library at the DNA level was constructed using RepeatModeler (v1.0.4), enabling the identification of potential novel repetitive sequences. The genome assembly was searched against Repbase (v23.06) using RepeatMasker to detect homologous repetitive sequences, providing a more comprehensive picture of the repetitive sequence content. RM-BLASTX within RepeatProteinMask (v4.1.0) was used to query the TE protein database at the protein level.
Access information
Other publicly accessible locations of the data:
- Not applicable
Data was derived from the following sources:
- Not applicable
Version changes
29-April-2025:
- Added Pterygoplichthys_pardalis_denovo.gff3 file which contains the de novo-based prediction of protein-coding genes;
- Added Pterygoplichthys_pardalis_transcripts.gff3 file which contains the transcript-based prediction of protein-coding genes;
- Added Pterygoplichthys_pardalis_homology.gff3 file which contains the homology-based annotation of protein-coding genes.