Whole genome assembly and annotation of the King Angelfish (Holacanthus passer) gives insight into the evolution of marine fishes of the Tropical Eastern Pacific
Data files
Oct 24, 2023 version files 72.04 MB
- gemoma_HPA_1.1.log
- gemoma.job
- HPA_1.1_annotation.gff
- HPA_1.1_predicted_proteins.fasta
- protocol_GeMoMaPipeline.txt
- README.md
Nov 08, 2023 version files 805.40 MB
- gemoma_HPA_1.1.log
- gemoma.job
- HPA_1.1_annotation.gff
- HPA_1.1_blobtools.zip
- HPA_1.1_predicted_proteins.fasta
- HPA_1.1.fasta.cat.gz
- HPA_1.1.fasta.masked
- HPA_1.1.fasta.out
- HPA_1.1.fasta.tbl
- missing_busco_list_HPA_1.1_busco_actinopterygii.tsv
- missing_busco_list_HPA_1.1_busco_eukaryota.tsv
- protocol_GeMoMaPipeline.txt
- README.md
- repeatmasker.job
- repeatmasker.log
- short_summary_HPA_1.1_busco_actinopterygii.txt
- short_summary_HPA_1.1_busco_eukaryota.txt
Nov 13, 2023 version files 1.46 GB
- full_table_HPA_1.1_busco_actinopterygii.txt
- full_table_HPA_1.1_busco_eukaryota.txt
- gemoma_HPA_1.1.log
- gemoma.job
- HPA_1.1_annotation.gff
- HPA_1.1_blobtools.zip
- HPA_1.1_nucleotide_proteins.fasta
- HPA_1.1_predicted_proteins.fasta
- HPA_1.1.fasta
- HPA_1.1.fasta.cat.gz
- HPA_1.1.fasta.masked
- HPA_1.1.fasta.out
- HPA_1.1.fasta.tbl
- missing_busco_list_HPA_1.1_busco_actinopterygii.tsv
- missing_busco_list_HPA_1.1_busco_eukaryota.tsv
- protocol_GeMoMaPipeline.txt
- README.md
- repeatmasker.job
- repeatmasker.log
- short_summary_HPA_1.1_busco_actinopterygii.txt
- short_summary_HPA_1.1_busco_eukaryota.txt
Abstract
Holacanthus angelfishes are some of the most iconic marine fishes of the Tropical Eastern Pacific (TEP). However, very limited genomic resources currently exist for the genus. In this study we: i) assembled and annotated the nuclear genome of the King Angelfish (Holacanthus passer), and ii) examined the demographic history of H. passer in the TEP. We generated 43.8 Gb of ONT reads and 97.3 Gb of Illumina reads, representing 75X and 167X coverage, respectively. The final genome assembly size was 583 Mb with a contig N50 of 5.7 Mb, and it captured 97.5% of complete Actinopterygii Benchmarking Universal Single-Copy Orthologs (BUSCOs). Repetitive elements account for 5.09% of the genome, and 33,889 protein-coding genes were predicted, of which 22,984 have been functionally annotated. Our demographic model suggests that population expansions of H. passer occurred prior to the last glacial maximum (LGM) and were more likely shaped by events associated with the closure of the Isthmus of Panama. This result is surprising, given that most rapid population expansions in both freshwater and marine organisms have been reported to occur globally after the LGM. Overall, this annotated genome assembly will serve as a resource to improve our understanding of the evolution of Holacanthus angelfishes while facilitating novel research into local adaptation, speciation, and introgression in marine fishes.
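The coverage figures in the abstract can be checked with a short calculation. The sketch below assumes that coverage was computed as total sequenced bases divided by the final assembly size (583 Mb); the abstract does not state the denominator, so this is an assumption, and the function name `coverage` is illustrative only.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values taken from the abstract; using the assembly size as the
# denominator is an assumption, not something the abstract states.
ASSEMBLY_SIZE = 583e6   # 583 Mb
ONT_BASES = 43.8e9      # 43.8 Gb
ILLUMINA_BASES = 97.3e9 # 97.3 Gb

if __name__ == "__main__":
    ont = coverage(ONT_BASES, ASSEMBLY_SIZE)
    illumina = coverage(ILLUMINA_BASES, ASSEMBLY_SIZE)
    print(f"ONT: {ont:.0f}X, Illumina: {illumina:.0f}X")
```

Under that assumption, the rounded values agree with the 75X and 167X reported above.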
README: Whole genome assembly and annotation of the King Angelfish (Holacanthus passer) gives insight into the evolution of marine fishes of the Tropical Eastern Pacific
by Remy Gatins, Carlos F. Arias, Carlos Sánchez, Giacomo Bernardi, and Luis F. De León
corresponding author: remygatinsa@gmail.com
Genome Assembly Files
- HPA_1.1.fasta - fasta genome assembly file
Genome Annotation Files
- HPA_1.1_annotation.gff - gff genome annotation file
- HPA_1.1_nucleotide_proteins.fasta - coding gene nucleotide sequences in fasta file format
- HPA_1.1_predicted_proteins.fasta - predicted proteins in fasta file format
- gemoma.job - sbatch job submission to run the GeMoMa pipeline
- gemoma_HPA_1.1.log - output log file from gemoma.job submission
- protocol_GeMoMaPipeline.txt - GeMoMa pipeline parameter description
RepeatMasker files
- HPA_1.1.fasta.cat.gz - alignment file
- HPA_1.1.fasta.masked - fasta file showing repeats across the genome
- HPA_1.1.fasta.out - annotation file with the cross_match output lines
- HPA_1.1.fasta.tbl - repeatmasker summary file
- repeatmasker.log - HPA_repeatmasker log file
- repeatmasker.job - repeatmasker job submission
BUSCO output files
- missing_busco_list_HPA_1.1_busco_actinopterygii.tsv - missing BUSCO list using Actinopterygii_odb9 database
- missing_busco_list_HPA_1.1_busco_eukaryota.tsv - missing BUSCO list using Eukaryota_odb9 database
- short_summary_HPA_1.1_busco_actinopterygii.txt - BUSCO short summary using Actinopterygii_odb9 database
- short_summary_HPA_1.1_busco_eukaryota.txt - BUSCO short summary using Eukaryota_odb9 database
- full_table_HPA_1.1_busco_actinopterygii.txt - BUSCO full output table using Actinopterygii_odb9 database
- full_table_HPA_1.1_busco_eukaryota.txt - BUSCO full output table using Eukaryota_odb9 database
Extra files
- HPA_1.1_blobtools.zip - this zipped folder includes about 40 .json files that are used all together as input to generate the blob tools viewer.
Methods
To annotate our genome, we used the homology-based gene prediction pipeline GeMoMa (v1.6.4). GeMoMa uses protein-coding gene models and intron position conservation from reference genomes to predict possible protein-coding genes in a target genome (Keilwagen et al., 2018). Here, we ran the GeMoMa pipeline using annotations from three fish species: Amphiprion ocellaris, Oreochromis niloticus, and Electrophorus electricus (downloaded from NCBI, see Table S3). These species were selected to represent a variety of genes from close to distant high-quality fish annotations. In our case, the pipeline performed four main steps: 1) Extractor or external search, which uses the search algorithm tblastn with CDS parts from the reference genomes as queries; 2) Gene Model Mapper (GeMoMa), which builds gene models from the Extractor results; 3) GeMoMa Annotation Filter (GAF), which filters and combines common gene predictions; and 4) AnnotationFinalizer, which predicts UTRs for annotated coding sequences and generates gene and transcript names (Keilwagen et al., 2018). Additionally, we ran RepeatMasker (open-4.0.6, Smit et al. 2013–2015) with the Teleostei database to identify repetitive elements in the genome and soft-mask the assembly. The RepeatMasker .out file was converted to GFF with the RepeatMasker script `rmOutToGFF3.pl`.
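As a quick sanity check on an annotation file such as HPA_1.1_annotation.gff, one can tally how many records of each feature type it contains. The sketch below is not part of the authors' pipeline. It assumes the file follows the standard nine-column, tab-separated GFF layout, and it makes no assumption about which feature labels GeMoMa writes. The function name `count_feature_types` and the sample records are hypothetical and for illustration only.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count records per feature type (column 3) in GFF-formatted lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # An embedded sequence section, if present, ends the annotation records.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A well-formed GFF record has exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration; these are not taken from the dataset.
SAMPLE = [
    "##gff-version 3",
    "contig_1\texample\tgene\t100\t900\t.\t+\t.\tID=gene_1",
    "contig_1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna_1;Parent=gene_1",
    "contig_1\texample\tCDS\t150\t400\t.\t+\t0\tID=cds_1;Parent=mrna_1",
    "contig_1\texample\tCDS\t500\t800\t.\t+\t1\tID=cds_1;Parent=mrna_1",
]

if __name__ == "__main__":
    for feature, n in sorted(count_feature_types(SAMPLE).items()):
        print(f"{feature}\t{n}")
```

To run it on a real file, pass an open file handle instead of the sample list, for example `count_feature_types(open("HPA_1.1_annotation.gff"))`. The same function works on a GFF produced from RepeatMasker output, since it only reads the feature-type column.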