Data for: The chromosome-scale genome assembly of the yellowtail clownfish Amphiprion clarkii provides insights into melanic pigmentation of anemonefish
Data files
Feb 13, 2023 version files 993.44 MB
Abstract
Anemonefish are an emerging group of model organisms due to interesting biological traits such as sequential hermaphroditism, social control of size, symbiosis with anemones, and varying pigmentation patterns. In addition to genus-specific traits, the anemonefish Amphiprion clarkii possesses species-specific characteristics, such as interspecies co-habitation, high intraspecies color variation, no anemone specificity, and a broad distribution, which have the potential to further our understanding of anemonefish evolutionary history, behavioral strategies, fish-anemone symbiosis, and color pattern evolution. However, despite its position as an emerging model species, the genome of A. clarkii is yet to be published. Here, using PacBio long-read, Illumina short-read, and Hi-C chromatin capture technology, we generated a high-quality chromosome-scale genome for the anemonefish A. clarkii. The initial assembly consisted of 1,840 contigs with an N50 of 1,203,211 bp. These contigs were successfully anchored into 24 chromosomes of 843,582,782 bp and then annotated with 25,050 protein-coding gene models. With the chromosome-scale assembly encompassing 98.7% of conserved actinopterygian genes and the annotation containing 97.0%, the quality and completeness of this A. clarkii genome are the highest amongst all published anemonefish genomes. The publication of this high-quality genome, along with A. clarkii’s many unique traits, positions this species as an ideal model organism for addressing scientific questions across a range of disciplines.
Methods
Using PacBio long-read sequencing, we generated a de novo assembly consisting of 1,840 contigs with an N50 of 1,203,211 bp. Using Hi-C chromatin conformation capture techniques, this initial assembly of 845,361,362 bp was anchored into 24 chromosomes of 843,582,782 bp, with a final N50 of 36,694,648 bp. This chromosome-level assembly was then polished with genomic Illumina short reads. We then annotated 25,050 protein-coding gene models using transcriptomic data from 13 tissues, UniProt databases, and selected fish proteomes from NCBI.
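For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions, not part of the dataset or of the authors' pipeline. Only the two assembly sizes are taken from the text, and they are used here to derive the share of the initial assembly that was anchored into chromosomes.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (NOT taken from the dataset).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Assembly sizes reported in the Methods text above.
initial_bp = 845_361_362   # initial contig-level assembly
anchored_bp = 843_582_782  # total length of the 24 chromosomes

unanchored_bp = initial_bp - anchored_bp
anchored_fraction = anchored_bp / initial_bp

print(f"toy N50: {toy_n50}")
print(f"unanchored: {unanchored_bp:,} bp")
print(f"anchored fraction: {anchored_fraction:.2%}")
```

Applied to the real contig or chromosome lengths (for example, read from a FASTA index), the same function would be expected to reproduce the reported N50 values, assuming the authors used this standard definition.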