Ancestral origin and structural characteristics of non-syntenic homologous chromosomes in abalones (Haliotis)
Abstract
Structural variation is increasingly recognized as a pivotal contributor to genomic diversity in marine invertebrates, yet its extent and evolutionary significance remain poorly characterized in many species. Haplotype-phased genome assemblies are an excellent resource for studying such variation because they allow homologous chromosomes to be compared directly. Here, we focus on abalones (genus Haliotis), iconic marine invertebrates that originated in the Cretaceous period and have long drawn attention for their ecological roles, distinctive morphology, and cultural and economic value. In this study, we constructed a haplotype-phased genome assembly for the western Pacific abalone, Haliotis gigantea, using high-fidelity (HiFi) long-read sequencing and high-resolution chromosome conformation capture (Hi-C) data. The primary and alternative assemblies each comprised 18 long scaffolds (>50 Mb), consistent with the species’ diploid chromosome number (2n = 36), and contained 96.5% and 96.2% complete single-copy metazoan Benchmarking Universal Single-Copy Orthologs (BUSCO) genes, respectively, indicating high assembly quality. Comparative analysis of the two haplotypes revealed three pairs of homologous chromosomes with large-scale non-syntenic regions caused by extensive segmental duplications, each enriched in distinct gene domains that may be related to adaptive evolution. These non-syntenic chromosomes likely originated in abalone evolution, as they were conserved across both closely and distantly related species, and led to the accumulation of duplicated genes in abalones. Our genome assembly highlights the evolutionary importance of non-syntenic structural variation in shaping genome architecture and suggests that such variation may play a broader role in functional diversification, adaptation, and consequent prosperity across abalones.
Dataset DOI: 10.5061/dryad.547d7wmmq
Description of the data and file structure
This dataset includes the FASTA files and scripts used for the genome assembly of Haliotis gigantea and for the downstream analyses based on that assembly.
data.zip
megai_hifi_only.bp.p_utg.fa.gz
FASTA file of the unitigs output by hifiasm
megai_hifi_only.bp.p_utg_HiC.fasta.gz
FASTA file of the haplotype-phased genome assembly output by 3D-DNA (all scaffolds)
scripts.zip (See Zenodo link in related works)
HiC_scaffolding_main.sh
A script used for haplotype-phased genome assembly
repeat2braker2_main.sh
A script used for repeat masking and gene annotation
post_analyses_main.sh
A script used for downstream analyses based on this assembly
BUSCO_synteny
A folder including subscripts for constructing a synteny plot between primary and alternative assemblies based on BUSCO.
interproscan
A folder including subscripts and materials for InterProScan analyses.
MCscanX
A folder including subscripts for MCScanX analyses.
NGenomeSyn
A folder including subscripts for NGenomeSyn analyses.
nucmer
A folder including subscripts for nucmer output.
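
As a quick sanity check on the FASTA files listed above, the following minimal Python sketch counts the sequences in a gzipped FASTA file and reports how many exceed a length threshold, such as the 50 Mb cutoff for long scaffolds mentioned in the abstract. It is not part of the deposited scripts; the function names (scaffold_lengths, count_long_scaffolds, summarize) are hypothetical and chosen for illustration only, and no particular count is assumed for any file in this dataset.

```python
import gzip
import sys
from typing import Dict, Iterable, Tuple


def scaffold_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length in bases} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def count_long_scaffolds(lengths: Dict[str, int], min_length: int = 50_000_000) -> int:
    """Count sequences strictly longer than min_length bases."""
    return sum(1 for n in lengths.values() if n > min_length)


def summarize(path: str, min_length: int = 50_000_000) -> Tuple[int, int]:
    """Return (total sequences, sequences longer than min_length) for a .fa.gz file."""
    with gzip.open(path, "rt") as handle:
        lengths = scaffold_lengths(handle)
    return len(lengths), count_long_scaffolds(lengths, min_length)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_scaffolds.py megai_hifi_only.bp.p_utg_HiC.fasta.gz
    total, long_count = summarize(sys.argv[1])
    print(f"{total} sequences, {long_count} longer than 50 Mb")
```

The script name count_scaffolds.py in the usage comment is likewise a placeholder. The parser reads the file line by line, so memory use stays small even for chromosome-scale scaffolds.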
