Dryad

Annotation of the novel humpback whale (Megaptera novaeangliae) reference genome

Abstract

We present a comprehensive annotation of the humpback whale (Megaptera novaeangliae) genome, representing the most complete and biologically coherent gene-model resource available to date for this species. Using R (RStudio v2025.05.1+513) in a fully reproducible analysis pipeline, we evaluated 1,308,486 genomic features, including 21,833 protein-coding genes and 386,765 annotated exons. The structural features of the annotation are consistent with mammalian expectations (median gene length ≈ 4.3 kb; intron lengths peaking in the 1–10 kb range), and coding-sequence integrity is high (99.7% of transcripts are in-frame). Gene loci are distributed non-uniformly across scaffolds, with densities ranging from ~7 to ~15 genes per megabase, and cover ~25–30% of the assembled genome sequence. We further identified ~2,000 multi-copy gene families and numerous tandem duplication clusters (2–12 members each). These results indicate that the M. novaeangliae genome annotation is both structurally complete and functionally rich, providing a robust foundation for future comparative, physiological, and conservation-genomics investigations.
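The summary statistics quoted above (median gene length, fraction of in-frame transcripts, genes per megabase) can be illustrated with a minimal Python sketch. This is not the dataset's R pipeline, which is not shown here. The records, scaffold names, and the `summarize` helper are hypothetical, and "in-frame" is assumed to mean that a transcript's total CDS length is divisible by 3, a definition the abstract does not state.

```python
from collections import defaultdict
from statistics import median

# Hypothetical GFF3-like records: (scaffold, feature_type, start, end, owner_id).
# Coordinates are 1-based and inclusive, as in GFF3. These values are toy
# data for illustration and are not taken from the humpback whale annotation.
FEATURES = [
    ("scaf1", "gene", 1001, 5300, "t1"),
    ("scaf1", "CDS", 1001, 1300, "t1"),
    ("scaf1", "CDS", 2001, 2300, "t1"),
    ("scaf1", "gene", 200001, 203000, "t2"),
    ("scaf1", "CDS", 200001, 200100, "t2"),
    ("scaf2", "gene", 501, 9500, "t3"),
    ("scaf2", "CDS", 501, 1100, "t3"),
]
# Hypothetical scaffold lengths in base pairs.
SCAFFOLD_LENGTHS = {"scaf1": 1_000_000, "scaf2": 500_000}


def summarize(features, scaffold_lengths):
    """Return median gene length, in-frame fraction, and genes per Mb."""
    gene_lengths = []
    genes_per_scaffold = defaultdict(int)
    cds_total = defaultdict(int)  # summed CDS length per transcript

    for scaffold, ftype, start, end, owner in features:
        length = end - start + 1  # inclusive coordinates
        if ftype == "gene":
            gene_lengths.append(length)
            genes_per_scaffold[scaffold] += 1
        elif ftype == "CDS":
            cds_total[owner] += length

    # Assumption: a transcript is in-frame if its total CDS length is a
    # multiple of 3.
    in_frame = sum(1 for total in cds_total.values() if total % 3 == 0)

    genes_per_mb = {
        name: genes_per_scaffold[name] / (length_bp / 1_000_000)
        for name, length_bp in scaffold_lengths.items()
    }
    return {
        "median_gene_length": median(gene_lengths),
        "in_frame_fraction": in_frame / len(cds_total),
        "genes_per_mb": genes_per_mb,
    }


if __name__ == "__main__":
    print(summarize(FEATURES, SCAFFOLD_LENGTHS))
```

A real analysis would read the records from the deposited annotation file and would likely compute density in sliding windows rather than per whole scaffold; the per-scaffold version is used here only to keep the sketch short.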