The chromosome-scale genome assembly of the reef-building coral Platygyra daedalea
Data files
Version of Apr 10, 2026 (total size: 426.15 MB)
• gulf.temasking.gff (271.75 MB)
• gulf.tsebra.gff3 (93.71 MB)
• gulf.tsebra.pep.longest.fa (13.23 MB)
• gulf.tsebra.transcripts.fa (47.42 MB)
• MT_pdae.fasta (16.72 KB)
• MT_pdae.gb (29.63 KB)
• README.md (2.74 KB)
Abstract
The genus Platygyra represents an essential group of reef-building corals in marine ecosystems. Here, we present the first high-quality genome assembly of the brain coral Platygyra daedalea, a widespread species known for its ecological resilience across diverse marine environments. Using long-read sequencing combined with chromosome-level scaffolding based on a genetic linkage map, we assembled an 878.6 Mb genome with a contig N50 of 34.8 Mb. A total of 99.78% of the assembly was anchored into 16 linkage groups, with only eight contigs remaining unplaced. Annotation identified 27,516 protein-coding genes. This reference genome provides a critical resource for understanding coral biology, aiding conservation efforts, and exploring adaptive potential under climate change.
Overview
This dataset contains genomic resources associated with the chromosome-scale genome assembly and annotation of the reef-building coral Platygyra daedalea from the Arabian Gulf. The files include the mitochondrial genome assembly, nuclear genome annotations, transcript and protein sequences, and repeat masking annotations.
⸻
File Descriptions
Mitochondrial Genome
• MT_pdae.gb
Mitochondrial genome assembly in GenBank format, including gene annotations.
• MT_pdae.fasta
Mitochondrial genome sequence in FASTA format.
⸻
Gene Annotation and Sequences
• gulf.tsebra.gff3
Genome annotation file in GFF3 format containing gene models generated using TSEBRA.
• gulf.tsebra.transcripts.fa
FASTA file containing all predicted transcript sequences from the gene models.
• gulf.tsebra.pep.longest.fa
FASTA file containing the longest peptide sequence per gene, used for downstream analyses (e.g., functional annotation and completeness assessment).
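The FASTA files above can be read with a few lines of standard-library Python. The following is a minimal sketch, not part of the original dataset tooling: the function name `read_fasta` and the example records are hypothetical, and it assumes the record ID is the first whitespace-delimited word of each header line.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    Assumption: the record ID is the first word after '>' on the header line.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)  # sequence may wrap over several lines
    return {name: "".join(chunks) for name, chunks in records.items()}


# Hypothetical records for illustration; real IDs in the dataset may differ.
example = """>g1.t1
MKT
LLA
>g2.t1 some description
MSS*
""".splitlines()

peptides = read_fasta(example)
for name, seq in peptides.items():
    # Strip a trailing stop-codon symbol, if present, before measuring length.
    print(name, len(seq.rstrip("*")))
```

To apply this to the dataset, pass an open file handle instead of the example lines, e.g. `with open("gulf.tsebra.pep.longest.fa") as fh: peptides = read_fasta(fh)`.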
⸻
Repeat Annotation
• gulf.temasking.gff
GFF file containing annotated repetitive elements identified across the genome, including transposable elements and other repeats.
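A quick way to get an overview of the repeat annotation is to tally features by the type given in column 3 of the GFF file. The sketch below assumes the standard nine-column, tab-delimited GFF layout with 1-based inclusive coordinates. The function name `summarize_gff` and the feature type names in the example are hypothetical, and overlapping features are counted once each, so the totals may exceed the number of masked bases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_gff(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {feature_type: (feature_count, summed_length_bp)}."""
    totals = defaultdict(lambda: [0, 0])
    for raw in lines:
        if raw.startswith("#") or not raw.strip():
            continue  # skip comments, directives, and blank lines
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a standard feature line
        feature_type = fields[2]
        start, end = int(fields[3]), int(fields[4])
        totals[feature_type][0] += 1
        totals[feature_type][1] += end - start + 1  # 1-based, inclusive
    return {ftype: (count, length) for ftype, (count, length) in totals.items()}


# Hypothetical feature lines; the real file's sources and types may differ.
rows = [
    ["chr1", "example", "repeat_a", "101", "200", ".", "+", ".", "ID=te1"],
    ["chr1", "example", "repeat_a", "301", "350", ".", "-", ".", "ID=te2"],
    ["chr2", "example", "repeat_b", "1", "40", ".", "+", ".", "ID=te3"],
]
example = ["##gff-version 3"] + ["\t".join(row) for row in rows]

summary = summarize_gff(example)
for ftype, (count, length) in sorted(summary.items()):
    print(ftype, count, length)
```

For the dataset itself, the same function can be given an open handle on `gulf.temasking.gff`, which it reads line by line without loading the whole file into memory.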
⸻
Methods Summary
The genome assembly was generated using PacBio HiFi sequencing data and scaffolded using a genetic linkage map. Gene models were produced with TSEBRA, which combines gene predictions from multiple sources into a single gene set. The longest isoform per gene was extracted for downstream analyses. Repetitive elements were identified and annotated using a combination of de novo and homology-based approaches, and the genome was masked accordingly.
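The README does not state how the longest isoform per gene was selected. One common approach, sketched below, sums the CDS lengths of each transcript and keeps the transcript with the largest total for each gene. This is an illustration under stated assumptions, not the authors' procedure: it assumes GFF3 transcript features of type `mRNA` or `transcript` carrying `ID` and `Parent` attributes, and CDS features whose `Parent` is the transcript ID. The function names and example IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def longest_isoforms(lines: Iterable[str]) -> Dict[str, str]:
    """Return {gene_id: transcript_id} for the transcript with the longest CDS."""
    transcript_gene = {}         # transcript ID -> parent gene ID
    cds_length = defaultdict(int)  # transcript ID -> summed CDS length
    for raw in lines:
        if raw.startswith("#") or not raw.strip():
            continue
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = parse_attributes(fields[8])
        if fields[2] in ("mRNA", "transcript") and "ID" in attrs and "Parent" in attrs:
            transcript_gene[attrs["ID"]] = attrs["Parent"]
        elif fields[2] == "CDS" and "Parent" in attrs:
            for parent in attrs["Parent"].split(","):
                cds_length[parent] += int(fields[4]) - int(fields[3]) + 1
    best = {}
    for transcript, gene in transcript_gene.items():
        length = cds_length.get(transcript, 0)
        # On ties, the transcript encountered first is kept.
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript, length)
    return {gene: transcript for gene, (transcript, _) in best.items()}


# Hypothetical annotation lines for two genes, one with two isoforms.
rows = [
    ["chr1", "example", "gene", "100", "349", ".", "+", ".", "ID=g1"],
    ["chr1", "example", "mRNA", "100", "349", ".", "+", ".", "ID=g1.t1;Parent=g1"],
    ["chr1", "example", "CDS", "100", "199", ".", "+", "0", "ID=c1;Parent=g1.t1"],
    ["chr1", "example", "CDS", "300", "349", ".", "+", "0", "ID=c2;Parent=g1.t1"],
    ["chr1", "example", "mRNA", "100", "299", ".", "+", ".", "ID=g1.t2;Parent=g1"],
    ["chr1", "example", "CDS", "100", "299", ".", "+", "0", "ID=c3;Parent=g1.t2"],
    ["chr2", "example", "gene", "1", "90", ".", "-", ".", "ID=g2"],
    ["chr2", "example", "mRNA", "1", "90", ".", "-", ".", "ID=g2.t1;Parent=g2"],
    ["chr2", "example", "CDS", "1", "90", ".", "-", "0", "ID=c4;Parent=g2.t1"],
]
selected = longest_isoforms("\t".join(row) for row in rows)
print(selected)
```

Here `g1.t2` (200 bp of CDS) is chosen over `g1.t1` (150 bp). Ranking by peptide length or by transcript length instead would be an equally reasonable choice and may give different results for some genes.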
The mitochondrial genome was assembled using MitoHiFi (v3.2.3) with a reference-guided approach based on a previously published P. daedalea mitochondrial genome (GenBank accession: PP856480.1). The resulting assembly is circular and includes standard mitochondrial gene content.
⸻
Data Notes
• Gene annotations include multiple transcript isoforms; users may choose to work with either the full transcript set or the longest isoform per gene depending on their application.
• The repeat annotation includes both classified and unclassified transposable elements.
• Coordinates in all GFF/GFF3 files correspond to the nuclear genome assembly described in the associated manuscript.
⸻
Contact
For questions regarding the dataset or associated analyses, please contact:
Craig Michell
King Abdullah University of Science and Technology (KAUST)
⸻
Associated Publication
Details of the genome assembly, annotation, and validation are described in the accompanying manuscript submitted for peer review.
