Data and analysis scripts to accompany: Coral genetic structure in the Western Indian Ocean mirrors ocean circulation and thermal stress history
Data files
Feb 19, 2026 version files 544.65 MB
- allDB_area.rda (624.74 KB)
- BLAST_REF_Acropora.rda (4.67 MB)
- BLAST_REF_Pocillopora.rda (2.73 MB)
- Environmental_variable_description.csv (732 B)
- GENES_Acropora.rda (3.16 MB)
- GENES_Pocillopora.rda (2.36 MB)
- genomic_Acropora.gff (254.39 MB)
- genomic_Pocillopora.gff (151.41 MB)
- MF_gsc_Acropora.rda (465.58 KB)
- MF_gsc_Pocillopora.rda (552.75 KB)
- README.md (22.08 KB)
- Sample_ID_location_Acropora.csv (43.95 KB)
- Sample_ID_location_Pocillopora.csv (51.79 KB)
- SeaCurrents_TM.zip (53.52 MB)
- Site_coordinates_environment.csv (2.15 KB)
- SNPS_FILT_Acropora.rda (3.73 MB)
- SNPS_FILT_Pocillopora.rda (3.66 MB)
- SNPS_RAW_Acropora.rda (33.81 MB)
- SNPS_RAW_Pocillopora.rda (23.13 MB)
- WIO_reefs_coord_samplesites.csv (411 B)
- WIO_reefs_coord.rda (22.46 KB)
- WIO_reefs_ENV.rda (47.48 KB)
- WIOcoral_seascapegenomics_Acropora.pdf (3.04 MB)
- WIOcoral_seascapegenomics_Pocillopora.pdf (3.13 MB)
- WIOcoral_seascapegenomics_script.Rmd (78.25 KB)
Abstract
With ever-growing concerns over the conditions of coral reefs under warming oceans comes a need to better understand the connectivity and adaptive capacities of reef-building corals from across large oceanic regional extents.
This dataset accompanies the published article by Guillaume et al. (2026) in Evolutionary Applications, where we applied a seascape genomics approach to model (i) population connectivity and (ii) thermal adaptive potential for two keystone coral species across the West Indian Ocean (WIO). Specifically, we sampled 345 Acropora muricata and 403 Pocillopora damicornis individuals around islands of three WIO regions (Seychelles, Mauritius, and Rodrigues) as part of a United Nations Development Programme (UNDP) project (funding information below). Genomic data were obtained from DNA extracted from sampled coral tissue and sequenced with DArT-seq. We used a bioinformatic pipeline to process and filter reads, which were then used to assess population structure and connectivity, identify putative genomic regions under thermal selection, and produce maps of adaptive potential across the WIO.
This repository hosts one pdf per species that steps the user through all the main analyses of the publication. These pdfs have been produced using the Rmarkdown script provided, which will run when all the accompanying data (sample information, environmental variables, genomic data) are downloaded into one folder.
Description of the data and file structure
The following data are provided to replicate the analyses and main figures in the article ‘Coral genetic structure in the Western Indian Ocean mirrors ocean circulation and thermal stress history’ (citations below). These data and the Rmarkdown script will reproduce the pdf tutorial files for both species (WIOcoral_seascapegenomics_species.pdf).
Briefly, the population connectivity and adaptive potentials of two reef building coral species (Acropora muricata and Pocillopora damicornis) were assessed by applying seascape genomic approaches. We sampled 345 A. muricata and 403 P. damicornis around islands of three West Indian Ocean (WIO) regions: Seychelles, Mauritius, and Rodrigues.
To run the script, download all files into the same folder as the Rmarkdown script. Then set the SP variable in the first chunk to either 'Acropora' or 'Pocillopora' to run the analyses for that species.
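As a minimal sketch of the setup described above, the species switch and a check that the species-specific files sit in the working folder could look like the following. The helper name `species_files()` is hypothetical and is not taken from the script; only the file names come from this repository.

```r
# Species switch, as in the first chunk of the Rmd: "Acropora" or "Pocillopora"
SP <- "Acropora"

# Hypothetical helper: build the species-specific file names listed in this README
species_files <- function(sp) {
  stopifnot(sp %in% c("Acropora", "Pocillopora"))
  c(
    paste0("Sample_ID_location_", sp, ".csv"),
    paste0("SNPS_FILT_", sp, ".rda"),
    paste0("BLAST_REF_", sp, ".rda"),
    paste0("GENES_", sp, ".rda"),
    paste0("MF_gsc_", sp, ".rda"),
    paste0("genomic_", sp, ".gff")
  )
}

# Report any species-specific file that is not in the working folder
needed  <- species_files(SP)
missing <- needed[!file.exists(needed)]
if (length(missing) > 0) {
  message("Files not found in the working folder: ", paste(missing, collapse = ", "))
}
```

Files shared by both species (for example Site_coordinates_environment.csv) are not covered by this check.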
Files and variables
File: WIOcoral_seascapegenomics_Acropora.pdf & File: WIOcoral_seascapegenomics_Pocillopora.pdf
Description: The resulting pdf output when the Rmarkdown script is knitted in R. These pdfs act as tutorials to follow the primary analyses of the manuscript, outlining the key methods and analyses undertaken in the manuscript, and replicating the results figures and tables. One pdf is provided for each species, which can be replicated using the WIOcoral_seascapegenomics_script.Rmd script (information below).
File: WIOcoral_seascapegenomics_script.Rmd
Description: Rmarkdown script to follow the primary analyses of the manuscript, outlining the key methods and analyses undertaken in the manuscript, and replicating the results figures and tables. Simply change the 'SP' variable in the first chunk to select the species to model. Valid values are 'Acropora' and 'Pocillopora'. The script can then be knitted in R to reproduce the pdfs (above), or individual chunks can be run for each analysis. Basic interpretations of analyses are included; see the published manuscript (link at bottom) for detailed information.
Sample information
File: Sample_ID_location_Acropora.csv & File: Sample_ID_location_Pocillopora.csv
Description: Location details for each genotype sample of Acropora and Pocillopora
Variables
- SampleID: genotype name, corresponds to genomic rda files
- SS: sample site identifier, starting with ‘S’
- Reef: unique reef name
- SubRegion: One of six subregions: PRA = Praslin, MAH = Mahé, IPL = Ile Plate, MW = West Mauritius, ME = East Mauritius, ROD = Rodrigues
- Region: One of three regions: MAU = Mauritius, ROD = Rodrigues, SEY = Seychelles
- LAT: decimal latitude
- LON: decimal longitude
- depth: range of depths between which coral samples were taken
- BioProject_PRJNA1277000_Accession: Genotype’s Sample ID in the BioProject PRJNA1277000
- FASTQ.ID: Genotype’s Sample ID in the BioProject PRJNA1277000
- Final_dataset: YES/NO flag indicating whether the sample was retained for the final analyses
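A short sketch of how the sample table might be read and restricted to the retained genotypes is given below. The rows are invented stand-ins so that the example runs without the repository files; only the column names come from the variable list above.

```r
# Toy stand-in for Sample_ID_location_<species>.csv (values are invented)
csv_text <- "SampleID,SS,Region,Final_dataset
A001,S01,SEY,YES
A002,S01,SEY,NO
A003,S07,MAU,YES
A004,S12,ROD,YES"

# With the real file: samples <- read.csv("Sample_ID_location_Acropora.csv")
samples <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep only genotypes retained for the final analyses
kept <- samples[samples$Final_dataset == "YES", ]

# Number of retained samples per region
n_per_region <- table(kept$Region)
print(n_per_region)
```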
Environment
We provide values of 10 uncorrelated environmental and geomorphic variables at each of the 15 reefs used to characterise their environments, where sSST (sd of sea surface temperature), mDHW (mean degree heating week) and BackReef (proportion of backreef) were used as predictors in genotype–environment association (GEA) analyses.
File: Environmental_variable_description.csv
Description: Environmental variable names used to characterise the seascape of the 15 reefs sampled for this research. This file is not needed to run the script. Variable names are provided with the acronyms used in analyses, alongside information on the original source of the data and their spatial and temporal resolutions. Please see the main manuscript of Guillaume et al. (2026) in Evolutionary Applications for more specific details regarding these variables.
Variables
- Variable: Name of the variable
- Acronym: Acronym of the variable used in the script and analyses
- Source: Open access source of the environmental variable. Either RECIFS or ACA. See 'Access information' below for more information on these two open-access data sources
- Spatial resolution: spatial extent of raster cells for each variable. Either 5 x 5 km or 10 x 10 m
- Temporal resolution: temporal extent used to calculate variable values for each variable. Value is 'na' when only one time point is used, otherwise the years of measurement are listed.
File: Site_coordinates_environment.csv
Description: Information regarding the 15 sample sites (reefs) where coral colonies were collected. This file contains site coordinates and the values of the 10 environmental variables extracted from raster files (see Environmental_variable_description.csv above for more information regarding data sources).
Variables
- SS: Unique site identifier, starting with ‘S’
- Reef: Unique reef name
- Region: One of three regions: MAU = Mauritius, ROD = Rodrigues, SEY = Seychelles
- SubRegion: One of six subregions: PRA = Praslin, MAH = Mahé, IPL = Ile Plate, MW = West Mauritius, ME = East Mauritius, ROD = Rodrigues
- LAT: decimal latitude
- LON: decimal longitude
- depth: range of depths between which coral samples were taken
- ReefSlope: Proportion of reef slope around sample site
- ReefFlat: Proportion of reef flat around sample site
- BackReef: Proportion of back reef around sample site
- mDHW: mean degree heating week (°C-week)
- sSST: sd of the sea surface temperature (°C)
- CRO: Proportion of cropland around sample site (cropland proportion in 5km2)
- LAN: Proportion of land around sample site (land proportion in 5km2)
- HPO: Human population density around sample site (human density per 5km2)
- VBD: Boat density around sample site (boat counts per 5km2)
- DEP: Depth of sample site, derived from bathymetry maps (meters below sea level)
File: SeaCurrents_TM.zip
Description: We provide the transition matrices (a square array of numbers representing probabilities of moving from one reef to another) of sea distances between reefs in our study, derived from sea current data, following methods of Selmoni et al. (2020a). Briefly, we retrieved maps describing monthly average direction and strength of surface sea currents throughout the WIO. These maps are available at a spatial resolution of 0.083° across 30 years as satellite-derived reconstructions of sea currents (from the ‘GLOBAL_REANALYSIS_PHY_001_030_104’ dataset, accessed on 03-04-2014; E.U. Copernicus Marine Service Information (CMEMS), 2024), which are publicly available via RECIFS (Selmoni et al., 2023; citation below). For each pixel, we calculated the cumulative speed towards each of the eight neighbouring pixels and divided this by the total speed to obtain a probability of transition in each direction (the conductance). We then calculated the dispersal costs as the inverse of the squared conductance to obtain transition matrices using the gdistance R package (v1.6.4, van Etten, 2017).
Upon unzipping, the files are rda objects named transition_matrix_01.rda to transition_matrix_13.rda: one transition matrix is provided per month ('01' to '12'), alongside an annual average ('13').
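The per-pixel calculation described above can be illustrated with a small base R sketch. The speeds are invented and the code is not the pipeline used to build the provided matrices (which relied on gdistance); it only shows how conductance and cost relate for a single pixel, and how the monthly file names are formed.

```r
# Invented cumulative current speeds from one pixel towards its eight
# neighbours; the values are for illustration only
speed <- c(N = 0.2, NE = 1.5, E = 3.0, SE = 1.1, S = 0.1, SW = 0.0, W = 0.0, NW = 0.1)

# Conductance: probability of moving in each direction
conductance <- speed / sum(speed)

# Dispersal cost: inverse of the squared conductance
# (directions with zero conductance get an infinite cost)
cost <- 1 / conductance^2

round(conductance, 3)

# File names expected after unzipping SeaCurrents_TM.zip:
# months "01" to "12" plus the annual average "13"
tm_files <- sprintf("transition_matrix_%02d.rda", 1:13)
```

Squaring the conductance before inverting penalises movement against the prevailing current more strongly than a simple inverse would.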
Genomic files
We provide genomic files for each species, where SNPs (single nucleotide polymorphisms) were genotyped from individual coral colonies collected at the sample sites (reefs). Information on all the samples collected and sequenced in this study is available on GEOME at https://n2t.net/ark:/21547/R2651. Raw sequences can be downloaded from NCBI (BioProject PRJNA1277000).
Some files are provided as rda objects, which can be read into R using the base R load() function (e.g. load(file = "SNPS_RAW_Acropora.rda")).
‘SNPS_RAW’ and ‘SNPS_FILT’ are genlight objects from the adegenet R package (Jombart et al. 2008). The genlight class is a specialised formal S4 class designed to store and manage genotype data, particularly binary SNPs, in a memory-efficient manner. They store data about the loci associated with each genotyped individual. These objects can be manipulated using the dartR R package (Mijangos et al., 2022).
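Because load() restores objects under the names they were saved with, it can help to load into a separate environment and inspect what arrived. The sketch below uses a small placeholder matrix saved to a temporary file, so it runs without the repository files; the real SNPS_FILT object is a genlight object, not a matrix.

```r
# Write a small stand-in .rda so the example runs without the repository files
tmp <- file.path(tempdir(), "SNPS_demo.rda")
SNPS_FILT <- matrix(c(0, 1, 2, NA), nrow = 2)   # placeholder, not a genlight object
save(SNPS_FILT, file = tmp)
rm(SNPS_FILT)

# load() restores objects under their saved names; loading into a fresh
# environment avoids overwriting objects already in the workspace
e <- new.env()
loaded <- load(tmp, envir = e)   # returns the names of the restored objects
print(loaded)

# Retrieve the restored object by name
geno <- get(loaded[1], envir = e)
dim(geno)
```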
File: SNPS_RAW_Acropora.rda & File: SNPS_RAW_Pocillopora.rda
Description: RDA object called as SNPS_RAW. This is a dartR object that contains the raw SNPs from the DArT-seq analytical pipeline: 73,253 bi-allelic SNPs genotyped for 345 A. muricata individuals and 65,708 SNPs for 403 P. damicornis individuals. This file is provided for completeness of the repository and for interest, but is not required by the R script. Only the filtered DArT-seq rda file is needed (see following file entry).
File: SNPS_FILT_Acropora.rda & File: SNPS_FILT_Pocillopora.rda
Description: RDA object called as SNPS_FILT. It holds the SNPs retained after stringent filtering of SNPS_RAW_species.rda, which includes quality-control filtering and the removal of clones and putatively cryptic individuals (refer to the main manuscript for detailed methods of filtering steps). This is a dartR object that contains the final genotype matrix, comprising 211 A. muricata individuals with 7,663 SNPs and 97 P. damicornis individuals with 13,190 SNPs.
File: BLAST_REF_Acropora.rda & File: BLAST_REF_Pocillopora.rda
Description: RDA object called as BLAST_REF. The BLAST_REF dataframe files were created to align raw DArT-seq loci to each species’ reference genomes using BLAST (Basic Local Alignment Search Tool) to retain only the SNPs associated with the coral hosts (filtering thresholds: >70% percentage identity, >80% overlap identity, and >50 bitscore). The reference genome for A. muricata was the chromosome-level assembly of Acropora millepora (v2.1; GCF_013753865.1; Fuller et al., 2020), and the reference genome for P. damicornis was the scaffold-level assembly of P. damicornis (v1; GCF_003704095.1; Cunning et al., 2018). See the BLAST help page for full details on variable descriptions.
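The filtering thresholds quoted above can be applied with a simple logical filter. In this sketch the column names (`pident`, `overlap`, `bitscore`) and the hit values are hypothetical; the actual column names of the BLAST_REF dataframe should be checked after loading it.

```r
# Toy BLAST hits; column names and values are hypothetical
hits <- data.frame(
  locus    = c("loc1", "loc2", "loc3", "loc4"),
  pident   = c(98.5, 69.0, 91.2, 85.0),   # percentage identity
  overlap  = c(100, 95, 75, 88),          # percentage of the locus aligned
  bitscore = c(120, 90, 110, 45),
  stringsAsFactors = FALSE
)

# Thresholds from the description: >70% identity, >80% overlap, >50 bitscore
keep <- hits$pident > 70 & hits$overlap > 80 & hits$bitscore > 50
host_hits <- hits[keep, ]
host_hits$locus
```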
File: GENES_Acropora.rda & File: GENES_Pocillopora.rda
Description: RDA object called as GENES. It stores information on genes alongside their chromosome location, created from a BLAST against the reference genomes (NCBI downloads GCF_013753865.1 for Acropora muricata and GCF_003704095.1 for Pocillopora damicornis).
Variables
- Gene_name: Name of the gene/locus identifier (e.g. LOC IDs)
- V1: Reference sequence accession for the genomic region (e.g. chromosome or scaffold ID)
- V2: Annotation source; indicates the gene prediction pipeline used for annotation.
- V3: Feature type = “gene”
- V4: Start genomic coordinate (integer); leftmost position of the gene on the reference sequence.
- V5: End genomic coordinate (integer); rightmost position of the gene on the reference sequence.
- V6: Score field from GFF format. A dot (“.”) indicates no score was provided.
- V7: Strand orientation. “+” for forward strand and “−” for reverse strand.
- V8: Phase/frame field from GFF format. (For gene features this is typically “.”)
- V9: Attribute field (GFF-style metadata). Contains semicolon-separated values of: gene ID, cross-references (e.g. GeneID), gene name, gene biotype (e.g. protein_coding, lncRNA)
- UP_ID: UniProt accession ID corresponding to the gene product (if available). NA = no mapped UniProt entry.
- UP_Protein_name: Full protein name from UniProt, including alternative names and EC numbers where applicable.
- GO_bp: Gene Ontology (GO) Biological Process annotations associated with the protein, including GO term names and IDs.
- GO_mf: Gene Ontology Molecular Function annotations associated with the protein.
- GO_cc: Gene Ontology Cellular Component annotations indicating subcellular localisation.
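The V9 attribute field packs several values into one string. A minimal sketch of splitting it into named values is shown below, assuming GFF3-style `key=value` pairs separated by semicolons. The identifiers in the example string are invented and `parse_attributes()` is a hypothetical helper, not a function from the script.

```r
# Example attribute string in GFF3 style (identifiers are invented)
v9 <- "ID=gene-LOC000001;Dbxref=GeneID:000001;Name=LOC000001;gene_biotype=protein_coding"

# Hypothetical helper: split "key=value" pairs separated by semicolons
# into a named character vector
parse_attributes <- function(x) {
  pairs <- strsplit(strsplit(x, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys  <- vapply(pairs, `[`, character(1), 1)
  # Re-join the remainder in case a value itself contains "="
  vals  <- vapply(pairs, function(p) paste(p[-1], collapse = "="), character(1))
  setNames(vals, keys)
}

attrs <- parse_attributes(v9)
attrs[["gene_biotype"]]
```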
File: MF_gsc_Acropora.rda & File: MF_gsc_Pocillopora.rda
Description: RDA object called as MF_gsc. Stores gene set collections (gsc) pertaining to molecular functions (MF) used in Gene Ontology (GO) enrichment analyses.
Variables
- maxSetSize: The maximum allowed number of genes for a GO term set to be included in the analysis.
- referenceSet: A character vector containing all gene identifiers used as the background or universe for the analysis
- sets: A list of GO term–specific gene sets. Each element corresponds to one Molecular Function (MF) GO term and contains the genes annotated to that term.
- g: The total number of genes in the reference set (i.e. the size of the gene universe).
- bigSets: A named logical vector indicating whether each GO term exceeds the maximum set size threshold. TRUE denotes sets considered too large; FALSE denotes retained sets.
- intersection.p.cutoff: The significance threshold used to retain pairwise intersections between GO term sets (p-value cut-off).
- intersections: A data table containing statistically significant pairwise overlaps between GO term sets, including the two term IDs (setA, setB) and their associated p-value.
- iMatrix: A binary incidence matrix (genes × sets) indicating gene membership in GO term sets (1 = gene belongs to the set, 0 = not a member).
File: genomic_Acropora.gff & File: genomic_Pocillopora.gff
Description: The reference genome files downloaded from NCBI alongside fna files (see description above for reference genome information). The .gff objects are formal class 'GRanges' from the R package "GenomicRanges", with 7 slots
Variables
- seqnames: An Rle (run-length encoded) object indicating the sequence (e.g. chromosome or scaffold) on which each genomic feature is located. The values correspond to chromosome identifiers (e.g. "NC_058066.1"), and lengths reflect consecutive runs of identical sequence names.
- ranges: An IRanges object defining the genomic coordinates of each feature. It contains the start position and the width (feature length), from which end positions can be derived.
- strand: An Rle object specifying the strand orientation of each feature: "+" (forward), "-" (reverse), or "*" (unstranded/unknown).
- seqinfo: A Seqinfo object holding metadata about the reference sequences, including sequence names, sequence lengths, circularity status, and genome build information.
- elementMetadata: A DFrame (DataFrame) containing feature-level annotations (one row per genomic range). This corresponds to the attribute columns typically present in a GFF/GTF file (e.g. source, type, gene ID, transcript ID, biotype, product, etc.).
- elementType: A character string defining the type of elements stored in the object (here "ANY"), describing the general class of the contained ranges.
- metadata: A list for storing additional, object-level metadata that apply to the entire GRanges object rather than to individual features.
West Indian Ocean regional variables
File: allDB_area.rda
Description: RDA object called as DB_area that stores the area of all reef cells in RECIFS in the WIO. This named numeric vector contains the grid ID as the column names and the associated area in row 1.
File: WIO_reefs_coord.rda
Description: RDA object called as WIO_reefs_coord that stores latitude and longitude coordinates for all reef raster cells across the WIO.
Variables
- Row names: Grid ID that correspond to the column names in DB_area
- lon: decimal longitude
- lat: decimal latitude
File: WIO_reefs_ENV.rda
Description: RDA object called as WIO_reefs_ENV that stores values of environmental variables for all reef raster cells across the WIO, extracted from RECIFS using coordinates of WIO_reefs_coord.rda.
Variables
- Row names: Grid ID that correspond to the column names in DB_area
- ReefSlope: Proportion of reef slope in grid area
- ReefFlat: Proportion of reef flat in grid area
- BackReef: Proportion of back reef in grid area
- mDHW: mean degree heating week of grid area (°C-week)
- sSST: sd of the sea surface temperature of grid area (°C)
- CRO: Proportion of cropland in grid area
- LAN: Proportion of land in grid area
- HPO: Human population density in grid area
- VBD: Boat density in grid area
- DEP: Depth derived from bathymetry maps of grid area (meters below sea level)
File: WIO_reefs_coord_samplesites.csv
Description: The grid values associated with each sample site to link to the WIO_reefs_coord.rda file
Variables
- SS: sample site ID
- Grid: grid name that corresponds to the WIO_reefs_coord.rda file
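As a sketch of how the Grid column links sample sites to reef cells, the grid names can be matched against the row names of the coordinate table. The grid IDs and coordinates below are invented stand-ins for the real objects.

```r
# Toy stand-ins; grid IDs and values are invented
site_grid <- data.frame(SS = c("S01", "S02"), Grid = c("g105", "g230"),
                        stringsAsFactors = FALSE)
WIO_reefs_coord <- data.frame(lon = c(55.4, 57.5, 63.4),
                              lat = c(-4.6, -20.2, -19.7),
                              row.names = c("g105", "g230", "g377"))

# Look up each sample site's reef cell by matching Grid to the row names
idx <- match(site_grid$Grid, rownames(WIO_reefs_coord))
site_coords <- cbind(site_grid, WIO_reefs_coord[idx, ])
site_coords
```

The same index can be used on WIO_reefs_ENV, since both objects are described as sharing grid IDs as row names.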
Code/software
The R script provided (WIOcoral_seascapegenomics_script.Rmd) contains code for the following analyses. The user only needs to set the object 'SP' to either 'Acropora' or 'Pocillopora' in the first line of code to run the analyses for that species. The script requires SetRank v1.0.0, available from https://cran.r-project.org/src/contrib/Archive/SetRank/
The R script follows this workflow:
- Map of sample sites in the WIO
- Population structure assessment using PCoA and sNMF
- Connectivity using Fst to calculate Isolation-by-Distance (IBD) and Isolation-by-Resistance (IBR)
- Selection of predictor variables
- Genotype-Environment Associations (GEA) using Redundancy Analyses (RDA) to identify putative loci under selection, including GO term analyses
- Adaptive Seascape projections across the WIO
The R script was produced using R version 4.4.1 (2024-06-14) with the following packages:
Attached base packages:
stats4, stats, graphics, grDevices, utils, datasets, methods, base
Other attached packages:
marmap_1.0.12, rnaturalearth_1.1.0, sf_1.0-24, viridis_0.6.5, viridisLite_0.4.2, ggh4x_0.3.1, ggrepel_0.9.6, ggpubr_0.6.2, gridExtra_2.3, SetRank_1.0.0, gdistance_1.6.5, Matrix_1.7-4, igraph_2.2.1, raster_3.6-32, sp_2.2-0, sphereplot_1.5.1, rgl_1.3.31, poppr_2.9.8, bigutilsr_0.3.11, vegan_2.7-2, permute_0.9-8, LEA_3.16.0, qqman_0.1.9, qvalue_2.36.0, pcadapt_4.4.1, rtracklayer_1.64.0, GenomicRanges_1.56.2, GenomeInfoDb_1.40.1, IRanges_2.38.1, S4Vectors_0.42.1, BiocGenerics_0.50.0, stringr_1.6.0, reshape2_1.4.5, tibble_3.3.1, tidyr_1.3.2, dartR_2.9.9.5, dartR.data_1.0.8, dplyr_1.1.4, ggplot2_4.0.1, adegenet_2.1.11, ade4_1.7-23
Access information
Other publicly accessible locations of the data:
- FASTQ files: Raw sequence data as FASTQ files have been deposited in the NCBI BioProject database under accession number PRJNA1277000. The corresponding metadata from GEOME are included: Metadata_DARTinfo_species.csv
Environmental data (described in Environmental_variable_description.csv) were obtained from the following open access sources:
- Reef Environment Centralized InFormation (RECIFS) at recifs.epfl.ch
Reference: Selmoni, O., Lecellier, G., Berteaux-Lecellier, V., Joost, S., 2023. The Reef Environment Centralized InFormation System (RECIFS): An integrated geo-environmental database for coral reef research and conservation. Glob. Ecol. Biogeogr. 32, 622–632. https://doi.org/10.1111/geb.13657
- Allen Coral Atlas (ACA) at allencoralatlas.org
Reference: Allen Coral Atlas (2022) Imagery, maps and monitoring of the world’s tropical coral reefs. https://doi.org/10.5281/zenodo.3833242
Beaman, R.J. (2010).
Referencing this work
We welcome the use of the script and data files in similar research elsewhere, and we are open to collaborations (our contact details are below). If this script and the associated data are used in your research, please cite the following publication and datasets:
Publication: Guillaume, A.S., Joost, S., Curpen, S., Dumur Neelayy, D., Harree-Somah, L., Sadasing, O., Saponari, L., Dale, C., Barret, L., Andrews, N., Leckraz, S.K., François, R., Seetapah, V., Munusami, V., Bacha Gian, S., Jhangeer-Khan, R., Mahoune, T., Chumun, P.K., Poretti, M., Berteaux-Lecellier, V., Lecellier, G., Selmoni, O. (2026) Coral genetic structure in the Western Indian Ocean mirrors ocean circulation and thermal stress history. Evolutionary Applications DOI: 10.1111/eva.70206
Data set and scripts: Guillaume, A.S., Joost, S., Curpen, S., Dumur Neelayy, D., Harree-Somah, L., Sadasing, O., Saponari, L., Dale, C., Barret, L., Andrews, N., Leckraz, S.K., François, R., Seetapah, V., Munusami, V., Bacha Gian, S., Jhangeer-Khan, R., Mahoune, T., Chumun, P.K., Poretti, M., Berteaux-Lecellier, V., Lecellier, G., Selmoni, O. (2026). Data and analysis scripts to accompany: Coral genetic structure in the Western Indian Ocean mirrors ocean circulation and thermal stress history [Dataset]. Dryad DOI: 10.5061/dryad.931zcrjxv
Raw FASTQ sequences: All DArT-seq genotyped individuals (Acropora spp.: 515 FASTQ files for 345 individuals; Pocillopora spp.: 517 FASTQ files for 403 individuals) are available from NCBI BioProject PRJNA1277000 with associated metadata also available on GEOME
Contact us at:
Dr Annie Guillaume: annie.guillaume (at) alumni.epfl.ch ; ORCID
Dr Oliver Selmoni: oliver.selmoni (at) geo.uzh.ch ; ORCID
Find A.S. Guillaume on github
Sample design and study species:
Two keystone reef-building coral species, Acropora muricata and Pocillopora damicornis, were sampled from 15 reefs with contrasting environmental conditions around the Seychelles, Mauritius, and Rodrigues islands in the WIO in 2022. These species have distinct life history strategies, warranting independent investigations of population structure and thermal tolerance. A. muricata is predominantly a broadcast spawning coral with synchronised reproduction in November to January, followed by larvae that generally settle within 10–14 days of fertilisation. P. damicornis has mixed reproductive modes, reproducing sexually via year-round broadcast spawning or by producing brooded planula larvae with rapid settlement rates. Clonal propagation is also common, with the production of asexual larvae or polyps that can disperse over large distances (>50 km). Overall, 345 A. muricata and 403 P. damicornis colonies were sampled for DNA extraction and single nucleotide polymorphism (SNP) genotyping.
All analyses were performed in the R environment, following the script attached.
Environmental variables:
We used 10 uncorrelated environmental and geomorphic variables to characterise the seascape at the study sites (details of variables in Environmental_variable_description.csv). We further used sSST, mDHW and BackReef as predictors in genotype–environment association (GEA) analyses.
Genomic data:
DNA extraction and SNP genotyping followed methods of Selmoni et al. (2021), where extracted DNA was sent to Diversity Arrays Technology (Canberra, Australia) for quality check screening and genotype-by-sequencing using the DArT-sequencing method (DArT-seq). Here, we provide the metadata for the FASTQ files uploaded to GEOME and NCBI (BioProject PRJNA1277000; Metadata_DARTinfo_species.csv). The DArT-seq analytical pipeline resulted in 73,253 bi-allelic SNPs genotyped for 345 A. muricata individuals and 65,708 SNPs for 403 P. damicornis individuals (SNPS_RAW_species.rda). These SNPs were called against the chromosome-level reference genome of Acropora millepora (v2.1; GCF_013753865.1; Fuller et al., 2020) for A. muricata, and the scaffold-level reference genome assembly of P. damicornis (v1; GCF_003704095.1; Cunning et al., 2018) for P. damicornis. After filtering (detailed below), our final genotype matrix comprised 211 A. muricata individuals with 7,663 SNPs and 97 P. damicornis individuals with 13,190 SNPs (SNPS_FILT_species.rda).
SNP filtering: SNP filtering was performed for each species separately. We first removed putatively cryptic individuals likely present for our sampled species, where ‘cryptic’ is defined as genetically distinct groups among sets of colonies identified in-situ as the same species (Grupstra et al., 2024; Riginos et al., 2024). To this end, we performed a soft filtering for loci and individual missingness (50% threshold each), then ran a principal coordinate analysis (PCoA) on Euclidean distances of the genotype matrix (an ordination-based method that can handle missing data). We used PCoA axes to visually identify clusters of genetically distinct individuals co-occurring with most of the sampled individuals. Individuals from these clusters are putatively cryptic and were removed, as recommended by standard guidelines. We then identified and removed clonal genotypes on a hard-filtered dataset (loci and individuals each pruned for 80% missingness). Clones were identified as those sharing over 95% or 90% of their genotypes for A. muricata and P. damicornis, respectively, using the gl.report.replicates function in the dartR.base package (v. 1.0.5; Mijangos et al., 2022). The choice of threshold corresponds to the separation between first-degree relatives and replicated individuals in the histogram of pairwise relatedness values per species. For each group of putative clones, we retained the individual with the least missing SNP data. We identified and removed another cluster of putatively cryptic individuals for P. damicornis as per the first cryptic filtering step. Finally, we performed a hard filtering on the genotype matrix pruned for cryptic and clonal individuals, applying a missingness threshold of 80% for loci and individuals, before excluding rare alleles with a MAF <5%. Global MAF thresholds were applied to retain population-specific SNP variants for downstream population structure and genotype–environment analyses.
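The final hard-filtering step (missingness, then MAF) can be sketched in base R on a toy genotype matrix. This is an illustration only: the script works on genlight objects with dartR, the cryptic and clonal steps are not shown, and reading the 80% threshold as a minimum call rate of 0.8 is an assumption of this sketch.

```r
# Toy genotype matrix: rows = individuals, columns = SNPs,
# coded as 0/1/2 copies of the alternate allele, NA = missing
geno <- rbind(
  ind1 = c(0, 1, NA, 0, 2),
  ind2 = c(0, 2, NA, 0, 1),
  ind3 = c(1, 1, NA, 0, 2),
  ind4 = c(0, 0,  1, 0, 2),
  ind5 = c(1, 2, NA, 0, 1)
)
colnames(geno) <- paste0("snp", 1:5)

call_rate_min <- 0.8    # assumed reading of the 80% threshold
maf_min       <- 0.05

# 1. Drop loci, then individuals, with too much missing data
geno <- geno[, colMeans(!is.na(geno)) >= call_rate_min, drop = FALSE]
geno <- geno[rowMeans(!is.na(geno)) >= call_rate_min, , drop = FALSE]

# 2. Drop loci whose global minor allele frequency is below 5%
alt_freq <- colMeans(geno, na.rm = TRUE) / 2
maf      <- pmin(alt_freq, 1 - alt_freq)
geno     <- geno[, maf >= maf_min, drop = FALSE]

colnames(geno)
```

Here snp3 is removed for missingness and snp4 because it is monomorphic, leaving three loci.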
Neutral genetic structure:
Neutral genetic structure was assessed for each species using complementary methods: i) Principal Coordinate Analysis (PCoA) and ii) Sparse Nonnegative Matrix Factorization (sNMF), where we used a putatively neutral genomic dataset (hereafter termed ‘neutral genotype matrix’) by removing outlier loci identified using a genome scan of the filtered genotype matrix with pcadapt.
Isolation-by-Distance vs Isolation-by-Resistance:
Patterns of Isolation-by-Distance (IBD) and Isolation-by-Resistance (IBR) were identified between reefs by regressing linearised genetic distance (i.e., FST/(1-FST); Rousset, 1997) against Euclidean or sea distances, respectively. IBD was calculated from the log10 of Euclidean distances obtained by converting degree latitude and longitude to cartesian coordinates. IBR was calculated from the log10 of sea distances derived from ocean current data for each calendar month and annually.
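The IBD regression can be sketched as follows. The reef coordinates and FST values are invented, `to_cartesian()` is a hypothetical helper, and the spherical conversion with an Earth radius of 6371 km is an assumption; the script may perform the conversion differently.

```r
# Hypothetical helper: convert latitude/longitude (degrees) to 3-D cartesian
# coordinates (km) on a sphere; the 6371 km radius is an assumption
to_cartesian <- function(lat, lon, r = 6371) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  cbind(x = r * cos(lat) * cos(lon),
        y = r * cos(lat) * sin(lon),
        z = r * sin(lat))
}

# Invented reef coordinates and pairwise FST values
reefs <- data.frame(lat = c(-4.6, -4.3, -20.2, -19.7),
                    lon = c(55.4, 55.7, 57.5, 63.4))
# Pairs in the order returned by as.vector(dist()): 2-1, 3-1, 4-1, 3-2, 4-2, 4-3
fst <- c(0.010, 0.080, 0.110, 0.075, 0.105, 0.060)

eucl    <- as.vector(dist(to_cartesian(reefs$lat, reefs$lon)))  # km
lin_fst <- fst / (1 - fst)                                      # Rousset (1997)

# IBD regression; for IBR, replace `eucl` with the sea distances
ibd <- lm(lin_fst ~ log10(eucl))
coef(ibd)
```

A positive slope indicates that genetic differentiation increases with distance.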
Genotype–Environment Associations (GEA):
We performed GEAs using multivariate redundancy analyses (RDA) at the site-level following methods of Capblancq and Forester (2021). For each sampling site, we converted individual genotypes of the filtered dataset to allele frequencies per locus by averaging observed genotypes across individuals, ignoring missing data. We imputed allele frequencies for loci missing information at the site-level using the median allele frequency from the other sites in the same region (i.e., Seychelles, Mauritius or Rodrigues). We chose explanatory environmental variables for the RDA using a bidirectional stepwise model selection procedure (ordistep), which retained mDHW (mean degree heating week) and sSST (sd of sea surface temperature) for both species, as well as BackReef (proportion of backreef around site) for P. damicornis.
To detect putative genomic regions under selection, we performed a multivariate GEA using a partial RDA (pRDA) on the entire genotype matrix, with the ordistep-selected variables as predictors. Reef-level population structure was used to condition the RDA, limiting false positives while potentially reducing power to detect true outlier loci along neutral gradients.
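The site-level allele frequency step, including the regional median imputation, can be sketched in base R. The genotypes, site labels and region labels below are invented, and the code is an illustration of the described logic rather than the code used in the script.

```r
# Toy data: 6 individuals from 3 sites in one region, 2 loci (0/1/2, NA = missing)
geno <- rbind(c(0, 2), c(1, NA), c(2, NA), c(2, NA), c(0, 1), c(1, 1))
colnames(geno) <- c("snp1", "snp2")
site   <- c("S01", "S01", "S02", "S02", "S03", "S03")
region <- c(S01 = "SEY", S02 = "SEY", S03 = "SEY")

# Site-level allele frequency = mean genotype / 2, ignoring missing data
freq <- t(sapply(split(as.data.frame(geno), site),
                 function(d) colMeans(d, na.rm = TRUE) / 2))
freq[is.nan(freq)] <- NA   # locus with no genotyped individual at a site

# Impute missing site-level frequencies with the median of the observed
# frequencies at the other sites in the same region
obs <- freq
for (j in seq_len(ncol(obs))) {
  for (i in which(is.na(obs[, j]))) {
    same_region <- region[rownames(obs)] == region[rownames(obs)[i]]
    same_region[i] <- FALSE
    freq[i, j] <- median(obs[same_region, j], na.rm = TRUE)
  }
}
freq
```

Site S02 has no genotype for snp2, so its frequency is set to the median of S01 (1.0) and S03 (0.5).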
Gene Ontology (GO) enrichment analysis:
Gene Ontology (GO) enrichment analyses were used to assess the putative molecular function(s) of significant outlier SNPs (i.e., q-value<0.05) from the pRDA.
Estimating the Adaptive Seascape:
We estimated the adaptive seascape across the WIO to identify reefs potentially harbouring adaptive genetic variation linked to thermal gradients. For each species, we performed an RDA to quantify genotype–environment interactions using site-level allele frequencies as the response matrix and the ordistep-selected environmental variables as predictors. We then projected environmental conditions from all mapped WIO reefs into the RDA ordination space to derive a genetic-based index of adaptation (Adaptive Index) for each environmental pixel of the seascape. This Adaptive Index reflects how genetic variation is structured relative to thermal gradients, allowing us to map the spatial distribution of reefs where allelic compositions are most consistent with thermal adaptive potential across the WIO seascape.
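One common way to project environmental conditions into RDA space is a weighted sum of standardised predictors, with each predictor weighted by its loading on the RDA axis. The sketch below illustrates that idea for a single axis. All numbers are invented, the loadings are hypothetical rather than taken from a fitted model, and the script may compute the index differently.

```r
# Invented environmental values for three reef cells
env <- data.frame(mDHW = c(0.8, 1.6, 2.4), sSST = c(1.2, 1.5, 1.9),
                  row.names = c("g105", "g230", "g377"))

# Centre and scale with the mean and sd of the sites used to fit the RDA
# (invented numbers; an assumption of this sketch)
site_mean <- c(mDHW = 1.6, sSST = 1.5)
site_sd   <- c(mDHW = 0.8, sSST = 0.3)
env_std   <- scale(env[, names(site_mean)], center = site_mean, scale = site_sd)

# Hypothetical loadings of each predictor on the first RDA axis
loadings_rda1 <- c(mDHW = 0.9, sSST = 0.4)

# Adaptive Index on RDA1: weighted sum of standardised predictors per cell
adaptive_index <- as.vector(env_std %*% loadings_rda1)
names(adaptive_index) <- rownames(env)
adaptive_index
```

Cells with values far from zero sit towards the ends of the environmental gradient captured by that axis.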
Funding statement and acknowledgements
This study was funded by the Adaptation Fund (AF), who financed the “Restoring Marine Ecosystem Services by Rehabilitating Coral Reefs to Meet a Changing Climate Future” project (PIMS No. 5736), implemented by United Nations Development Programme (UNDP) Mauritius and Seychelles, with the support of the Ministry of Agro-Industry, Food Security, Blue Economy, and Fisheries, and in the Republic of Seychelles with the support of the Ministry of Agriculture, Climate Change and Environment. Corals were sampled under permits SPA 30 and A0157. We gratefully acknowledge the contributions of the following institutions to field sampling efforts: Albion Fisheries Research Center, Eco-Sud, Reef Conservation, Marine Conservation Society Seychelles, Mauritius Oceanographic Institute, Nature Seychelles, Seychelles National Park Authority, Rodrigues Regional Assembly and Shoals Rodrigues. We thank the UNDP Mauritius and Seychelles for their support in coordinating activities throughout the project. We also thank Katherine Prata and Zoe Meziere for discussions regarding genomic analyses. ASG acknowledges support from the Swiss National Science Foundation (SNSF; Postdoc Mobility Fellowship P500PB_230450).
The data were used in the following scientific publication:
Guillaume, A.S., Joost, S., Curpen, S., Dumur Neelayy, D., Harree-Somah, L., Sadasing, O., Saponari, L., Dale, C., Barret, L., Andrews, N., Leckraz, S.K., François, R., Seetapah, V., Munusami, V., Bacha Gian, S., Jhangeer-Khan, R., Mahoune, T., Chumun, P.K., Poretti, M., Berteaux-Lecellier, V., Lecellier, G., Selmoni, O. (2026). Coral genetic structure in the Western Indian Ocean mirrors ocean circulation and thermal stress history. Evolutionary Applications. https://doi.org/10.1111/eva.70206
A.S.G. and O.S. wrote the scripts for these analyses, with final versioning and archiving by A.S.G.
