Data from: Unravelling the roles of recent speciation and trait evolution in shaping the longitudinal gradient of tropical reef fish diversity
Data files
Mar 31, 2026 version, 1.51 GB in total:
- code_and_data_jbi.zip (1.51 GB)
- README.md (10.51 KB)
Abstract
Aim
In the tropical marine realm, the increasing concentration of species toward the Indo-Australian Archipelago (IAA) has been extensively studied. The literature provides numerous biogeographical scenarios explaining this pronounced longitudinal diversity gradient. However, most proposed scenarios did not consider the interplay between recent speciation and trait evolution, nor the influence of diversity-dependent processes.
Location
Global and regional.
Taxon
Tropical reef fishes
Methods
Based on a global data set on tropical reef fish distributions, combined with species traits (body size, trophic level and maximum depth range) and a comprehensive Actinopterygii phylogeny, we estimated assemblage-level speciation rates across the world’s tropical marine ecoregions, as well as assemblage-level trait evolution rates. We then applied a causal modelling approach that simultaneously accounts for the potential influence of both current and past environmental conditions on rates of recent speciation and trait evolution, and ultimately, on species richness distribution.
Results
Our results reveal gradients in speciation rates and trait evolution rates that do not match the spatial distribution of species richness. Compared to other regions, the species-rich IAA did not exhibit higher rates of either recent speciation or trait evolution. In contrast, the Caribbean region exhibited the greatest rates of recent speciation, but this pattern was uniquely driven by the recent radiation of hamlets (Hypoplectrus spp). Caribbean and Southwestern Atlantic reefs were also shown to harbor species with faster recent evolution in body size and maximum depth. Finally, regardless of the biogeographic realm considered (Indo-Pacific, Atlantic, or Tropical Eastern Pacific), species richness was found to strongly constrain trait evolutionary rates related to trophic level.
Main conclusions
Overall, our findings suggest that the longitudinal gradient in tropical reef fish diversity is unrelated to either recent speciation or recent trait evolution, and that the IAA no longer functions as a center of origination for tropical reef fishes. Our results also suggest that diversity-dependent mechanisms may have played a key role in shaping biogeographic patterns of recent evolution of trophic level in tropical reef fishes.
Publication Information
Journal: Journal of Biogeography
Authors: LE GOFF, Rémy (1); GABORIAU, Théo (2); ALBOUY, Camille (3, 4); PELLISSIER, Loïc (3, 4); LEPRIEUR, Fabien (1)
Author affiliations
1 - MARBEC, Univ Montpellier, CNRS, IRD, IFREMER, Montpellier, France
2 - Department of Computational Biology, Université de Lausanne, Lausanne, Switzerland
3 - Department of Environmental Systems Science, Institute of Terrestrial Ecosystems, Ecosystems and Landscape Evolution, Zürich, Switzerland
4 - Department of Landscape Dynamics and Ecology, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
Correspondence: Rémy Le Goff remy.le-goff@umontpellier.fr
Dataset DOI: 10.5061/dryad.08kprr5g7
Description of the data and file structure
The dataset contains the scripts and data used to estimate speciation rates and trait evolutionary rates, aggregate the data, and perform biogeographic analyses based on the Marine Ecoregions of the World (MEOW).
Description
The data folders contain all data used in the associated publication, whether created for it or gathered from previous studies. The publication investigates how rates of trait evolution and speciation relate to species richness.
Files Description
All files are in code_and_data_jbi.zip and data_supp.zip (the latter is hosted on Zenodo to respect the CC-BY licensing of specific datasets used).
Data Files
On the Zenodo repository:
- MEOW : Marine ecoregion shapefiles from Spalding et al. 2008
- env_data : recent and Last Glacial Maximum (LGM) sea surface temperature (SST) and depth maps from Basher et al. 2018
- Layers_withisland : asc files for the coral area paleo-reconstruction from Gaboriau et al.
- phylogeny : all phylogeny files and their derivatives, such as ClaDS estimates
- world_continent : world continent shapefiles
- internal_data.csv : data gathered in the research unit
- databases1_pellissier.txt : data from Pellissier et al. 2014
- Data_final_Siqueira_etal.csv : data from Siqueira et al. 2019
- depth_range.RData : depth data from Duhamet et al. 2024
- DF,EnvirVar.All.5b.Periods.43041cells.rds : recent environmental data as rasters from Basher et al. 2018
- Fish_species_and_traits.csv : GASPAR database from Parravicini et al. 2013
- matrice_occurrence_gaspar.csv : occurrence data from the GASPAR project
- Scleractinia.csv : Scleractinia occurrences from PALEOBIODB
On Dryad:
(All temperatures are in degrees Celsius, depths in m, areas in km², distances in km, and velocities in km/mya.)
Trait data:
- 2025-06-03_data_agg_to_imputation.RDS
- 2025-06-11_imputed_trait_data.RDS
The dataset includes 1000 data frames of imputed traits used for Le Goff et al. 2026. The data before imputation come from various datasets: Siqueira et al. 2020, Parravicini et al. 2013, Duhamet et al. 2023 and FishBase. Variables are:
- Trophic_ID : Trophic regime from Siqueira et al. 2020, supplemented with FishBase
- depth_range : maximum depth of occurrence in meters, from Duhamet et al. and FishBase
- SST : mean sea surface temperature of occurrences, from Siqueira et al. 2020
- DistIAA : distance from the centroid of the distribution to the IAA, from Siqueira et al. 2020
- Size.SIQUEIRA : total length in cm, from Siqueira et al. 2020 and FishBase
- Size.GASPAR : size class from Parravicini et al. 2013, i.e. maximum body size expressed in categories [1 = 0–7 cm; 2 = 7.1–15 cm; 3 = 15.1–30 cm; 4 = 30.1–50 cm; 5 = 50.1–80 cm; 6 = >80 cm]
- Mobility : 1 = sedentary; 2 = mobile within a reef; 3 = mobile between reefs
- Schooling : 1 = solitary; 2 = pairing; 3 = living in small groups (3–20 individuals); 4 = medium groups (20–50 individuals); 5 = large groups (>50 individuals)
- Diet_Mouillot_2014 : HD = herbivorous-detritivorous (i.e., fish feeding on turf or filamentous algae and/or undefined organic material); HM = macroalgal herbivorous (i.e., fish eating large fleshy algae and/or seagrass); IS = invertivorous targeting sessile invertebrates (i.e., corals, sponges, ascidians); IM = invertivorous targeting mobile invertebrates (i.e., benthic species such as crustaceans); PK = planktivorous (i.e., fish eating small organisms in the water column); FC = piscivorous (including fish and cephalopods); OM = omnivorous (i.e., fish for which both vegetal and animal material are important in their diet)
- Trophic_level : trophic level from FishBase
- Axis 1 to Axis 17 : PCoA axes computed on the cophenetic distances of the phylogeny, used for imputation
- SIQUEIRA, RABOSKY, OBIS, GASPAR and FISHBASE : species names in the corresponding database
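The coded variables above (Size.GASPAR, Mobility, Schooling) can be turned into labelled ordered factors before plotting or modelling. The following is a minimal sketch on a made-up data frame. It assumes the codes are stored as integers under the column names listed above, which should be checked against the actual imputed data frames.

```r
# Toy data frame standing in for one imputed trait table (values are invented).
traits <- data.frame(
  Size.GASPAR = c(1, 3, 6),
  Mobility    = c(1, 2, 3),
  Schooling   = c(1, 5, 3)
)

# Labels adapted from the variable descriptions in this README.
size_labels <- c("0-7 cm", "7.1-15 cm", "15.1-30 cm",
                 "30.1-50 cm", "50.1-80 cm", ">80 cm")
mobility_labels <- c("sedentary", "mobile within a reef", "mobile between reefs")
schooling_labels <- c("solitary", "pairing", "small groups (3-20)",
                      "medium groups (20-50)", "large groups (>50)")

# Convert integer codes to ordered factors so the categories keep their rank.
traits$Size.GASPAR <- factor(traits$Size.GASPAR, levels = 1:6,
                             labels = size_labels, ordered = TRUE)
traits$Mobility <- factor(traits$Mobility, levels = 1:3,
                          labels = mobility_labels, ordered = TRUE)
traits$Schooling <- factor(traits$Schooling, levels = 1:5,
                           labels = schooling_labels, ordered = TRUE)

print(traits)
```

Using ordered factors rather than plain character labels preserves the ranking of the classes, so comparisons such as `traits$Size.GASPAR >= "15.1-30 cm"` remain meaningful.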
Other files:
- list_polygon_with_predictors_GASPAR_imputed_data_new_agg_method.RDS
  - Variables are explained in a README file in the folder. The object is a list of 1,000 data frames, one for each TR estimation.
- list_polygon_with_predictors_GASPAR_imputed_data_new_agg_method_wt_hamlets.RDS
  - Same as above, but hamlets (Hypoplectrus spp.) were removed from the occurrence data.
- 2025-06-10_TR_stat_imputation_data.RDS
  - List of TR statistic estimates from the imputed datasets, with one list each for the size, depth and trophic level traits. Each contains 1,000 estimates (the 10 imputed datasets on the 100 phylogenies). The TR values are log-transformed.
- summarise_region_gaspar_trop_200m_new_agg_methods.RDS
  - Occurrences of GASPAR species aggregated by ecoregion. Each column is a species and each row a MEOW ecoregion; 0 indicates absence and 1 presence of the species.
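Because the occurrence table is a presence/absence matrix (ecoregions in rows, species in columns), species richness per ecoregion is simply a row sum. The sketch below illustrates this on a toy matrix. The commented `readRDS()` path is an assumption, and the real object may carry extra non-species columns that would need to be dropped first.

```r
# Toy presence/absence matrix: rows are ecoregions, columns are species.
occ <- matrix(
  c(1, 0, 1,
    1, 1, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ecoregion_A", "ecoregion_B", "ecoregion_C"),
                  c("species_1", "species_2", "species_3"))
)

# With the real file, something like this may work (the path is an assumption):
# occ <- readRDS("Data/summarise_region_gaspar_trop_200m_new_agg_methods.RDS")

# Species richness: number of species present in each ecoregion.
richness <- rowSums(occ)

# Range size proxy: number of ecoregions occupied by each species.
n_regions <- colSums(occ)

print(richness)
print(n_regions)
```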
Intermediate README files are placed in the folders to explain the data files.
Results: the two files contain the results of the pSEM analyses with and without diversity dependence.
Figures: scripts that produce the figures of the linked article; each script is named after the figure it creates.
R Scripts
Scripts should be run in the following order for full reproducibility:
| Script | Description | Generates |
|---|---|---|
| 1.aggregation_trait_base_and_synonyms.R | Merge multiple datasets, checking synonyms of taxon names | Data/trait_data/2025-06-03_data_agg_to_imputation.RDS |
| 2.marine_phylo.R | Extract the phylogeny of marine species only | marine_phylo.tre |
| 3.analyse_clads_output.R | Extract tip rates from the ClaDS run | marine_tip_rates.csv |
| 4.imputation_trait.R | Impute traits using the trait dataset made in script 1 | ./Data/trait_data/2025-06-11_imputed_trait_data.RDS |
| 5.tr_calculation_imputed_data.R | Compute the TR statistic on imputed data | 2025-06-10_TR_stat_imputation_data.RDS |
| 6.aggregate_data_by_region.R | Aggregate all data by marine ecoregion polygons | list_polygon_with_predictors_GASPAR_imputed_data_new_agg_method.RDS |
| 7.psem_no_div_dep.R | Fit pSEM model 1 | Results/summary_model_no_div_dep.RDS |
| 8.psem_model_diversity_dep.R | Fit pSEM model 2 | Results/summary_model_div_dep.RDS |
| 9.model_q10.R | Perform the quantile sensitivity analysis using the 10% quantile | |
| 10.model_q90.R | Perform the quantile sensitivity analysis using the 90% quantile | |
| 11.extract_direct_indirect_effect.R | Extract coefficients from the model | |
| 12.null_model_TR.R | Run the null model | Figure supplementary data |
| 13.model_without_hamlet.R | Fit the pSEM without hamlets as a sensitivity analysis | |
| branch_lenght.R | Function for mbl by assemblage | |
| include_date.R | Include the date in the filename when saving data | |
| new_psem_formula.R | Function to improve the pSEM function | |
| piecewiseSEM_modified.R | Function to improve the model in the pSEM | |
| variable_names_wt_ni.R | Store the correspondence between column names and explicit names for figures | |
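When running the numbered scripts programmatically, note that plain alphabetical sorting puts `10.model_q90.R` before `2.marine_phylo.R`. The sketch below sorts on the leading step number instead. It assumes that the unnumbered helper scripts are sourced by the numbered ones and do not need to be run on their own, which is not stated in this README.

```r
# A few script names from the table above, deliberately out of order.
scripts <- c("10.model_q90.R", "2.marine_phylo.R",
             "1.aggregation_trait_base_and_synonyms.R",
             "13.model_without_hamlet.R", "branch_lenght.R")

# Keep only files that start with a step number followed by a dot.
numbered <- grep("^[0-9]+\\.", scripts, value = TRUE)

# Extract the leading number and order the scripts numerically.
step <- as.integer(sub("^([0-9]+)\\..*$", "\\1", numbered))
ordered_scripts <- numbered[order(step)]

print(ordered_scripts)

# To run the pipeline (assumes the working directory contains the scripts):
# for (s in ordered_scripts) source(s)
```

In practice `scripts` could be filled with `list.files(pattern = "\\.R$")` from the folder holding the code.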
Software Requirements
R Version
R version 4.3.0 or higher recommended
Required R Packages
Core packages:
install.packages(c(
"ggplot2", # Data visualization (>= 3.5.2 required for geomtextpath)
"dplyr", # Data manipulation
"tidyr", # Data tidying
"tidyverse", # Tidyverse collection
"ape", # Phylogenetic manipulation
"tidytree", # Tree visualisation
"missForest", # Imputation algorithm
"caret",
"doParallel", # Parallelization
"foreach",
"iterators",
# Spatial packages
"raster",
"sf",
"exactextractr",
"piecewiseSEM" # Piecewise Model
))
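Before running the scripts, it can help to check the R version and see which of the packages listed above are still missing. This is a minimal sketch using only base R. The package vector repeats the list above and the 4.3.0 threshold is the recommended version stated in this README; the installation call is left commented out.

```r
# Packages listed in this README.
required <- c("ggplot2", "dplyr", "tidyr", "tidyverse", "ape", "tidytree",
              "missForest", "caret", "doParallel", "foreach", "iterators",
              "raster", "sf", "exactextractr", "piecewiseSEM")

# Compare the running R version with the recommended minimum.
r_ok <- getRversion() >= "4.3.0"

# Packages from the list that are not installed in the current library paths.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (!r_ok) {
  message("R ", format(getRversion()), " is older than the recommended 4.3.0")
}
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```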
