Data from: A zoogeographic model for the evolution of diversity and endemism in Madagascar
Data files
Mar 26, 2025 version files (11.61 GB):
- Biodiversity_data.zip (2.85 GB)
- Environmental_correlation_data.zip (125.73 MB)
- Phylogenomic_data.zip (8.46 GB)
- README.md (12.73 KB)
- SDM_data.zip (162.14 MB)
Abstract
The delineation of zoogeographic regions is essential for understanding the evolution of biodiversity. Madagascar, characterized by high levels of endemism and habitat diversity, presents unique challenges and opportunities for such studies. Traditional global zoogeographic classifications, largely based on vertebrates, may overlook finer-scale patterns of diversity. This study employs comprehensive ant distribution datasets and phylogenomic data to propose a refined zoogeographic model for Madagascar. Utilizing Phylogenetic Simpson’s Turnover, we identified three primary regions—Eastern, Northern, and Western—each characterized by distinct environmental and phylogenetic profiles. Further subdivision revealed nine subregions, reflecting variations in elevation, net primary productivity, and terrain ruggedness. Our findings highlight the importance of topographical and environmental barriers in shaping phylogenetic diversity and endemism. Notably, we observed significant phylogenetic clustering in lowland areas and distinct differences in net primary productivity and elevation across regions. This study underscores the value of integrating phylogenetic data in zoogeographic analyses and provides a nuanced framework for investigating biodiversity patterns in Madagascar, offering insights into the processes driving speciation and endemism on the island.
Gabriela P. Camacho, Ana Carolina Loss, Brian L. Fisher, Bonnie B. Blaimer
- Corresponding authors: gpcamacho@usp.br; bonnie.blaimer@mfn.berlin
We developed a species distribution database for Madagascar’s ants using data from AntWeb. All raw distribution data can be accessed at www.antweb.org. We removed non-endemic species, unassigned morphospecies, and duplicated records for spatial consistency. Geographic ranges were estimated using ecological niche modeling (ENM) with environmental data from MadaClim. To address sampling bias, we created a sampling density map from 33,295 collection events and used Maxent for model estimation. The final database included distribution estimates for 779 taxa, covering 91.7% of Madagascar’s valid species, based on over 120 million locality records. For phylogenomic data, we sequenced DNA from 1,183 ant specimens, combining target enrichment of ultraconserved elements (UCEs) with next-generation sequencing. We inferred phylogenetic relationships for eight ant clades and created a backbone phylogeny using IQ-TREE and ModelFinder, generating ultrametric trees with the R package APE. Spatial analyses of phylogenetic diversity and endemism were conducted using 10×10 km grid cells, calculating indices such as species richness, weighted endemism, phylogenetic diversity, and phylogenetic endemism. We identified zoogeographic regions through cluster analysis and correlated diversity metrics with environmental variables, performing statistical comparisons to elucidate regional differences.
Description of the data and file structure
Camacho_et_al.2024_data
├── Biodiversity-distribution-data
│   ├── input-distribution-data
│   │   ├── XY-endemic-sdm.csv # Geographic ranges estimated using ecological niche modeling (ENM) for species with ≥3 unique occurrence records at 30 arc-sec resolution (AntWeb dataset)
│   │   └── XY-endemic-sdm-psg-29738.csv # Geographic ranges converted from lat/long to meters using convert-csv-to-metres.R [https://github.com/NunzioKnerr/biodiverse_pipeline]
│   ├── input-tree-data
│   │   └── all-mami-dated-sec-calibration-with-root.nex # Dated, rooted tree (scripts in: Phylogenomic analysis)
│   ├── trimmed-data
│   │   ├── XY-endemic-sdm-psg-29738-trimmed.bds # Biodiverse distribution-based data generated via create_bds.pl and trimmed with trim_bds_and_bts.pl
│   │   └── all-mami-dated-sec-calibration-with-root-trimmed.bts # Tree-based data created via create_bts.pl and trimmed using trim_bds_and_bts.pl
│   └── results
│       └── XY-endemic-sdm-psg-29738-trimmed-analysed* # Output .bds (analysis with 999 randomizations), .csv (results), and .bps (PhyloS2 clustering project)
├── Environmental correlation data
│   ├── input
│   │   ├── antregions-metrics-phyloS2-alt_npp_rugg.csv
│   │   └── antregions-metrics-phyloS2.csv
│   │       # Compiled input from Biodiverse and environmental data grouped by zoogeographic region
│   ├── LDA
│   │   ├── 2groups
│   │   ├── 3groups
│   │   ├── 4groups
│   │   ├── 5groups
│   │   ├── 6groups
│   │   ├── 7groups
│   │   ├── 8groups
│   │   └── 9groups
│   │       # Contains .R scripts and result files for Linear Discriminant Analyses for each group subdivision
│   └── regressions
│       ├── Elevation
│       │   ├── figures
│       │   └── non-parametrical-analysis
│       ├── NPP
│       │   ├── figures
│       │   └── non-parametrical-analysis
│       ├── PD
│       │   ├── figures
│       │   └── non-parametrical-analysis
│       ├── PE
│       │   ├── figures
│       │   └── non-parametrical-analysis
│       ├── Ruggedness
│       │   ├── figures
│       │   └── non-parametrical-analysis
│       ├── SR
│       │   ├── figures
│       │   └── non-parametrical-analysis
│       └── WE
│           ├── figures
│           └── non-parametrical-analysis
│           # .R scripts for non-parametric tests, significance analyses, and visualizations of environmental correlation with Biodiverse metrics, as well as the figures generated by those scripts.
├── phylogenomic data
│   ├── phylogenomic analysis
│   │   ├── amblyoponinae
│   │   │   ├──
Folder Descriptions
Biodiversity-distribution-data: Raw and processed occurrence data and phylogenetic tree files
Environmental correlation data: Environmental predictors and outputs for LDA and regression analyses
Phylogenomic data: UCE matrices, partition files, and tree dating results
SDM data: Inputs, scripts, and outputs for species distribution models (Maxent and R)
Software Requirements
General
R version: ≥ 4.1.0
Java: Required to run Maxent (Maxent version ≥ 3.4.1)
R Packages
Install with:
install.packages(c("tidyverse", "caret", "MASS", "ggord", "ggpubr", "rstatix", "extrafont", "raster", "sp", "dismo", "dplyr", "rgeos", "sf", "tmap", "ggplot2"))
Phylogenetics Tools
IQ-TREE, ModelFinder, PartitionFinder
APE (R package): For ultrametric trees
Workflow Summaries
A. Species Distribution Modeling (SDM) (Run scripts 00 to 05 in that order)
Pipeline: 00_SWD_bias.R → 05_buffer.R
Inputs: XY_SDM.csv, biasSWD_all.csv, Hit_List_All_Malagasy.csv
Bias Correction: 10,000 background points extracted using bias raster
Filtering: Species with ≥3 records (SDM); 1–2 records (buffer)
Maxent Modeling: Run per species
Model Evaluation: AUC and TSS
Outputs: Binary & continuous prediction rasters
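The filtering and evaluation steps above can be illustrated with a minimal Python sketch. It is not taken from the 00 to 05 R scripts: the function names, the toy species names, and the confusion-matrix counts are all hypothetical, and it assumes one list entry per unique occurrence record. Species with at least 3 records go to the SDM set, the rest to the buffer set, and TSS is computed with the standard definition (sensitivity + specificity - 1).

```python
from collections import Counter


def split_by_record_count(otus, min_sdm_records=3):
    """Partition species into an SDM set (>= min_sdm_records) and a buffer set."""
    counts = Counter(otus)
    sdm = sorted(s for s, n in counts.items() if n >= min_sdm_records)
    buffer_only = sorted(s for s, n in counts.items() if n < min_sdm_records)
    return sdm, buffer_only


def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Toy input: one entry per unique occurrence record (species names are made up).
records = ["sp_a", "sp_a", "sp_a", "sp_b", "sp_c", "sp_c"]
sdm_species, buffer_species = split_by_record_count(records)
print("SDM:", sdm_species, "| buffer:", buffer_species)

# Hypothetical evaluation counts for one thresholded model.
tss = true_skill_statistic(tp=40, fn=10, tn=45, fp=5)
print("TSS:", round(tss, 3))
```

TSS ranges from -1 to 1, with values near 0 indicating performance no better than random; unlike AUC it requires a binary (thresholded) prediction.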
B. Phylogenetic Analysis
Alignments: UCE sequences aligned & trimmed (Gblocks)
Concatenation: 90% complete matrices
Tree Inference: IQ-TREE + 1000 bootstraps
Dating: Penalized likelihood, subgroup-level dating
C. Biodiverse Analysis
Inputs: .bds, .bts files
Metrics: Richness, PD, PE, Weighted Endemism
Randomization: 999 permutations
Outputs: CSVs, per-cell summaries
D. LDA & Regression
Scripts: LDA-9groups-final.R, regressions_S2_antregions.R
LDA Input: antregions-metrics-phyloS2-alt_npp_rugg.csv
Outputs: Discriminant scores, histograms, bioregions
Regression Input: antregions-metrics-phyloS2.csv
Tests: Kruskal-Wallis, Wilcoxon
Outputs: Boxplots, summaries by metric
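For readers unfamiliar with the Kruskal-Wallis test listed above, the following Python sketch computes the tie-corrected H statistic from scratch. It only illustrates the statistic and is not the code in regressions_S2_antregions.R; the function name and the toy elevation values per region are hypothetical. Converting H to a p-value (chi-square with k - 1 degrees of freedom) is left to a statistics library.

```python
from collections import defaultdict


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, gi) for gi, sample in enumerate(groups) for value in sample)
    n_total = len(pooled)
    rank_sums = defaultdict(float)
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at sorted positions i..j-1 share the mean of ranks i+1..j.
        mean_rank = (i + 1 + j) / 2
        for _, gi in pooled[i:j]:
            rank_sums[gi] += mean_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    h = 12 / (n_total * (n_total + 1)) * sum(
        rank_sums[gi] ** 2 / len(sample) for gi, sample in enumerate(groups)
    ) - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Toy elevation values (meters) for three hypothetical regions.
elevation_by_region = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(elevation_by_region)
print("H =", round(h_stat, 3))
```

A significant Kruskal-Wallis result only says that at least one region differs; pairwise Wilcoxon tests, as listed above, are the usual follow-up to identify which ones.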
Tabular Data Variables
XY-endemic-sdm.csv - Geographic ranges estimated using ecological niche modeling (ENM) for species with at least three unique occurrence records at the 30 arc-second spatial resolution, based on the AntWeb dataset.
XY-endemic-sdm-psg-29738.csv - Geographic ranges converted from lat/long to meters with convert-csv-to-metres.R, available at https://github.com/NunzioKnerr/biodiverse_pipeline
Column (unit) - Description
X - Row index (ignore)
otu (text) - Operational Taxonomic Unit (species)
Longitude (degrees) - Longitude
Latitude (degrees) - Latitude
x_psg_29738 (meters) - Projected X coordinate (EPSG:29738)
y_psg_29738 (meters) - Projected Y coordinate (EPSG:29738)
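The projected columns make it straightforward to bin records into the 10×10 km cells used in the spatial analyses. The Python sketch below shows one way to do that, assuming a grid whose origin is at (0, 0) in the projected coordinate system; the actual Biodiverse cell origin and labelling may differ. The function names and the coordinates are made up for illustration.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 10_000  # 10 x 10 km cells, in meters


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the centre coordinates of the grid cell containing (x_m, y_m).

    Assumes the grid origin is at (0, 0) in the projected system.
    """
    col = math.floor(x_m / cell_size)
    row = math.floor(y_m / cell_size)
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def richness_per_cell(rows):
    """Count distinct OTUs per cell from (otu, x_m, y_m) tuples."""
    species_in_cell = defaultdict(set)
    for otu, x_m, y_m in rows:
        species_in_cell[grid_cell(x_m, y_m)].add(otu)
    return {cell: len(otus) for cell, otus in species_in_cell.items()}


# Hypothetical rows mimicking the otu, x_psg_29738, y_psg_29738 columns.
rows = [
    ("sp_a", 512345.0, 7812999.0),
    ("sp_b", 518000.0, 7810001.0),
    ("sp_a", 523000.0, 7812000.0),
]
richness = richness_per_cell(rows)
print(richness)
```

With real data, the tuples would come from reading XY-endemic-sdm-psg-29738.csv (for example with `csv.DictReader`) and converting the two projected columns to floats.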
XY-endemic-sdm-psg-29738_concatenated.csv - results from the Biodiverse analysis
Column Description
ELEMENT - Unique grid cell ID
Axis_0, Axis_1 - Ordination scores (e.g., PCoA)
ENDW_CWE - Corrected Weighted Endemism
ENDW_RICHNESS - Weighted species richness
ENDW_SINGLE - Single-cell endemics
ENDW_WE - Classic weighted endemism
PD, PD_P - Faith’s Phylogenetic Diversity & null p-value
PD_per_taxon - Mean phylogenetic branch length per species
PD_P_per_taxon - Null-model adjusted p-value
PE_WE, PE_WE_P - Phylogenetic Endemism & significance
PHYLO_RPD* - Relative Phylogenetic Diversity metrics
PHYLO_RPE* - Relative Phylogenetic Endemism metrics
antregions-metrics-phyloS2.csv, …_alt_npp_rugg.csv - .csv files with compiled input from the Biodiverse results and environmental data, grouped by zoogeographic region, for running the scripts in the LDA/ and regressions/ folders.
Column (unit) - Description
Longitude, Latitude (meters) - UTM coordinates
alt (meters) - Altitude
NPP_avg - Mean net primary productivity
rugg - Terrain ruggedness
ENDW_WE (index) - Weighted endemism
PD_P, PE_WE_P (p-value) - Significance tests
S2_Xgroups (category) - Bioregion (2–9 groupings)
XY_SDM.csv, biasSWD_all.csv
Column (unit) - Description
otu (text) - Species name
Long_SDM, lat, long (degrees) - Coordinates
[predictors] - Environmental values at point
Key File Descriptions
XY-endemic-sdm.csv - Species with ≥3 records used in ENM
XY-endemic-sdm-psg-29738.csv - Projected coordinate file
*_trimmed.bds/.bts - Trimmed input files for Biodiverse
*_analysed* - Outputs from Biodiverse incl. randomizations
*_concatenated.csv - Grid cell-level summaries
antregions-metrics-phyloS2.csv - Input for regressions
scores_train_LDA_alt_npp.csv - LDA scores per bioregion
result_table_environmental.csv - Region-wise environmental summaries
XY_SDM.csv, biasSWD_all.csv - SDM input occurrence & background
01_auc_summary.csv, 02_TSS_summary.csv - Model performance evaluations
maps/ - SDM prediction PNGs
SDM input files: species occurrence records (antweb_meta.csv); species hit list (Hit_List_All_Malagasy.csv); background samples-with-data (SWD) file, with points sampled in proportion to occurrence-record density to reflect sampling bias (biasSWD_all.csv); raster background file (predNA.tif); and a folder with the environmental predictor raster files.
Environmental Predictors Used in SDM
Variable (code; unit/type) - Description
Slope (slop; degrees) - Elevation-based incline
Aspect (asp; degrees) - Slope direction (0 = flat)
Solar radiation (solar; Wh·m⁻²·day⁻¹) - Computed from topography
Watershed (wshed; categorical) - 25 regions incl. Sainte-Marie
Soil (soil; categorical) - 11-class soil types
PET (pet; mm) - Potential evapotranspiration
CWD (cwd; mm) - Water deficit (PET - precip.)
Bioclim Variables:
bio04 - Temperature seasonality
bio05 (°C) - Max temp. of warmest month
bio06 (°C) - Min temp. of coldest month
bio07 (°C) - Temp. range (bio05 - bio06)
bio13 (mm) - Precip. of wettest month
bio14 (mm) - Precip. of driest month
bio18 (mm) - Precip. of warmest quarter