Phylogenetic endemism and ancestral area inference reveal historical refugia in the Greater Cape Floristic Region
Data files
Dec 24, 2025 version files (2.16 GB total)
- Phylogenetic_endemism_paper_data.zip (2.16 GB)
- README.md (7.60 KB)
Abstract
Aim: Refugial areas and habitats are thought to have played a key role in facilitating both the emergence and persistence of floristic diversity in the Greater Cape Floristic Region (GCFR) of South Africa. While refugial areas may be identified using a diversity of biological proxies (e.g., narrow-range endemism; paleoendemism), there is a shortage of studies in the GCFR that apply these approaches at the species level and in a comparative manner across multiple clades.
Location: GCFR, South Africa.
Time Period: Cenozoic.
Major Taxa Studied: Protea, Leucadendron, Pentameris, Restionoideae.
Methods: We quantify and compare the spatial distribution of phylogenetic endemism (PE) in four Cape-centric plant clades, two clades of shallow-rooted graminoids (Poales) and two of deep-rooted shrubs (Proteales). For each clade, we also quantify the phylogenetic effect on PE (PPE), a metric describing the contribution of evolutionarily distinctive species (paleoendemics) to PE. Spatial moving average regression models are used to assess the influence of climate stability and topography on the distributions of PE and PPE. Finally, we use ancestral area inference to complement these analyses, on the premise that refugial areas are more likely to be resolved as ancestral.
Results: Both PE and PPE are concentrated in the southwestern GCFR, a pattern consistent with the long-term climate stability and steep relief of this mountainous region. In addition, ancestral area inference resolves the southwestern GCFR as a likely area of origin for all clades examined. Spatial patterns of PE and PPE nevertheless vary between clades, with PE centres more diffusely and broadly distributed in Proteales than in Poales.
Main Conclusions: Our study indicates that stable climate and topography have played an important refugial role in shaping patterns of diversity and endemism in the GCFR, but that functionally distinct clades (i.e., shallow-rooted graminoids versus deep-rooted shrubs) differ in terms of the location and dispersion of areas that have served as historical refugia. We attribute this variation to trait-dependent differences in their climate sensitivity.
van Blerk, J.J.; Verboom, G.A.; Cramer, M.D.; Lamberton, L.
Description of the data and file structure
File: Phylogenetic_endemism_paper_data.zip
Folders
DATA Climate stability index (past)
csi_past.tif
Used in the spatial moving average models to test whether phylogenetic endemism (PE) and the phylogenetic effect on PE (PPE) correspond with areas of high climate stability since the Pliocene.
DATA Environmental layers
- Chelsa bioclimatic data
File 1: CHELSA_bio10_01.tif
File 2: CHELSA_bio10_02.tif
File 3: CHELSA_bio10_03.tif
File 4: CHELSA_bio10_04.tif
File 5: CHELSA_bio10_05.tif
File 6: CHELSA_bio10_06.tif
File 7: CHELSA_bio10_07.tif
File 8: CHELSA_bio10_08.tif
File 9: CHELSA_bio10_09.tif
File 10: CHELSA_bio10_010.tif
File 11: CHELSA_bio10_011.tif
File 12: CHELSA_bio10_012.tif
File 13: CHELSA_bio10_013.tif
File 14: CHELSA_bio10_014.tif
File 15: CHELSA_bio10_015.tif
File 16: CHELSA_bio10_016.tif
File 17: CHELSA_bio10_017.tif
File 18: CHELSA_bio10_018.tif
File 19: CHELSA_bio10_019.tif
The full set of 19 CHELSA bioclimatic variables, with global coverage. Used in the species distribution modelling.
- DEM
aspect_240_card.tif
File containing the slope aspect (the direction in which the slope faces), classified into 8 compass directions. Used in the species distribution modelling.
dem_240m.tif
File containing the digital elevation model, with the height above sea level in metres. Used in the species distribution modelling.
slope_240m.tif
File containing the angle of the slope of the ground. Used in the species distribution modelling.
- Fire frequency data
Fire_per_year.tif
File containing the number of fires per year. Used in the species distribution modelling.
FIRM_average.tif
File containing the average number of fires per year. Used in the species distribution modelling.
- NDVI
NDVI_annual_average.tif
File containing the NDVI average over the year. Used in the species distribution modelling.
NDVI_max.tif
File containing the maximum NDVI in a year. Used in the species distribution modelling.
NDVI_min.tif
File containing the minimum NDVI in a year. Used in the species distribution modelling.
NDVI_range.tif
File containing the range of NDVI values over a year. Used in the species distribution modelling.
- Soils
EC_mS_m.tif
File containing the electrical conductivity of the soil for the GCFR region. Used in species distribution modelling.
Ext_K_cmol_kg.tif
File containing the extractable potassium of the soil for the GCFR region. Used in species distribution modelling.
Ext_Na_cmol_kg.tif
File containing the extractable sodium of the soil for the GCFR region. Used in species distribution modelling.
Ext_P_mg_kg.tif
File containing the extractable phosphorus of the soil for the GCFR region. Used in species distribution modelling.
pH.tif
File containing the pH of the soil for the GCFR region. Used in species distribution modelling.
Total_C_%.tif
File containing the total carbon percentage of the soil for the GCFR region. Used in species distribution modelling.
Total_N_%.tif
File containing the total nitrogen percentage of the soil for the GCFR region. Used in species distribution modelling.
- worldclim
Wind speed (m s-1) at 30 arc-second resolution (can be downloaded from https://www.worldclim.org/). Used in the species distribution modelling.
DATA GCFR polygons
- GCFR outer polygon
gcfr_soil_buf.shx
gcfr_soil_buf.dbf
gcfr_soil_buf.prj
gcfr_soil_buf.shp
All shapefile components for the outer (perimeter) polygon of the GCFR region. Used as the region of interest for species distribution modelling and PE analysis.
- GCFR regions
GCFR_regions.shp
GCFR_regions.shx
GCFR_regions.dbf
GCFR_regions.prj
All shapefile components for the focal regions (SW, NW, AP, LB, KM, SE, EC) used within the GCFR.
Weimarck, H. (1941). Phytogeographical groups, centres and intervals within the Cape flora. Lund, Leipzig.
DATA Phylogenies
Protea.tre
Phylogeny of all the Protea species analysed. Used in the calculations of PE and PPE.
Restionoideae.tre
Phylogeny of all the Restionoideae species analysed. Used in the calculations of PE and PPE.
Leucadendron.tre
Phylogeny of all the Leucadendron species analysed. Used in the calculations of PE and PPE.
Pentameris.tre
Phylogeny of all the Pentameris species analysed. Used in the calculations of PE and PPE.
DATA Species distributions
- Pentameris
Pentameris.xlsx
Excel workbook file containing Pentameris spp. names, observation locations (longitude and latitude) and location notes for each observation. Empty cells represent unknown values. Used in species distribution modelling.
- Protea and Leucadendron
occ_sp.shx
occ_sp.dbf
occ_sp.prj
occ_sp.shp
Shapefile containing occurrence locations for all Proteaceae species, including the genera Protea and Leucadendron. Used in species distribution modelling.
Proteaceae_names_codes.csv
CSV file containing the name codes (CODE or occs_CODE) and species names (genus, species) of all the species contained in the occ_sp.shp file. Column names (CODE or occs_CODE) are linked to shapefile attributes and used to separate the locations of Protea and Leucadendron from the other species in the above dataset.
- Restionoideae
linder_restio.csv
CSV file containing the names and observation locations (longitude & latitude) of the Restionoideae species analysed. Used in the species distribution modelling.
RESULT Reconstructed Ancestral Regions (on phylogenies)
Fig S4_Pentameris_DEC_node_labels
Fig S5_Pentameris_DEC_node_probs
Fig S6_Restionoideae_DECj_node_labels
Fig S7_Restionoideae_DECj_node_probs
Fig S8_Protea_DEC_node_labels
Fig S9_Protea_DEC_node_probs
Fig S10_Leucadendron_DECj_node_labels
Fig_S11_Leucadendron_DECj_node_probs
These image files show the reconstructed ancestral regions on phylogenies which were used to produce Figure 3 in the manuscript.
RESULT Spatial PE and PPE
Leucadendron_PE_PPE_spatial.csv
Pentameris_PE_PPE_spatial.csv
Protea_PE_PPE_spatial.csv
Restionoideae_PE_PPE_spatial.csv
These spatial data-frames contain the PE (phylogenetic endemism) and PPE (the phylogenetic effect on PE) data for each pentad along with the geometries for each pentad. Used to produce Figure 2 in the manuscript.
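As a rough illustration of what a PE value per pentad represents, the Python sketch below computes phylogenetic endemism in its commonly used range-weighted form: each branch length is divided by the number of cells occupied by the branch's descendant species, and the shares are summed over the branches present in a cell. The tree, branch lengths, species names, and pentad names are invented for illustration, and the calculation used in the manuscript may differ in detail.

```python
from collections import defaultdict

def phylogenetic_endemism(branches, occurrences):
    """Return {cell: PE} for a toy tree.

    branches: list of (branch_length, [tip names descending from the branch])
    occurrences: {tip name: set of cells in which the tip occurs}
    """
    pe = defaultdict(float)
    for length, tips in branches:
        # Range of a branch = union of the cells occupied by its descendant tips.
        cells = set().union(*(occurrences[t] for t in tips))
        if not cells:
            continue
        # Each occupied cell receives an equal share of the branch length.
        for cell in cells:
            pe[cell] += length / len(cells)
    return dict(pe)

# Hypothetical three-species tree ((A:1,B:1):2,C:3) on four hypothetical pentads.
toy_branches = [
    (1.0, ["A"]),
    (1.0, ["B"]),
    (2.0, ["A", "B"]),
    (3.0, ["C"]),
]
toy_occurrences = {
    "A": {"p1"},
    "B": {"p1", "p2"},
    "C": {"p2", "p3", "p4"},
}
toy_pe = phylogenetic_endemism(toy_branches, toy_occurrences)
```

In this form the PE values summed over all cells equal the total branch length of the tree, which is a convenient sanity check on any implementation.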
Notes on file use
Shapefiles are a common format for vector-based geographic information system (GIS) data. They can be opened and used in any GIS software and in R or Python. A shapefile consists of several files besides the .shp (in this dataset, .shx, .dbf, and .prj; other shapefiles may also include .cpg, .sbn, and .sbx). The user interacts directly only with the .shp file, but the other files need to be in the same directory.
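Because a shapefile fails to load when its companion files are missing, it can help to check for them before opening the .shp. The Python sketch below does this with the standard library only; the function name and the default list of extensions (those shipped with the shapefiles in this dataset) are choices made for illustration.

```python
from pathlib import Path

# Companion extensions shipped alongside the .shp files in this dataset.
REQUIRED_SIDECARS = (".shx", ".dbf", ".prj")

def missing_sidecars(shp_path, required=REQUIRED_SIDECARS):
    """Return the companion extensions that are absent next to a .shp file."""
    shp = Path(shp_path)
    return [ext for ext in required if not shp.with_suffix(ext).exists()]
```

For example, `missing_sidecars("GCFR_regions.shp")` should return an empty list when the call is made from the folder that holds the complete shapefile.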
Tag Image File Format (.tif) files are a common format for raster-based geographic information system (GIS) data. They can be opened and used in any GIS software and in R or Python. Pixels within the .tif files have associated values that are used in analyses.
Tree (.tre) files are a common format for storing phylogenetic trees. They can be imported into R using various packages (e.g., "ape").
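A frequent first step after loading a tree is checking that its tip labels match the species names in the occurrence data. The Python sketch below pulls tip labels out of a tree string, assuming the .tre file holds a plain Newick string without quoted labels or bracketed comments; that assumption has not been checked against the files in this dataset, and a dedicated phylogenetics package is the safer choice for real work. The tree and species names shown are invented.

```python
import re

# A tip label directly follows "(" or "," and runs until a Newick delimiter.
_TIP_PATTERN = re.compile(r"[(,]\s*([^\s(),:;\[\]]+)")

def newick_tip_labels(newick_text):
    """Return tip labels from a plain Newick string, in order of appearance."""
    return _TIP_PATTERN.findall(newick_text)

def labels_missing_from(newick_text, species_names):
    """Return tip labels that have no match in an iterable of species names."""
    known = set(species_names)
    return [tip for tip in newick_tip_labels(newick_text) if tip not in known]

# Hypothetical tree string; real files would be read with open(path).read().
toy_tree = "((Sp_a:1.0,Sp_b:1.0):2.0,Sp_c:3.0);"
tips = newick_tip_labels(toy_tree)
```

Internal node labels and branch lengths are skipped because they follow ")" or ":" rather than "(" or ",".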
Comma-separated value (.csv) files are spreadsheets with information separated by commas. CSV files may be imported into GIS software, Microsoft Excel, R or Python.
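As an example of working with one of the CSV files, the Python sketch below shows how the species codes in Proteaceae_names_codes.csv might be used to pick out Protea and Leucadendron before filtering the occ_sp shapefile. It assumes columns named CODE and genus, following the description of that file; the rows and codes shown are invented placeholders rather than values from the real file.

```python
import csv
import io

def codes_for_genera(csv_text, genera, code_col="CODE", genus_col="genus"):
    """Map each requested genus to the list of species codes found in the CSV text."""
    wanted = {g: [] for g in genera}
    for row in csv.DictReader(io.StringIO(csv_text)):
        genus = row[genus_col].strip()
        if genus in wanted:
            wanted[genus].append(row[code_col].strip())
    return wanted

# Invented rows; codes and layout are placeholders, not taken from the real file.
sample = """CODE,genus,species
PRCYNA,Protea,cynaroides
LDARGE,Leucadendron,argenteum
MICUCU,Mimetes,cucullatus
"""
selected = codes_for_genera(sample, ["Protea", "Leucadendron"])
```

The resulting code lists can then be matched against the corresponding attribute of the shapefile to keep only the records for the two focal genera.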
Excel workbooks (.xlsx) are spreadsheets that are most easily viewed in Microsoft Excel but can be imported into R or Python.
See the manuscript associated with this dataset for a detailed description of the methodology.
