Data for: Classic hypotheses of area, time, and climatic stability fall short in explaining high tropical species richness
Data files
Mar 28, 2025 version files (159.25 MB):
- Area_databases.zip (682.85 KB)
- Array_reclassified.rds (136.42 MB)
- GeoTaxa_databases.zip (5.76 MB)
- Landmasses_masks.zip (8.24 KB)
- New_richness_df.csv (16.37 MB)
- README.md (5.48 KB)
Abstract
Tropical biodiversity overshadows the number of species inhabiting other regions. Age, area, and stability constitute three classical ideas used to explain the higher richness in these warm and humid zones. In this study, we measured the global dynamics of tropical, arid, temperate, cold, and polar climate zones over the last 5 million years (Ma). We aimed to evaluate whether the age, area, and stability of these climate zones contribute to explaining the observed differences in species richness. We classified the paleoclimatic layers generated by the PALEO-PGEM climatic emulator – temperature and precipitation for the last 5 Ma at 1,000-year intervals – into the main Köppen-Geiger climate zones: tropical, arid, temperate, cold, and polar. We then calculated three variables: age, area, and stability. Age represents the duration that each map cell has remained within its current climate zone since its last change (map cell-based measure). Area quantifies the total extent of each climate zone over time by summing all map cells corresponding to that climate zone (climate zone-based measure). Stability indicates the number of times a given map cell changed between climate zones over time (map cell-based measure). We implemented regression and correlation tests, Structural Equation Models, and decision trees to measure the relationship between these estimates and current global patterns of amphibian, bird, and mammal richness. Our results indicate that age, area, and stability do not account for the observed differences in species richness among the 5 climate zones. None of these classical hypotheses alone can explain the high vertebrate tropical richness observed. Further investigation, incorporating additional taxa (e.g., invertebrates or plants), or integrating new perspectives (such as the influence of local variations in diversification processes) will provide a more comprehensive understanding of the factors shaping large-scale biodiversity patterns.
https://doi.org/10.5061/dryad.h70rxwdv4
Description of the data and file structure
Code and data files for running the analyses of the article “Classic Hypotheses of Area, Time, and Climatic Stability Fall Short in Explaining High Tropical Species Richness”. This repository includes area and richness data, climate classification arrays, landmass shapefiles, species presence/absence databases, and the R script used to process and analyse the data.
Files and variables
File: Area_databases.zip
Description:
- “Total_area_global.csv”: Global area (km²) of climate zones across all time steps.
  - “class”: climate zone (1 = Tropical; 2 = Arid; 3 = Temperate; 4 = Cold; 5 = Polar).
  - “value”: area in square kilometres (km²).
  - “layer”: time step (from the present as “Time 1” to 5 million years ago as “Time 5,001”).
- “Total_area_Africa.csv”, “Total_area_America.csv”, and “Total_area_EurasiaOc.csv”: Area (km²) of climate zones by landmass.
  - “layer”: time step (from the present as “Time 1” to 5 million years ago as “Time 5,001”).
  - “level”: indicates that the metric was calculated at the class level (may be omitted).
  - “class”: climate zone (1 = Tropical; 2 = Arid; 3 = Temperate; 4 = Cold; 5 = Polar).
  - “id”: residual variable from the “landscapemetrics” analysis (may be omitted).
  - “metric”: residual variable from the “landscapemetrics” analysis (it also indicates that the metric is class-level, but may be omitted).
  - “value”: area in square kilometres (km²).
  - “layer2”: duplicate of the time step variable (kept for checking purposes, but may be omitted).
- “Area_SR.csv”: Accumulated area of each climate zone in each landmass, and the associated amphibian, bird, and mammal richness.
  - “Land_mass”: landmass (1 = America, 2 = Africa, 3 = EurOc).
  - “Clim_zone0”: climate zone (1 = Tropical, 2 = Arid, 3 = Temperate, 4 = Cold, 5 = Polar).
  - “Area”: accumulated area in square kilometres (km²).
  - “SR_mam”: Mammal species richness.
  - “SR_amp”: Amphibian species richness.
  - “SR_bir”: Bird species richness.
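The link between the per-time-step area tables and an accumulated area value can be illustrated with a short sketch. It is written in Python for illustration only (the study's own script is in R), the sample rows are invented, and it assumes that “layer” can be read as an integer time step. A simple trapezoidal rule stands in here for the integration routine used in the R script, so the numbers it yields are not expected to match “Area_SR.csv” exactly.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt in the layout described for "Total_area_global.csv".
# The values are invented; "layer" is assumed to hold an integer time step.
SAMPLE = """class,value,layer
1,100.0,1
1,120.0,2
1,110.0,3
2,50.0,1
2,50.0,2
2,70.0,3
"""


def accumulated_area(rows, step_length=1.0):
    """Integrate area over time for each climate zone (trapezoidal rule).

    step_length is the duration of one time step in whatever unit is wanted
    (for example 1000 for years); the default keeps the result in km² x steps.
    """
    series = defaultdict(list)
    for row in rows:
        zone = int(row["class"])
        series[zone].append((int(row["layer"]), float(row["value"])))

    totals = {}
    for zone, points in series.items():
        points.sort()  # order by time step
        total = 0.0
        for (t0, a0), (t1, a1) in zip(points, points[1:]):
            total += 0.5 * (a0 + a1) * (t1 - t0) * step_length
        totals[zone] = total
    return totals


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = accumulated_area(rows)
print(totals)
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, and the parsing of “layer” may need adjusting to the format actually stored in the CSV.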
File: Landmasses_masks.zip
Description: Shapefiles corresponding to the three landmasses analysed in the study.
- “Africa.shp”: Shapefile corresponding to the Africa landmass.
- “America3.shp”: Shapefile corresponding to the American landmass.
- “Eurasia_Oceania.shp”: Shapefile corresponding to Eurasia + Oceania landmass.
File: GeoTaxa_databases.zip
Description: R objects for the databases indicating the presence (1) or absence (0) of each species included in this study in each of the pixels. Rows correspond to map pixels, while columns correspond to species (one column per species). The database also contains two extra columns indicating the landmass and climate zone to which each pixel belongs.
- “Mammals_df.rds”: Mammal species database.
- “Amphibians_df.rds”: Amphibian species database.
- “Birds1_df.rds”, “Birds2_df.rds” and “Birds3_df.rds”: Bird species databases (divided due to space constraints).
File: New_richness_df.csv
Description: Database indicating per-pixel information:
- “X” and “Y”: coordinates of the pixel.
- “SR_mammals”: Mammal species richness.
- “SR_amphibians”: Amphibian species richness.
- “SR_birds”: Bird species richness.
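- “Age”: age of the pixel, i.e., the time it has remained within its current climate zone since its last change.
- “N_changes”: number of changes between climate zones recorded for the pixel (“N° changes”).
- “N_biomes”: number of different climate zones recorded for the pixel (“N° biomes”; referred to as “N° climate zones” in the methods).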
- “Land_mass”: landmass (1 = America, 2 = Africa, 3 = EurOc) of that pixel.
- “Clim_zone0”: climate zone (1 = Tropical, 2 = Arid, 3 = Temperate, 4 = Cold, 5 = Polar) to which the pixel belongs.
- “Area_global0”: accumulated global area of the climate zone to which the pixel belongs.
- “Area_continent0”: accumulated landmass area of the climate zone to which the pixel belongs.
- “Temp_sd” and “Prec_sd”: Additional calculated variables (standard deviation of temperature and precipitation variables of the pixel), but not used in the present study.
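As a usage illustration, the per-pixel table can be summarised by climate zone with a few lines of code. The sketch below is in Python (the study's own script is in R), uses the column names listed above, and runs on three invented rows rather than the real file; the function name `mean_richness_by_zone` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Climate-zone codes as documented for "Clim_zone0".
ZONES = {1: "Tropical", 2: "Arid", 3: "Temperate", 4: "Cold", 5: "Polar"}

# Invented rows using the documented column names (not real data).
SAMPLE = """X,Y,SR_mammals,SR_amphibians,SR_birds,Age,N_changes,N_biomes,Land_mass,Clim_zone0
-60.25,-3.25,120,80,400,5000,0,1,1,1
-61.25,-4.25,100,60,380,4200,2,2,1,1
20.25,25.75,15,1,60,300,40,2,2,2
"""


def mean_richness_by_zone(rows, column):
    """Average one richness column over the pixels of each climate zone."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        zone = ZONES[int(row["Clim_zone0"])]
        sums[zone] += float(row[column])
        counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
mammal_means = mean_richness_by_zone(rows, "SR_mammals")
print(mammal_means)
```

To run it on the deposited file, replace the in-memory sample with `open("New_richness_df.csv", newline="")`. Because the rows are read by column name, any extra columns in the file are simply ignored.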
File: Array_reclassified.rds
Description: Climate classification array for the past 5 million years (at 1,000-year intervals), based on PALEO-PGEM emulator maps (Holden et al., 2019), and reclassified into 5 Köppen-Geiger climate zones (resolution of 0.5°): 1 = Tropical, 2 = Arid, 3 = Temperate, 4 = Cold, 5 = Polar.
File: Tropics_script_24JBI.R
Description: R script for processing data, running all analyses, and generating figures included in the manuscript.
Access information
Climate data and species occurrence and biodiversity data were sourced from:
- Paleoclimate data: PALEO-PGEM emulator (Holden et al., 2019).
- Occurrence biodiversity data: IUCN (https://www.iucnredlist.org/resources/spatial-data-download).
Code/software
The R script “Tropics_script_24JBI.R” includes the code for processing the data, running analyses, and generating the figures. All analyses were conducted using R version 4.2.2.
Implemented packages: dplyr, raster, sp, data.table, abind, foreach, doParallel, rasterVis, RColorBrewer, ggplot2, ggthemes, paletteer, beanplot, viridis, MetBrewer, plotrix, landscapemetrics, terra, Bolstad2, rgdal, nlme, lmPerm, gstat, scales, rnaturalearth, sf, rpart, rpart.plot, randomForest, lavaan.
Climate data
We focused on the last 5 Ma, encompassing the Pliocene (5.3-2.6 Ma), Pleistocene (2.6-0.01 Ma), and Holocene (0.01 Ma ago to the present day; Cohen et al., 2021). This period started with a subtle warming trend in the early Pliocene (until 3.2 Ma), continuing with successive cooling pulses that culminated with the establishment of continental northern hemisphere glaciations (Zachos et al., 2001). Moreover, orographic and tectonic events occurred during this period, such as the gradual uplift of the Andes (Gregory-Wodzicki, 2000).
To examine how these climatic changes influence biogeographic patterns, we used the high-resolution climate emulator PALEO-PGEM (Holden et al., 2019), which provides global monthly temperature and rainfall data with 1,000-year temporal resolution and 0.5° spatial resolution (Fig. 1). Using monthly mean temperature (°C) and total rainfall (mm) data from PALEO-PGEM, we created 10,002 arrays covering every time step over the past 5 Ma (two arrays of 12 matrices per time step). These data were classified based on the Köppen-Geiger climate classification system (Beck et al., 2018b) into five primary climate zones (tropical, arid, temperate, cold, and polar), from the present (pre-industrial, ca. 1760) to 5 Ma ago. This classification was implemented through the “KoppenGeiger” MATLAB function (Beck et al., 2018a) adapted for use in R (Galván et al., 2023). This resulted in 5,001 final matrices representing the distribution of climate zones at 1,000-year intervals. Because alternative biome classifications delimit their categories differently, which can influence the conclusions of a study (Donoghue & Edwards, 2014), we selected the Köppen-Geiger classification for its exclusive reliance on climatic parameters (Beck et al., 2018b).
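The reduction of twelve monthly temperature and rainfall values to one of the five main zones can be sketched as follows. This is an illustrative Python re-statement of the commonly cited Köppen-Geiger main-class criteria, not the “KoppenGeiger” function used in the study; the thresholds and the order of the tests (arid first) are written from the published criteria as we understand them and should be checked against Beck et al. (2018b) before being relied upon. The function name `main_climate_zone` is hypothetical.

```python
APR_SEP = [3, 4, 5, 6, 7, 8]      # month indices, 0 = January
OCT_MAR = [9, 10, 11, 0, 1, 2]


def main_climate_zone(temp, prec):
    """Return 1-5 (Tropical, Arid, Temperate, Cold, Polar) for one map cell.

    temp: 12 monthly mean temperatures (°C); prec: 12 monthly totals (mm).
    Thresholds are an assumption based on the usual main-class criteria.
    """
    mat = sum(temp) / 12          # mean annual temperature
    total_prec = sum(prec)        # mean annual precipitation
    t_hot, t_cold = max(temp), min(temp)

    # Summer is taken as the warmer of the two six-month halves,
    # which handles both hemispheres without a latitude argument.
    warm_first = sum(temp[i] for i in APR_SEP) >= sum(temp[i] for i in OCT_MAR)
    summer, winter = (APR_SEP, OCT_MAR) if warm_first else (OCT_MAR, APR_SEP)
    p_summer = sum(prec[i] for i in summer)
    p_winter = sum(prec[i] for i in winter)

    # Aridity threshold depends on the seasonality of precipitation.
    if total_prec > 0 and p_winter > 0.7 * total_prec:
        p_threshold = 2 * mat
    elif total_prec > 0 and p_summer > 0.7 * total_prec:
        p_threshold = 2 * mat + 28
    else:
        p_threshold = 2 * mat + 14

    if total_prec < 10 * p_threshold:
        return 2                  # Arid
    if t_hot <= 10:
        return 5                  # Polar
    if t_cold >= 18:
        return 1                  # Tropical
    if t_cold > 0:
        return 3                  # Temperate
    return 4                      # Cold


print(main_climate_zone([26] * 12, [200] * 12))
```

Applying such a function to every cell of every time step yields one integer matrix per time step, which is the form of the reclassified array described above.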
Biodiversity data
We obtained range maps of terrestrial mammals and amphibians from the IUCN website on 24th January 2022 and 1st March 2022 (https://www.iucnredlist.org/resources/spatial-data-download; IUCN, 2021), and bird range maps from BirdLife International on 4th March 2022 (http://datazone.birdlife.org/species/requestdis; BirdLife International & Handbook of the Birds of the World, 2021). Mammal and amphibian range maps were provided as shapefiles, so we loaded them in R using the ‘rgdal’ package (Bivand et al., 2021). Bird range maps were provided as an ESRI file geodatabase, so we dissolved the polygons corresponding to a single species and exported them as a shapefile in QGIS (QGIS Development Team, 2021). Then, for all groups, we excluded species range polygons with “presence” values 3 (“possibly extant”) and 6 (“presence uncertain”), as well as those with “origin” values 3 (“introduced”) and 4 (“vagrant”), thereby avoiding highly uncertain records and keeping the natural range of each species (Miraldo et al., 2016). We rasterized these polygon data at 0.5°, creating presence/absence raster files per species and richness maps for each group, using the ‘terra’ package (Fig. S2; Hijmans, 2022).
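The polygon filter described above amounts to a simple rule on two attribute codes. A minimal Python sketch (the study applied the filter in R) is given below; the “presence” and “origin” codes come from the text, while the record layout, the “species” field, and the species names are invented for illustration.

```python
# Codes excluded in the text: presence 3 ("possibly extant") and
# 6 ("presence uncertain"); origin 3 ("introduced") and 4 ("vagrant").
EXCLUDED_PRESENCE = {3, 6}
EXCLUDED_ORIGIN = {3, 4}


def keep_polygon(record):
    """True if a range polygon passes both the presence and origin filters."""
    return (record["presence"] not in EXCLUDED_PRESENCE
            and record["origin"] not in EXCLUDED_ORIGIN)


# Hypothetical attribute records, one per range polygon.
records = [
    {"species": "Species a", "presence": 1, "origin": 1},
    {"species": "Species b", "presence": 3, "origin": 1},  # possibly extant
    {"species": "Species c", "presence": 1, "origin": 3},  # introduced
    {"species": "Species d", "presence": 6, "origin": 4},  # uncertain, vagrant
    {"species": "Species e", "presence": 2, "origin": 2},
]

kept = [r["species"] for r in records if keep_polygon(r)]
print(kept)
```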
Hypothesis testing and statistical analyses
Once the matrices of climate-zone distribution were created, we rasterized and reprojected the maps to the Mollweide equal-area projection (minimizing area distortion and enabling the calculation of global and regional metrics; Video S1 and S2). Then, we calculated the following measures using the ‘raster’ package (Fig. 1; Hijmans, 2023): 1) For the “time-for-speciation effect” hypothesis, we measured the “age” of a climate zone in each map cell as the number of time steps, with each step representing a thousand years, that the map cell has been part of the current climate zone since its last change. For example, if a map cell is currently part of the tropical climate zone, and it has been so since it changed from arid to tropical 50 steps ago, the age of the tropical zone in that cell will be 50,000 years. 2) For the area-related hypothesis, we measured the total “area” of each climate zone as the sum of the area of all map cells corresponding to the same climate zone in each time step (in km²). To do so, we used the “lsm_c_ca” function of the ‘landscapemetrics’ package (Hesselbarth et al., 2019). In addition, as the importance of integrating time into area measurements has been demonstrated for several plant and animal groups (Belmaker & Jetz, 2015; Fine & Ree, 2006; Jetz & Fine, 2012), we measured the “accumulated area” of a climate zone through all the time steps as the area under the curve of this temporal trend (using the “sintegral” function of the ‘Bolstad2’ package; Curran, 2013). 3) For the stability-related hypothesis, we considered stability as the extent to which an entity is continuously the same through a period of time (Cantidio & Souza, 2019; Carnaval et al., 2014; Costa et al., 2018; Graham et al., 2006; Terribile et al., 2012).
We measured the stability of a climate zone using two metrics: (a) number of changes (“N° changes”), measured as the total number of changes between climate zones per map cell, and (b) total number of different climate zones per map cell (“N° climate zones”). In both cases, the lower the value, the higher the degree of stability. These stability metrics account for the change in both climatic conditions and “climate-zone” entity properties (McDonald-Spicer et al., 2019).
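The three map-cell metrics (age, N° changes, and N° climate zones) can be illustrated for a single cell with a short Python sketch (the study computed them in R across whole rasters). The function name `cell_metrics` is hypothetical, and the counting convention is an assumption: the series is ordered from the present backwards, and a cell that never changes is given the full time span covered by the series.

```python
def cell_metrics(zones, years_per_step=1000):
    """Age, number of changes, and number of distinct zones for one map cell.

    zones: climate-zone codes ordered from the present (index 0) back in time.
    """
    current = zones[0]

    # Age: steps back from the present until a different zone is first found.
    # If the cell never changed, use the whole span of the series (assumption).
    steps = len(zones) - 1
    for index, zone in enumerate(zones):
        if zone != current:
            steps = index
            break

    # N° changes: transitions between consecutive time steps.
    n_changes = sum(1 for a, b in zip(zones, zones[1:]) if a != b)

    # N° climate zones: distinct zones the cell has belonged to.
    n_zones = len(set(zones))

    return steps * years_per_step, n_changes, n_zones


# Tropical (1) for the last 50 steps, arid (2) before that, tropical earlier.
example = [1] * 50 + [2] * 10 + [1] * 5
print(cell_metrics(example))
```

The example reproduces the case given in the text: a cell that changed from arid to tropical 50 steps ago has an age of 50,000 years, with two changes and two distinct climate zones over the series.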
We worked at the map-cell level except for area measures, for which we worked at the climate-zone level (Fig. S1; Hesselbarth et al., 2019). Although we calculated these metrics globally, for the following analyses we selected study units divided by landmass (America, Africa, and Eurasia + Oceania [“EurOc”]) and animal clades (amphibians, birds, and mammals), to maximize the number of replicates and differentiate potential responses among geographically distant climate zones. We based our decision on Hagen et al. (2021), who found that tropical biodiversity is driven by different paleoenvironmental and tectonic processes in each landmass. In this sense, individual statistical analyses were applied for each metric (Age, Area, N° changes, and N° climate zones) and each study unit (e.g., amphibians in the American tropical climate zone). The masks for the landmasses were created in QGIS (QGIS Development Team, 2021), and the criteria used for these divisions can be found in Appendix S1.
To analyze the relationship between climate metrics (Age, Area, N° changes, and N° climate zones) and species richness, we created a database with geographic data of species presence/absence for each map cell using the “as.data.frame” function in the ‘terra’ package (Hijmans, 2022). Then, to examine the relationship between area and species richness, we divided this database into landmasses and climate zones and calculated the richness in each zone. Lastly, we performed Spearman correlation tests, linear models, and Structural Equation Models (SEM) between the accumulated area of a climate zone and species richness for each animal clade, using the “cor” function in the ‘stats’ package (R Core Team, 2022), the “lmp” function in the ‘lmPerm’ package (Wheeler & Torchiano, 2016), and the “sem” function in the ‘lavaan’ package (Rosseel, 2012).
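For readers unfamiliar with the rank-based correlation mentioned above, a minimal Python sketch of Spearman's coefficient follows (the study used R). It ranks both variables, averaging tied ranks, and then computes the Pearson correlation of the ranks. The area and richness values are invented, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Assumes both inputs vary (a constant input would divide by zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Invented accumulated areas and richness values for five study units.
area = [10, 20, 30, 40, 50]
richness = [5, 9, 30, 31, 100]
rho = spearman(area, richness)
print(rho)
```

Because only ranks enter the calculation, any monotonic relationship, linear or not, gives a coefficient of 1 or -1.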
To test the per-cell relationship between age/stability metrics and biodiversity levels, we independently used three explanatory variables: Age, N° changes, and N° climate zones. In addition, we independently included amphibian, bird, and mammal species richness as response variables. Then, we performed per-clade generalized least squares (GLS) regressions for each climate zone and landmass, using the ‘nlme’ and ‘gstat’ packages (Pebesma, 2004; Pinheiro et al., 2021). To do so, we implemented two sampling approaches: a random sampling, for which we randomly sampled 500 map cells of the original data set; and a stratified sampling, for which we divided the original set into consecutive categories and performed a random sampling in each of them (selecting either 50 map cells or 10% of the map cells per category). For this stratified sampling, we selected ten categories for “Age” (every 500,000 years), eight categories for “N° changes” (every 200 changes), and four categories for “N° climate zones” (one for each value of distinct climate zones). Following Tejero‐Cicuéndez et al. (2022), variables were scaled so that coefficients may range from -1 to 1. In addition, we included an exponential autocorrelation structure in the models to account for the non-independence of the data (Beguería & Pueyo, 2009), and we applied a bootstrapping procedure of 1,000 replicates to determine the 95% confidence intervals of the models. Lastly, we considered as significant those cases in which more than 95% of the replicates were significant (p-value ≤ 0.05).
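The stratified sampling step can be sketched as follows, in Python for illustration (the study's implementation is in R and may differ in detail). The function name `stratified_sample`, its parameters, and the synthetic cells are hypothetical; the bin width of 500,000 years and the two sample sizes (50 cells or 10% of cells per category) come from the text.

```python
import random


def stratified_sample(cells, key, bin_width, per_bin=50, fraction=None, seed=0):
    """Randomly sample map cells within consecutive bins of one variable.

    per_bin: fixed number of cells per bin (capped at the bin size).
    fraction: if given, sample this share of each bin instead of per_bin.
    """
    rng = random.Random(seed)

    # Group cells into bins such as 0-499,999, 500,000-999,999, ...
    bins = {}
    for cell in cells:
        bins.setdefault(int(cell[key] // bin_width), []).append(cell)

    sample = []
    for index in sorted(bins):
        members = bins[index]
        if fraction is not None:
            size = max(1, round(len(members) * fraction))
        else:
            size = min(per_bin, len(members))
        sample.extend(rng.sample(members, size))
    return sample


# Synthetic cells: 1,000 ages spread evenly over ten 500,000-year categories.
cells = [{"id": i, "Age": i * 5000} for i in range(1000)]

by_count = stratified_sample(cells, "Age", 500_000, per_bin=50)
by_fraction = stratified_sample(cells, "Age", 500_000, fraction=0.1)
print(len(by_count), len(by_fraction))
```

In this sketch a value exactly equal to the upper limit (for example an age of 5,000,000 years) would fall into an extra bin of its own, so the bin edges would need adjusting to reproduce exactly ten categories on real data.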
Finally, we built decision trees and a random forest to identify the variables that best explain the number of species per map cell for the three taxa. A decision tree is a binary recursive partitioning algorithm that classifies a data set according to the homogeneity of subgroups of data. Starting with the entire data set at the tree root, and moving through the branches, the data are divided at each node according to one of the explanatory variables (Steinberg, 2009). Furthermore, random forest is a machine-learning algorithm that classifies data by combining the predictions of many decision trees generated from random samples of map cells (Costa et al., 2018). We constructed 500 trees including the previous variables (Age, N° changes, and N° climate zones), and “Landmass” (“America”, “Africa” or “EurOc”) and “Climate zone at time 0” (“Tropical”, “Arid”, “Temperate”, “Cold” or “Polar”) as factor variables. We excluded the “Area” variable due to its close relationship with “Landmass” and “Climate zone at time 0”, and we used the ‘rpart’, ‘rpart.plot’, and ‘randomForest’ packages (Liaw & Wiener, 2002; Milborrow, 2019; Therneau et al., 2019).
All analyses were run in R version 4.2.2 (specific R packages are available in Appendix S2; R Core Team, 2022), and computationally demanding sections were carried out on the FinisTerrae-III supercomputer (CESGA, 2021). Landmasses were plotted using Natural Earth data (https://www.naturalearthdata.com/).