Quantitative estimates of glacial refugia for chimpanzees (Pan troglodytes) since the Last Interglacial (120,000 BP)

Cite this dataset

Barratt, Christopher (2021). Quantitative estimates of glacial refugia for chimpanzees (Pan troglodytes) since the Last Interglacial (120,000 BP) [Dataset]. Dryad.


Paleoclimate reconstructions have enhanced our understanding of how past climates have shaped present-day biodiversity. We hypothesize that the geographic extent of Pleistocene forest refugia and suitable habitat fluctuated significantly in time during the late Quaternary for chimpanzees (Pan troglodytes). Using bioclimatic variables representing monthly temperature and precipitation estimates, past human population density data and an extensive database of georeferenced presence points, we built a model of changing habitat suitability for chimpanzees at fine spatio-temporal scales dating back to the Last Interglacial (120,000 BP). Our models cover a spatial resolution of 0.0467 degrees (approximately 5.19 km2 grid cells) and a temporal resolution of between 1,000 and 4,000 years. Using our model, we mapped habitat stability over time using three approaches, comparing our modelled stability estimates to existing knowledge of Afrotropical refugia, as well as contemporary patterns of major keystone tropical food resources used by chimpanzees, figs (Moraceae) and palms (Arecaceae). Results show habitat stability congruent with known glacial refugia across Africa, suggesting their extents may have been underestimated for chimpanzees, with potentially up to ~60,000 km2 of previously unrecognized glacial refugia. The refugia we highlight coincide with higher species richness for figs and palms. Our results provide spatio-temporally explicit insights into the role of refugia across the chimpanzee range, forming the empirical foundation for developing and testing hypotheses about behavioural, ecological and genetic diversity with additional data. This methodology can be applied to other species and geographic areas when sufficient data are available.


This is a large species distribution modelling project based on chimpanzee occurrence data from the IUCN SSC A.P.E.S. database and paleoclimate data dating back to the Last Interglacial from Bell et al. (2017).

The DRYAD package here contains two files (DATA and SCRIPTS, described below), which can be used to:

0 - generate pseudoabsence points prior to modelling

1 - run large ensemble SDMs

2 - summarize and plot variable importance across models

3 - generate stability estimates across time (static stability, dynamic stability and coefficient of variation)

4 - plot these stability estimates

5 - calculate and plot SDM habitat suitability correlations across sensitivity analyses and with chimpanzee food plant species richness

6 - plot an animated GIF image of the changing habitat suitability through time (0-120 kya, 62 paleoclimate snapshots)

7 - quantify modelled estimates of refugia against previously published results to give numbers and % of refugial areas recovered

Usage notes

This DRYAD data repository contains scripts and data to repeat the analyses in Barratt et al. (2021). We provide the scripts to perform analyses for the chimpanzee dataset (full species), which can be modified to any species, subspecies or geographic area where suitable occurrence and climate data are available. We also provide the outputs of these scripts (results files and figures).

Some important points to note before use:
1. Due to the sensitive nature of the species occurrence data for chimpanzees, we do not provide the input data as part of this package. The full distribution data for chimpanzees used in the analyses are available on request from the IUCN SSC A.P.E.S. database manager. As a result, scripts #0 and #1 require input data to be obtained first, either through such a data request or from alternative sources (e.g. the Global Biodiversity Information Facility).

2. We provide the paleoclimate predictor variables representing the present, Late Holocene, LGM and LIG (as available from the Worldclim website) in subfolders of the predictor_data folder, named 000, 006, 021 and 120 respectively. The full set of bioclim variables (bioclim 01, 04, 10, 11, 12, 15, 16, 17) representing paleoclimate reconstructions for all 62 time snapshots is available on request from the authors of Bell et al. (2017). We do, however, provide our modelled habitat suitability estimates for chimpanzees for all 62 paleoclimate snapshots, along with variable importance estimates, so that scripts #2, #3, #4, #5, #6 and #7 can all be run by users of this DRYAD package.

DATA: contains a Readme.txt file along with the following folders:

occurrence data
is left empty due to data sensitivity; it is intended to hold the occurrence data used for modelling distributions (csv and shapefile formats)

predictor data
holds the predictors for all 62 time slices as numbered folders, with each predictor named consistently across time snapshots

model outputs        
contains five subfolders: asc stores the raw model outputs (numbered by time snapshot, matching the predictor data numbering scheme); tiff stores images of the asc files; stability stores the modelled stability estimates (asc files); variable importance stores the varimp() function outputs of the sdm R package for each modelling algorithm; quantification of refugia stores temporary refugia raster files created when running the analysis in script #7

map data               
contains all GIS data required for plotting, including IUCN subspecies ranges, country shapefiles, buffers around ranges for selecting background points, and known African refugia (from Maley, 1996)

correlation matrices
contains two folders (one for each of the 10 km rarefied and 25 km rarefied datasets). Each of these folders contains the three stability estimates, plus fig and palm species richness

folder for output images

SCRIPTS: contains the following scripts, to be run sequentially to repeat the analyses from the manuscript:

Script #0: Generates pseudoabsences (background points) prior to building SDMs. The user specifies a buffer around occurrence points (in our example 0.5 degrees), and a specified number of background points is sampled from within this buffer. This approach emphasizes factors locally relevant in distinguishing suitable from unsuitable habitat, while adequately sampling the range of climatic conditions for the species.
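
The buffer-based sampling described for this script can be sketched as follows. This is a minimal illustration in Python rather than the R script shipped in the package, and it rests on several assumptions: distances are treated as planar distances in degrees, points are drawn by rejection sampling from a bounding box, and the function name sample_background, its parameters and the example coordinates are all hypothetical.

```python
import math
import random


def sample_background(occurrences, buffer_deg=0.5, n_points=100, seed=0):
    """Sample background points lying within buffer_deg of any occurrence.

    occurrences: list of (lon, lat) tuples in decimal degrees.
    Distances are planar distances in degrees (a simplifying assumption).
    """
    if not occurrences:
        raise ValueError("at least one occurrence point is required")
    if buffer_deg <= 0:
        raise ValueError("buffer_deg must be positive")
    rng = random.Random(seed)
    lons = [lon for lon, _ in occurrences]
    lats = [lat for _, lat in occurrences]
    # Bounding box of all occurrences, expanded by the buffer distance.
    min_lon, max_lon = min(lons) - buffer_deg, max(lons) + buffer_deg
    min_lat, max_lat = min(lats) - buffer_deg, max(lats) + buffer_deg

    points = []
    while len(points) < n_points:
        lon = rng.uniform(min_lon, max_lon)
        lat = rng.uniform(min_lat, max_lat)
        # Rejection step: keep the candidate only if it falls inside the buffer.
        if any(math.hypot(lon - o_lon, lat - o_lat) <= buffer_deg
               for o_lon, o_lat in occurrences):
            points.append((lon, lat))
    return points


# Hypothetical occurrence coordinates, for illustration only.
example_occurrences = [(10.0, 5.0), (10.3, 5.2), (11.1, 4.8)]
background = sample_background(example_occurrences, buffer_deg=0.5, n_points=200)
```

Rejection sampling keeps the point density uniform over the union of the buffers, so areas where buffers overlap are not oversampled.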

Script #1: Builds ensemble SDMs using a number of different algorithms available in the sdm R package, weighting each individual model by its AUC statistic. The script then projects the models back in time onto paleoclimatic predictor variables, and provides all outputs as asc files. Modelling algorithms, background data, predictors, sampling regime and number of replicates are all flexible within the script.
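
To illustrate what weighting each model by its AUC means, here is a minimal Python sketch of an AUC-weighted mean of per-cell suitability predictions. It is not the sdm package's implementation: the function name auc_weighted_ensemble, the flat-list raster representation and the example values are assumptions made for illustration.

```python
def auc_weighted_ensemble(predictions, aucs):
    """Combine per-model suitability predictions into one ensemble surface.

    predictions: list of models, each a list of suitability values per cell.
    aucs: one AUC value per model, used directly as that model's weight.
    """
    if not predictions or len(predictions) != len(aucs):
        raise ValueError("need one AUC value per model")
    n_cells = len(predictions[0])
    if any(len(p) != n_cells for p in predictions):
        raise ValueError("all models must predict the same cells")
    total_weight = sum(aucs)
    # Weighted mean per cell: sum(weight * prediction) / sum(weights).
    return [
        sum(w * p[i] for w, p in zip(aucs, predictions)) / total_weight
        for i in range(n_cells)
    ]


# Two hypothetical models predicting suitability for two cells.
model_predictions = [[0.2, 0.8], [0.4, 0.6]]
model_aucs = [0.9, 0.7]
ensemble = auc_weighted_ensemble(model_predictions, model_aucs)
```

With these inputs the better-performing model (AUC 0.9) pulls each ensemble value towards its own prediction.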

Script #2: Summarizes and plots the variable importance outputs from script #1 above.

Script #3: Calculates static stability, dynamic stability and the coefficient of variation of the modelled SDMs across all time periods.
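
Of these three measures, only the coefficient of variation has a standard definition that can be given without knowing the script's internals (standard deviation divided by mean, per grid cell across time snapshots). A minimal Python sketch of that one measure is shown here; the choice of the population standard deviation, the function name and the flat-list raster layout are assumptions, and the static and dynamic stability calculations are deliberately not reproduced.

```python
import statistics


def coefficient_of_variation(stack):
    """Per-cell coefficient of variation across time snapshots.

    stack: list of snapshots, each a list of suitability values per cell,
    with cells in the same order in every snapshot.
    """
    cv = []
    # zip(*stack) regroups the values so that each tuple is one cell through time.
    for values in zip(*stack):
        mean = statistics.fmean(values)
        if mean > 0:
            cv.append(statistics.pstdev(values) / mean)
        else:
            # Undefined where mean suitability is zero.
            cv.append(float("nan"))
    return cv


# Three hypothetical snapshots of a two-cell raster.
suitability_stack = [[0.5, 0.2], [0.5, 0.4], [0.5, 0.6]]
cv_surface = coefficient_of_variation(suitability_stack)
```

A cell whose suitability never changes has a coefficient of variation of zero, so low values indicate stable habitat under this measure.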

Script #4: Plots the stability surfaces from script #3 above using the SDMTools R package.

Script #5: Summarizes the correlations between modelled outputs and palm and fig species richness. Also permits correlation plots to be made across sensitivity analyses (e.g. in this case the 10 km vs 25 km rarefied datasets).
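
A cell-by-cell correlation between two rasters can be sketched as below. The description does not say which correlation coefficient the script uses, so Pearson's r is an assumption here, as are the function name, the use of None for NoData cells and the example values.

```python
import math


def raster_correlation(a, b):
    """Pearson correlation between two rasters given as flat lists.

    Cells where either raster holds None (NoData) are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two cells with data in both rasters")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant raster")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical stability values and species richness counts for four cells.
stability_values = [0.1, 0.4, None, 0.8]
richness_values = [2, 5, 7, 9]
r = raster_correlation(stability_values, richness_values)
```

The same function could be applied to a pair of stability surfaces from two sensitivity analyses, provided both are on the same grid.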

Script #6: Plots an animated GIF of all SDMs across modelled time periods, enabling a visualization of changing habitat suitability through time.

Script #7: Analyzes the outputs of the stability models from scripts #3 and #4 to compare them against previously known refugia. The script polygonizes known refugia based on a shapefile (Maley, 1996), then quantifies the newly calculated refugia against them (in pixels and also as a percentage).
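
The pixel and percentage comparison can be sketched as a simple mask overlap. This Python example assumes that modelled refugia are defined by thresholding a stability surface and that known refugia have already been rasterized onto the same grid; the threshold value, the function name and the example data are hypothetical, and the actual script may delineate refugia differently.

```python
def refugia_recovered(stability, known_refugia, threshold):
    """Count known-refugia pixels that the model also identifies as refugia.

    stability: stability value per cell (None for NoData).
    known_refugia: True where a cell lies inside a known refugium.
    threshold: hypothetical cut-off above which a cell counts as modelled refugium.
    Returns (recovered_pixels, percent_of_known_refugia_recovered).
    """
    if len(stability) != len(known_refugia):
        raise ValueError("both rasters must cover the same cells")
    modelled = [v is not None and v >= threshold for v in stability]
    known_pixels = sum(1 for k in known_refugia if k)
    recovered = sum(1 for m, k in zip(modelled, known_refugia) if m and k)
    percent = 100.0 * recovered / known_pixels if known_pixels else float("nan")
    return recovered, percent


# Hypothetical four-cell example.
stability_surface = [0.9, 0.2, 0.8, 0.7]
known_mask = [True, True, False, True]
recovered_pixels, recovered_percent = refugia_recovered(
    stability_surface, known_mask, threshold=0.6
)
```

Cells that exceed the threshold but fall outside the known mask (the third cell in this example) are the ones that would count as candidate previously unrecognized refugia.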