Publication release: How well do species distribution models predict occurrences in exotic ranges?

Cite this dataset

Nguyen, Dat; Leung, Brian (2022). Publication release: How well do species distribution models predict occurrences in exotic ranges? [Dataset]. Dryad.


Species distribution models (SDMs) are widely used predictive tools for forecasting potential biological invasions. However, the reliability of SDMs extrapolated to exotic ranges remains understudied: most analyses are restricted to a few species and have produced equivocal results. We examined the spatial transferability of SDMs for 647 non-indigenous species extrapolated across 1,867 invaded ranges, and identified factors that may help differentiate predictive success from failure. We performed a large-scale assessment of the transferability of SDMs using two modelling approaches: generalized additive models (GAMs) and MaxEnt. We fitted SDMs on the native ranges of species and extrapolated them to exotic ranges. We examined the influence of general factors and factors related to biological invasions on spatial transferability.

Here, we provide the code and data for publication in Global Ecology and Biogeography as part of Nguyen and Leung (2022), "How well do species distribution models predict occurrences in exotic ranges?". Provided are the files and scripts necessary to fit the SDMs using distributional data from each species' native range and to validate them using data from its exotic ranges, with models formulated as generalized additive models (GAMs) or MaxEnt models. Also provided is a script to validate the SDMs on their native fitting range using 10-fold cross-validation and to fit the transferability model, a linear mixed model (LMM), using a provided cleaned data.frame. The dataset includes a full species list with GBIF occurrence records, target-group background (TGB) records for use in model fitting and validation, and the environmental data associated with the sightings.
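As an illustration of the 10-fold cross-validation step, the following is a minimal sketch in Python; the deposited scripts are written in R and this is not taken from them. The function name `kfold_indices`, the random (rather than spatially blocked) fold assignment, and the seed are all assumptions made for the example.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: records are assigned to folds at random. The source does
    not state how folds were constructed, so this is illustrative only.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in kfold_indices(25, k=10):
        # A model would be fitted on `train` and evaluated on `test` here.
        print(len(train), len(test))
```

Each record appears in exactly one test fold, so every native-range record contributes once to the validation score.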


Bioclimatic and elevation data were gathered from WorldClim version 2 at 2.5-arcmin resolution (~5-km grid size). Terrain ruggedness index (TRI) was derived from the elevation data using the 'terrain' function in the 'raster' package. NDVI, an index of vegetation cover ('greenness'), was retrieved from the National Aeronautics and Space Administration (NASA) Land Processes Distributed Active Archive Center and calculated as the maximum value across months within a year for a given site, then averaged across all available years (2000 to 2020). Prior to model fitting, we accounted for collinearity by removing highly correlated variables, using a pairwise correlation coefficient threshold of |r| > 0.7. This resulted in the final set of predictors used in all SDMs: annual mean temperature (bio1), mean diurnal range (mean of the monthly differences between maximum and minimum temperature; bio2), temperature annual range (bio7), mean annual precipitation (bio12), precipitation seasonality (coefficient of variation; bio15), elevation, TRI and maximum annual NDVI. All variables were standardized to a mean of zero and a standard deviation of one.
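The collinearity filter and standardization described above can be sketched as follows. This is a hedged Python illustration, not the authors' R code: the greedy keep-first rule for deciding which variable of a correlated pair to drop, the use of the sample standard deviation, and the toy values are all assumptions, since the description does not specify them.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_collinear(columns, threshold=0.7):
    """Return the names of variables kept by a greedy |r| filter.

    Assumption: variables are visited in dictionary order, and a variable
    is kept only if |r| <= threshold against every variable already kept.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def standardize(values):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from the dataset.
    toy = {
        "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "bio5": [2.1, 3.9, 6.2, 8.0, 9.9],   # nearly collinear with bio1
        "bio12": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    kept = drop_collinear(toy)
    scaled = {name: standardize(toy[name]) for name in kept}
    print(kept)
```

With a greedy rule like this one, the order in which variables are considered determines which member of a correlated pair survives, so the retained set is not unique.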

Sighting records for the 647 species were obtained from the Global Biodiversity Information Facility (GBIF) and gridded to match the environmental data at 2.5-arcmin resolution. Records were cleaned using the 'CoordinateCleaner' package, and sightings with missing environmental data were excluded. Grid cells with multiple occurrences were treated as a single point. Records were additionally classified as native or exotic using the Centre for Agriculture and Bioscience International (CABI) Invasive Species Compendium and the International Union for Conservation of Nature (IUCN) Global Invasive Species Database.
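Collapsing multiple occurrences within a grid cell to a single point can be sketched as below. This is a minimal Python illustration under stated assumptions, not the authors' R workflow: the grid is assumed to originate at longitude -180 and latitude 90 with 2.5-arcmin cells, and the function names `cell_index` and `thin_to_cells` are hypothetical.

```python
import math

CELL_DEG = 2.5 / 60.0  # 2.5 arc-minutes expressed in decimal degrees


def cell_index(lon, lat, cell_deg=CELL_DEG):
    """Return the (row, col) of the grid cell containing a coordinate.

    Assumption: a global grid whose top-left corner is at (-180, 90).
    """
    col = math.floor((lon + 180.0) / cell_deg)
    row = math.floor((90.0 - lat) / cell_deg)
    return row, col


def thin_to_cells(records):
    """Keep the first (lon, lat) record encountered in each grid cell."""
    seen = {}
    for lon, lat in records:
        seen.setdefault(cell_index(lon, lat), (lon, lat))
    return list(seen.values())


if __name__ == "__main__":
    # Made-up coordinates: the first two fall in the same cell.
    sightings = [(10.01, 45.01), (10.02, 45.02), (10.10, 45.01)]
    print(thin_to_cells(sightings))
```

Keeping one record per cell means that densely sampled locations do not contribute more presences than sparsely sampled ones at the resolution of the predictors.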

Usage notes

All data should be placed in the "./data/" directory, relative to the R working directory, and files generated by running the code are placed in the "./output/" directory, by default.