Dryad

Data from: Effects of input data sources on species distribution model predictions across species with different distributional ranges

Abstract

Species distribution models (SDMs) are a popular tool in theoretical and quantitative ecology and constitute the most widely used modelling framework in global change science and biodiversity conservation. As main data sources, SDMs require georeferenced biodiversity observations as a response or dependent variable (e.g. species occurrence, species richness) and geographic layers of environmental information as predictors or independent variables (e.g. climate, land cover, vegetation indices derived from remote sensing). However, although SDMs have become one of the most important quantitative tools for regular and timely biodiversity assessments worldwide, these techniques are still subject to different sources of uncertainty that have been unequally assessed. Thus, although the uncertainty related to niche-based or distribution-based models has been addressed at different stages of the modelling process, analyses of how uncertainty from alternative data sources affects the predictive ability of SDMs remain limited. Citizen-collected species occurrence data (e.g. eBird) are often used for fitting SDMs when data from standardized and expert-supported surveys (e.g. Atlases) are unavailable. Meanwhile, macroclimate variables are much more commonly used as predictors in SDMs than information derived from remote sensing data. We assessed the effects of using different data sources (in both response and predictor variables) on SDM performance across a wide range of bird species with contrasting distributional ranges in the Iberian Peninsula (Portugal and Spain). To do so, we implemented an SDM ensemble-forecasting approach using bird data from two different sources: the semi-structured eBird project and standardized Atlases. We fitted SDMs with three predictor types: macroclimate, remotely sensed ecosystem functional attributes (EFAs) derived from vegetation indices, and their combination.
Species were grouped into four range-size classes. We also used different evaluation metrics to better assess the uncertainty of model predictions. We then applied generalized linear mixed-effects models to test the effect of input data sources on model performance across distributional range sizes while accounting for different accuracy metrics. Pairwise comparisons between range projections were used to assess their spatial similarity. Our models demonstrated the usefulness and complementarity of different input data sources when modelling species distributions across different distributional ranges. Citizen science and remote sensing data help update knowledge of the distribution of the most threatened bird species by increasing model accuracy. These findings highlight the need to integrate different data sources to improve model predictions at a regional scale. Our framework also underlines that model uncertainty should be examined more exhaustively at the early stages of the modelling process.
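
The pairwise comparison of range projections mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so the Jaccard index used here is an assumption, as are the function names, the labels, and the toy 3x3 presence/absence grids, none of which come from the dataset.

```python
from itertools import combinations

# A binary range map: rows of 0/1 cells (1 = species predicted present).
Grid = list[list[int]]


def jaccard_similarity(a: Grid, b: Grid) -> float:
    """Cells predicted present in both maps divided by cells present in either.

    Assumption: Jaccard is used only for illustration; the abstract does not
    state which spatial similarity measure was applied.
    """
    cells_a = [bool(v) for row in a for v in row]
    cells_b = [bool(v) for row in b for v in row]
    if len(cells_a) != len(cells_b):
        raise ValueError("range maps must cover the same grid")
    both = sum(x and y for x, y in zip(cells_a, cells_b))
    either = sum(x or y for x, y in zip(cells_a, cells_b))
    # Two empty maps are treated as identical.
    return both / either if either else 1.0


def pairwise_similarity(projections: dict[str, Grid]) -> dict[tuple[str, str], float]:
    """Compare every pair of projections once, keyed by the two labels."""
    return {
        (n1, n2): jaccard_similarity(projections[n1], projections[n2])
        for n1, n2 in combinations(projections, 2)
    }


# Hypothetical projections for one species under three input combinations.
projections = {
    "atlas_climate": [[1, 1, 0], [1, 0, 0], [0, 0, 0]],
    "ebird_climate": [[1, 1, 0], [0, 0, 0], [0, 0, 0]],
    "ebird_efa": [[0, 1, 1], [0, 0, 0], [0, 0, 0]],
}
scores = pairwise_similarity(projections)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

A value of 1 means two projections predict presence in exactly the same cells, and a value of 0 means they share none. Real projections would be rasters on a common grid rather than nested lists.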