Spatial confounding in Bayesian species distribution modeling
Cite this dataset
Mäkinen, Jussi et al. (2022). Spatial confounding in Bayesian species distribution modeling [Dataset]. Dryad. https://doi.org/10.5061/dryad.hdr7sqvm5
Abstract
- Species distribution models (SDMs) are currently the main tools for deriving species niche estimates and spatially explicit predictions of species' geographical distributions. However, unobserved environmental conditions and ecological processes may confound the model estimates if they have a direct impact on the species and are, at the same time, correlated with the observed environmental covariates. This phenomenon, known as spatial confounding, is a general property of spatial models, but it has not previously been studied in the context of SDMs.
- Here we examine how the estimation accuracy of SDMs depends on the type of spatial confounding. We construct two simulation studies in which we alter the spatial structures of the observed and unobserved covariates and the level of dependence between them. We fit generalized linear models with and without spatial random effects using Bayesian inference and record the bias that spatial confounding induces in the model estimates. We then also examine spatial confounding with real vegetation data from northern Norway.
- Our results show that model estimates for coarse-scale covariates, such as climate covariates, are likely to be biased if a species distribution also depends on an unobserved covariate operating on a finer spatial scale. Placing higher probability on a spatial random effect that is relatively weak and varies smoothly in space, compared to the observed covariates, improved estimation accuracy. The improvement was independent of the actual spatial structure of the unobserved covariate.
- Our study addresses the major factors of spatial confounding in SDMs and provides a list of recommendations for pre-inference assessment of spatial confounding and for inference-based methods to decrease the chance of biased model estimates.
Methods
The study analyzes simulated and empirical species occurrence data sets. The simulated data set was created in three steps: spatial covariates were generated using Gaussian process regression, species presence probabilities were computed as a probit-transformed linear combination of the covariates, and species occurrences were sampled from those presence probabilities. The empirical data set was collected in situ in northern Norway.
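The simulation steps described above can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the authors' Matlab code. The squared-exponential kernel, the two length scales, the grid size, the coefficients in `beta`, and all function names are assumptions chosen for illustration only; they are not taken from the dataset.

```python
import math
import random
from statistics import NormalDist


def sq_exp_cov(points, length_scale, variance=1.0, jitter=1e-6):
    """Squared-exponential covariance matrix (assumed kernel) with diagonal jitter."""
    n = len(points)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            cov[i][j] = variance * math.exp(-0.5 * d2 / length_scale ** 2)
        cov[i][i] += jitter  # keeps the matrix numerically positive definite
    return cov


def cholesky(a):
    """Lower-triangular Cholesky factor L such that L L^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_gp(chol, rng):
    """Draw one zero-mean Gaussian process realization as L z, z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in chol]
    return [sum(row[k] * z[k] for k in range(i + 1)) for i, row in enumerate(chol)]


def simulate(n_side=8, beta=(0.0, 1.0, 1.0), seed=1):
    """Simulate occurrences on an n_side x n_side grid over the unit square."""
    rng = random.Random(seed)
    points = [(i / (n_side - 1), j / (n_side - 1))
              for i in range(n_side) for j in range(n_side)]
    # Two covariates with different spatial scales (length scales are assumptions).
    x_coarse = sample_gp(cholesky(sq_exp_cov(points, 0.5)), rng)
    x_fine = sample_gp(cholesky(sq_exp_cov(points, 0.1)), rng)
    # Probit link: presence probability is the standard normal CDF of the linear predictor.
    std_normal_cdf = NormalDist().cdf
    probs = [std_normal_cdf(beta[0] + beta[1] * a + beta[2] * b)
             for a, b in zip(x_coarse, x_fine)]
    # Bernoulli draw of presence (1) or absence (0) at each site.
    occurrences = [1 if rng.random() < p else 0 for p in probs]
    return points, x_coarse, x_fine, probs, occurrences


points, x_coarse, x_fine, probs, occurrences = simulate()
print(sum(occurrences), "presences out of", len(occurrences), "sites")
```

In this sketch the two covariates are drawn independently. Treating `x_fine` as unobserved when fitting a model would mimic the setting in which a finer-scale covariate is missing, although the dependence between covariates that the study varies is not reproduced here.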
Usage notes
MATLAB
Funding
Academy of Finland, Award: 317255
Jane and Aatos Erkko Foundation
Kone Foundation
Societas pro Fauna et Flora Fennica