The way bioclimatic variables are calculated has impact on potential distribution models
Cite this dataset
Bede-Fazekas, Ákos; Somodi, Imelda (2020). The way bioclimatic variables are calculated has impact on potential distribution models [Dataset]. Dryad. https://doi.org/10.5061/dryad.m37pvmd0g
1. Bioclimatic variables (BCVs) are routinely used in potential distribution models, typically without considering their calculation options in detail. We aimed to study the impact of a so far unexamined decision in the calculation of BCVs, namely whether the identity of the specific months/quarters used in the calculation should be updated for future periods (temporal context). Effects on the performance of potential distribution models and on their projections were investigated. We also aimed to compare the impact of month/quarter shifts with that of climate model selection and covariate selection.
2. Potential natural vegetation models encompassing eight habitat types and the whole territory of Hungary were created using boosted regression trees. We tested multiple initial covariate sets to compare the impact of the temporal context to that of covariate selection. The resulting models were applied to the reference and one future time period (with data from two regional climate models). The effect of the BCV calculation approach was tested by linear mixed-effects models and model goodness-of-fit measures in a comprehensive framework of 192 predictions. Area Under the ROC Curve (AUC) and True Positive Rate (TPR) curves were used to evaluate the models.
3. Our results show that (1) temporal context of BCVs in interaction with covariate selection had a strong effect on model structure as well as on projections; (2) no evidence supporting the superiority of the widely applied calculation approach of BCVs was found. However, we found notable differences under the two approaches and examples of projection artefacts when applying the widespread way of calculation.
4. We conclude that (1) more attention and more transparent communication are needed when BCVs are used as covariates in distribution models; (2) not only ecophysiology but also the way covariates are calculated should be considered when preselecting covariates for potential distribution models.
The dataset contains the predictions in RData and GeoPackage formats. The predictions are arranged in spatial databases of 267,813 rows (points) and 8 columns (habitats). The GeoPackage file contains 30 layers (6 for the reference period and 12 each for the Aladin and RegCM regional climate models), while the RData file contains a list of 3 elements ('reference', 'aladin71', 'regcm71'), each containing a list of 6 elements ('full control', 'correlation control', 'full test A', 'correlation test A', 'full test B', 'correlation test B'). These list elements are POINT-type sf objects in the case of the reference period and two-element lists ('static', 'dynamic') of sf objects in the case of the future period.
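
The nesting described above can be sketched in a few lines of Python, which may help when checking that a downloaded copy is complete. This is only an illustration under stated assumptions: the element names are taken from the description of the RData list, the function name `expected_entries` is hypothetical, and the actual layer names inside the GeoPackage file are not given here, so the tuples below are not meant to match them literally.

```python
# Element names as listed in the dataset description (RData list structure).
PERIODS = ["reference", "aladin71", "regcm71"]
COVARIATE_SETS = [
    "full control", "correlation control",
    "full test A", "correlation test A",
    "full test B", "correlation test B",
]
# Temporal contexts exist only for the future period (both climate models).
CONTEXTS = ["static", "dynamic"]


def expected_entries():
    """Enumerate (period, covariate set, context) combinations.

    Hypothetical helper: the reference period has one prediction table per
    covariate set (context is None), while each climate model has one table
    per covariate set and temporal context.
    """
    entries = []
    for period in PERIODS:
        for covariate_set in COVARIATE_SETS:
            if period == "reference":
                entries.append((period, covariate_set, None))
            else:
                for context in CONTEXTS:
                    entries.append((period, covariate_set, context))
    return entries


entries = expected_entries()
per_period = {p: sum(1 for e in entries if e[0] == p) for p in PERIODS}
print(len(entries), per_period)
```

The enumeration yields 6 combinations for the reference period and 12 for each regional climate model, which agrees with the 30 layers reported for the GeoPackage file. Each combination corresponds to one table of 267,813 points and 8 habitat columns.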