Data from: Quantifying and modelling decay in forecast proficiency indicates the limits of transferability in land-cover classification
Data files
Aug 04, 2018 version files 7.17 MB
Abstract
1. The ability to provide reliable projections for the current and future distribution patterns of land-covers is fundamental if we wish to protect and manage our diminishing natural resources. Two inter-related revolutions made map production feasible at unprecedented resolutions: the availability of high-resolution remotely-sensed data and the development of machine-learning algorithms. However, the ground-truth data needed for training models are in most cases spatially and temporally clustered. Therefore, map production requires extrapolation of models from one place to another, and the uncertainty cost of such extrapolation is rarely explored. In other words, we focus mainly on projections, and less on quantifying how reliable they are.
2. Following the concept of ‘forecast horizon’, we suggest that the predictability of land-cover classification models should be methodologically explored with quantitative tools as a continuum against distances measured along multiple dimensions. Focusing on ten agricultural sites from England and using models specifically designed to predict multivariate decay-curves, we ask: how does a model’s predictive performance decay with distance? More specifically, we explored whether we could predict the proficiency (kappa statistic) of a model trained in one site when making predictions in another site, based on the spatial, temporal, spectral and environmental distances between sites.
3. We found that model proficiency decays with spatial, temporal, spectral and environmental distance between sites. More importantly, we found for the first time that it is possible to predict the performance of a model transferred to or from a novel site, based on its distances from known sites. The spatial distance variables were the most important when predicting model transferability.
4. Exploring model transferability as a continuum may have multiple uses, including predicting uncertainty values in space and time, prioritizing strategies for ground-truth data collection, and optimizing model characteristics for defined tasks.
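
The abstract measures model proficiency with the kappa statistic. As a rough illustration of what that number expresses, the sketch below computes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between reference and predicted labels and p_e is the agreement expected by chance. The function name `cohen_kappa` and the crop labels are hypothetical and do not come from the dataset; this is a minimal sketch of the standard formula, not the authors' code.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa for two equal-length sequences of class labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)

    # Observed agreement: share of samples where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: for each class, the product of its marginal
    # frequencies in the reference and predicted labels, summed over classes.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)

    if expected == 1.0:
        # Both sequences contain a single identical class; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four ground-truth points (illustration only).
reference = ["wheat", "wheat", "grass", "grass"]
predicted = ["wheat", "grass", "grass", "grass"]
kappa = cohen_kappa(reference, predicted)
```

With these made-up labels the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5. A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance, which is why kappa suits comparing a model's performance at its training site with its performance at a distant site.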