Development and uncertainty assessment of pedotransfer functions for predicting water contents at specific pressure heads
Mehmandoostkotlar, Ali et al. (2020), Development and uncertainty assessment of pedotransfer functions for predicting water contents at specific pressure heads, Dryad, Dataset, https://doi.org/10.5061/dryad.3r2280gbw
There has been much effort to improve the performance of pedotransfer functions (PTFs) using intelligent algorithms, but the issue of covariate shift, i.e. different probability distributions in the training and testing datasets, and its impact on the prediction uncertainty of PTFs has rarely been addressed. The common practice in PTF generation is to randomly split the dataset into training and testing subsets, and the outcome of this random selection may differ from one split to the next if the process is subject to covariate shift. We evaluated the impact of covariate shift, generated by data shuffling and detected by the Kolmogorov-Smirnov test, on the prediction of water contents using soil databases from Denmark and Brazil. The soil water contents at different pressure heads were predicted with linear and stepwise regression PTFs as well as machine-learning-based PTFs, including Gaussian process regression and an ensemble method. For the Brazilian dataset, regression-based PTFs gave better predictions than the machine learning methods, whereas the machine learning methods estimated high water contents in Danish soils more accurately. One hundred PTFs were developed for water content at specific pressure heads, with data shuffling generating the covariate shift. From these, one hundred sets of fitted van Genuchten parameters were obtained, representing the generated uncertainty. Data shuffling led to covariate shift, resulting in uncertainty in water content prediction by the PTFs. Inherent variability of the data may lead to increased prediction uncertainty. For correlated data, simple regression models performed as well as sophisticated machine learning methods. Using PTF-predicted water contents for van Genuchten retention parameter fitting may lead to high uncertainty.
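The shuffling-and-testing idea described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 20% test fraction, the 0.05 significance level, the 100-term series cutoff, and the synthetic "clay content" values are all assumptions made for illustration. The sketch repeatedly shuffles one predictor, splits it into training and testing subsets, and applies a two-sample Kolmogorov-Smirnov test (with a standard asymptotic approximation for the p-value) to flag splits whose subsets appear to follow different distributions.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x, then compare the empirical CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value approximation for the two-sample KS statistic."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.05:
        return 1.0  # the series converges too slowly here; p is ~1 anyway
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def shuffle_and_test(values, n_splits=100, test_fraction=0.2, seed=0):
    """Shuffle, split, and KS-test one predictor n_splits times.

    test_fraction is an assumed value, not taken from the dataset description.
    Returns a list of (D, p) tuples, one per random split.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(test_fraction * len(values))))
    results = []
    for _ in range(n_splits):
        shuffled = list(values)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        d = ks_statistic(train, test)
        results.append((d, ks_pvalue(d, len(train), len(test))))
    return results


# Synthetic, purely illustrative predictor (e.g. clay content in percent).
data_rng = random.Random(42)
clay = [data_rng.gauss(25.0, 8.0) for _ in range(200)]

results = shuffle_and_test(clay)
alpha = 0.05  # assumed significance level
n_shifted = sum(1 for _, p in results if p < alpha)
print(f"{n_shifted} of {len(results)} random splits flagged for covariate shift")
```

In this sketch each random split yields one (D, p) pair; splits with a small p-value are the ones in which a PTF would be trained and tested on differently distributed inputs, which is the source of the split-to-split uncertainty the abstract describes.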