Dryad

Data from: High-resolution soil total phosphorus mapping for the conterminous USA using machine learning

Data files

Jan 16, 2026 version files 786.61 MB


Abstract

Accurate estimates of soil total phosphorus (TP) concentrations are essential for sustainable nutrient management, food security, and water quality protection. This study predicts and maps the spatial distribution of TP in the top 5 cm and C horizon of soils across the conterminous USA (CONUS) using data from the Geochemical and Mineralogical Data for Soils of the Conterminous United States. We compare the performance of random forest (RF) and inverse distance weighting (IDW) models in predicting soil TP. The RF incorporates 19 predictor variables, including spatial coordinates, climate, soil properties, and topography, while IDW relies solely on coordinates and interpolates between soil TP observations. Models are evaluated using five-fold cross-validation. The RF models outperform the IDW models and explain 52% (RMSE = 0.22 log10 mg kg⁻¹) and 56% (RMSE = 0.26 log10 mg kg⁻¹) of the variance in soil TP for the top 5 cm and C horizon, respectively. As expected, both model types identify higher TP concentrations in the top 5 cm than in the C horizon, particularly in agricultural regions, reflecting anthropogenic influences. Furthermore, the RF-generated maps show more realistic spatial patterns that capture the heterogeneity of the CONUS and avoid the bullseye patterns often characteristic of IDW-generated maps. Additional insights from the RF models show that coordinates, soil texture, pH, and climate are top predictors of soil TP. Greater availability of data on variables such as iron and aluminum, which can bind with phosphorus in soils, could improve RF model performance.
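
The IDW baseline and the cross-validated RMSE described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`idw_predict`, `rmse`, `kfold_rmse`), the distance exponent of 2, the planar treatment of coordinates, and the assignment of folds by sample index are all assumptions made for illustration, and the sample values are invented rather than taken from the dataset.

```python
import math

def idw_predict(known, query, power=2.0):
    """Inverse distance weighting at one query point.

    known: list of (x, y, value) observations, e.g. value = log10 TP.
    power: distance exponent (assumed 2 here; the source does not state it).
    Coordinates are treated as planar, which is a simplification.
    """
    qx, qy = query
    num = 0.0
    den = 0.0
    for x, y, value in known:
        d = math.hypot(x - qx, y - qy)
        if d == 0.0:
            return value  # query coincides with an observation
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def kfold_rmse(samples, k=5, power=2.0):
    """k-fold cross-validation: hold out each fold, predict it from the rest.

    Folds are assigned by index modulo k (an assumption; a real analysis
    would typically shuffle or stratify the samples first).
    """
    obs, pred = [], []
    for fold in range(k):
        train = [s for i, s in enumerate(samples) if i % k != fold]
        test = [s for i, s in enumerate(samples) if i % k == fold]
        for x, y, value in test:
            obs.append(value)
            pred.append(idw_predict(train, (x, y), power))
    return rmse(obs, pred)

# Invented (x, y, log10 TP) samples, for illustration only.
samples = [
    (0.0, 0.0, 2.6), (1.0, 0.0, 2.7), (0.0, 1.0, 2.5), (1.0, 1.0, 2.8),
    (2.0, 0.5, 2.9), (0.5, 2.0, 2.4), (2.0, 2.0, 3.0), (1.5, 1.5, 2.8),
    (3.0, 1.0, 3.1), (1.0, 3.0, 2.3),
]
cv_error = kfold_rmse(samples, k=5)
```

Because each IDW prediction is a weighted average of nearby observations, it always lies between the smallest and largest observed values and peaks or dips at the sample points, which is consistent with the bullseye patterns the abstract attributes to IDW maps.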