Data transformations cause altered edaphic-climatic controls and reduced predictability on soil carbon decomposition rates
Data files
This dataset is embargoed and will be released on Sep 05, 2025. Please contact Daifeng Xiang at xiangdf@whu.edu.cn with any questions.
Lists of files and downloads will become available to the public when released.
Abstract
Data transformation of the reference decomposition rates (kref), often derived as turnover times or in alternative formats, is commonly used to develop ecological models to project the persistence of soil organic matter (SOM). However, the effects of reciprocal or logarithmic transformation of kref on model performance and edaphic-climatic patterns remain uncertain. Here, we convert published kref values into reciprocal or logarithmic formats and establish machine learning models between the transformed kref and edaphic-climatic predictors. We show that models trained with the transformed kref exhibit 11.6-68.4% reductions in model performance upon re-conversion to kref compared to those trained with the original kref. The variable importance analysis identifies distinct key predictors governing the original kref and its transformed counterparts. This suggests that data transformation alters the relative significance of predictors without necessarily improving kref prediction performances. Consequently, our study underscores the importance of directly focusing on the original values rather than alternative representations when dissecting a given variable's patterns and pertinent mechanisms in ecological modelling.
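To illustrate why accuracy in a transformed space need not carry over after re-conversion, the minimal Python sketch below applies reciprocal and natural-log transformations to a few made-up kref values, adds the same fixed error in the transformed space, and converts back to kref. The rate values, the error size, and all function names are assumptions chosen for illustration; they are not taken from the dataset or from the authors' R code.

```python
import math

# Hypothetical decomposition rates (per day); NOT values from the dataset.
k_ref = [0.5, 0.05, 0.005]

def to_reciprocal(k):
    """Reciprocal transformation: turnover time tau = 1 / k."""
    return 1.0 / k

def from_reciprocal(tau):
    """Convert a turnover time back to a rate."""
    return 1.0 / tau

def to_log(k):
    """Natural-log transformation."""
    return math.log(k)

def from_log(log_k):
    """Convert a log-rate back to a rate."""
    return math.exp(log_k)

def relative_error_after_back_conversion(k, forward, backward, error):
    """Add a fixed error in the transformed space, convert back,
    and return the resulting relative error in k."""
    k_back = backward(forward(k) + error)
    return abs(k_back - k) / k

err = 0.5  # assumed prediction error, identical for every record

recip_errors = [
    relative_error_after_back_conversion(k, to_reciprocal, from_reciprocal, err)
    for k in k_ref
]
log_errors = [
    relative_error_after_back_conversion(k, to_log, from_log, err)
    for k in k_ref
]

for k, r, l in zip(k_ref, recip_errors, log_errors):
    print(f"k={k}: reciprocal route {r:.4f}, log route {l:.4f}")
```

In this toy case, an identical error in turnover time produces very different relative errors in kref depending on the magnitude of kref, whereas an identical error in log space produces a constant relative error and therefore an absolute error that scales with kref. A model tuned to minimize error in a transformed space is thus weighting records differently from one tuned on the original values.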
README: Data transformations cause altered edaphic-climatic controls and reduced predictability on soil carbon decomposition rates
https://doi.org/10.5061/dryad.5qfttdzgc
Description of the data and file structure
A global dataset of first-order kinetics parameters and corresponding explanatory predictors is compiled as an online spreadsheet with 859 records (Xiang et al., 2023). To account more fully for the effects of soil physicochemical properties, we added five explanatory variables to the eleven used in Xiang et al. (2023). The values of the sixteen explanatory variables were taken from the publications reporting each incubation experiment. Where a study did not report a variable, we extracted its value from global maps according to the study's geographic location.
Files and variables
File: Data availability.rar
Description:
This compressed archive contains the raw data ("compiledDataset_update.csv"), the source code for the Random Forest analysis ("1st_order_2nd.R"), and a folder named "results".
The folder "results" contains the model outputs: "IMP-k1.csv", "IMP-k2.csv", and "IMP-k3.csv" give the relative importance of the explanatory variables for the fast, slow, and passive pools, respectively, while "ObsVsPred_k1.csv", "ObsVsPred_k2.csv", and "ObsVsPred_k3.csv" give the simulated and observed decomposition rates for the same three pools.
Cells containing "NA" in the files in the folder "Data availability" indicate that the corresponding explanatory variable was excluded by recursive feature elimination (RFE).
Cells containing "n/a" in "compiledDataset_update.csv" indicate missing values that were not reported in the literature.
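Because the files use two different markers ("NA" and "n/a"), it can help to map both to a single missing value when loading the data outside R. The following is a minimal Python sketch using only the standard library. It reads a small in-memory stand-in rather than the real file, and the column headers and values shown are hypothetical, since the actual layout of "compiledDataset_update.csv" is not described here.

```python
import csv
import io

# Markers described in the README; treating an empty cell as missing
# is an additional assumption of this sketch.
MISSING_MARKERS = {"n/a", "NA", ""}

# In-memory stand-in for "compiledDataset_update.csv".
# Column headers and values are hypothetical.
sample = io.StringIO(
    "MAT,MAP,pH\n"
    "12.5,800,6.1\n"
    "n/a,1200,n/a\n"
)

def parse_cell(text):
    """Return None for a missing-value marker, otherwise a float."""
    text = text.strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)

rows = [
    {name: parse_cell(value) for name, value in record.items()}
    for record in csv.DictReader(sample)
]

# Count missing cells per column to see where map-derived values
# or exclusions might matter.
missing_per_column = {
    name: sum(row[name] is None for row in rows) for name in rows[0]
}
print(missing_per_column)
```

To apply the same logic to the real file, replace the `io.StringIO` object with an opened file handle; non-numeric columns, if any, would need their own handling.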
Code/software
R version 4.1.1 may be helpful for viewing the data and running the code.
Methods
A global dataset of first-order kinetics parameters and corresponding explanatory predictors is compiled as an online spreadsheet with 859 records (Xiang et al., 2023). The fitted first-order kinetics parameters in this dataset were taken from publications that fitted laboratory incubation data with one-pool (M1), two-pool (M2), or three-pool (M3) first-order models.
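For readers unfamiliar with multi-pool first-order models, the sketch below shows one common textbook form, in which the fraction of carbon remaining at time t is the sum over pools of f_i * exp(-k_i * t), with the pool fractions f_i summing to one. This is an assumed generic formulation; the source publications may parameterize their models differently (for example, in terms of cumulative CO2 release), and the function name and parameter values are hypothetical.

```python
import math

def remaining_fraction(t, fractions, rates):
    """Fraction of initial carbon remaining at time t under a
    multi-pool first-order model: sum_i f_i * exp(-k_i * t).

    fractions: initial share of carbon in each pool (must sum to 1)
    rates:     first-order decomposition rate of each pool (1 / time)
    """
    if len(fractions) != len(rates):
        raise ValueError("fractions and rates must have the same length")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("pool fractions must sum to 1")
    return sum(f * math.exp(-k * t) for f, k in zip(fractions, rates))

# One-pool model (M1) with a hypothetical rate of 0.001 per day.
m1 = remaining_fraction(100, [1.0], [0.001])

# Three-pool model (M3): fast, slow, and passive pools with
# hypothetical fractions and rates (k1 > k2 > k3).
m3 = remaining_fraction(100, [0.05, 0.45, 0.50], [0.1, 0.005, 0.0001])

print(f"M1 remaining after 100 days: {m1:.4f}")
print(f"M3 remaining after 100 days: {m3:.4f}")
```

In such a formulation, the rates k1, k2, and k3 would correspond to the fast, slow, and passive pools referred to in the result files.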
This dataset contains eleven explanatory factors: (i) two climatic factors, MAP (mean annual precipitation, units: mm) and MAT (mean annual temperature, units: °C), which characterize regional climate conditions; (ii) five edaphic factors, Sand (sand fraction, units: %), Clay (clay fraction, units: %), pH, SOC (soil organic carbon, units: g kg⁻¹), and MBC (microbial biomass carbon, units: g C m⁻²), which reflect the effects of soil properties and the microbial community; (iii) two topographic factors, Elev (elevation, units: m) and Slope (terrain slope, units: degrees), which indicate the impact of terrain; (iv) one vegetation factor characterizing the effects of vegetation coverage, NDVI (normalized difference vegetation index); and (v) one factor representing incubation conditions, IncT (laboratory incubation temperature, units: °C). To account more fully for the effects of soil physicochemical properties, we added five explanatory variables to the eleven used in Xiang et al. (2023): (i) one vegetation variable reflecting the effect of plant productivity, NPP (net primary productivity, units: g C m⁻²); (ii) one variable representing the effect of soil physical properties, CFVO (volumetric fraction of coarse fragments, units: %); and (iii) three variables representing the effect of soil fertility, TN (soil total nitrogen, units: g kg⁻¹), CNratio (the ratio of soil organic carbon to total nitrogen), and CEC (cation exchange capacity, units: cmol kg⁻¹).
The values of the sixteen explanatory factors were taken from the publications reporting each incubation experiment. Where a study did not report a factor, we extracted its value from global maps according to the study's geographic location.