Data for variation of magnesium drives plant adaption to heterogeneous environments by regulating efficiency in photosynthesis on a large scale
Data files
Aug 22, 2024 version files 602.58 MB
- China_vegetation_magnesium_content.csv
- China_vegetation_magnesium_density.csv
- data.xlsx
- df.xlsx
- Machine_learning.py
- Mg_content_rasters.zip
- Mg_density_rasters.zip
- Multi-model_inference.R
- README.md
Abstract
Magnesium (Mg) is a vital nutrient for plants, and its role in photosynthesis, enzyme regulation, and resistance to environmental stress is becoming increasingly evident. However, little is known about the characteristics of Mg (content, density, and stock) on a large scale, particularly at the community level, which serves as a fundamental unit for linking to ecosystem functions. A leaf-branch-trunk-root-matched database of the Mg content (mg g–1) and biomass (g m–2) of plant organs across 1972 sampling sites in China was constructed based on field surveys and data compilation. Using machine learning algorithms, we comprehensively explored the spatial patterns and main influencing factors of plant Mg content and density (g m–2). Deserts exhibited higher Mg content, with the primary influencing factors being high temperature and soil Mg supply. High Mg density values occurred in forests. The spatial patterns of Mg content and density underscore the adaptation of plants to environmental changes and the nutrient retention capacity of forests, respectively. Our research provides valuable information on the distribution of Mg across different communities.
README: Title of Dataset: Data for variation of magnesium drives plant adaption to heterogeneous environments by regulating efficiency in photosynthesis on a large scale
Description of the Data and file structure
We provide the mean Mg content and density of each organ for the vegetation types of China ("China_vegetation_magnesium_content.csv" and "China_vegetation_magnesium_density.csv"), predictions of Mg content and density ("Mg_content_rasters.zip" and "Mg_density_rasters.zip"), and code for machine learning ("Machine_learning.py") and multi-model inference ("Multi-model_inference.R"), together with their example data ("df.xlsx" and "data.xlsx").
Descriptions:
China vegetation magnesium content
Summary: Mean vegetation magnesium content, predicted by machine learning models, for each secondary vegetation class of the vegetation map of China. The vegetation map of China used here was produced by Hou (2019).
Unit: mg g–1
China vegetation magnesium density
Summary: Mean vegetation magnesium density, predicted by machine learning models, for each secondary vegetation class of the vegetation map of China. The vegetation map of China used here was produced by Hou (2019).
Unit: g m–2
Mg content rasters, Mg density rasters, and Machine_learning
Summary: Random forest (RF), boosted regression trees (BRT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) algorithms were used to develop models that predict the Mg content and density in the leaves, branches, trunks, and roots, as well as in the aboveground, belowground, and total parts of vegetation across China. Before training the models for Mg density, the data were split into a training set (70%) and a validation set (30%). For each model, the optimal combination of hyperparameters was identified through a grid search with 5-fold cross-validation (Yan et al., 2023), and the predictive performance of the model under the optimal hyperparameters was then evaluated on the validation set. The evaluation metrics were the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). A lower RMSE and MAE and a higher R2 indicate better predictive performance. To obtain more robust predictions and to assess the uncertainty of the machine learning predictions, we trained the models on bootstrapped datasets. For datasets with more than 100 samples, each bootstrapped dataset had the same size as the original dataset; for datasets with fewer than 100 samples, each bootstrapped dataset contained 100 samples. We generated 100 bootstrapped datasets from each original dataset, and the machine learning models were trained and tuned on these new datasets. Performance was assessed as the average over the 100 models. The uncertainty of the predictions was evaluated using the coefficient of variation, and the final prediction was the average of the 100 models. For each dataset, the algorithm with the lowest RMSE was selected for prediction. RF consistently achieved the lowest RMSE across all datasets and was therefore chosen for the final predictions. Taking the Mg content rasters as an example, the folder for each organ or part contains two subfolders, one for the mean values and one for the coefficient of variation (cv), plus one aux.xml file.
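The tuning and evaluation step described above can be sketched as follows. This is a minimal illustration, not the contents of Machine_learning.py: the function name `tune_and_evaluate`, the synthetic data, and the small hyperparameter grid are all assumptions made for the example, and only RF is shown. It assumes scikit-learn is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_and_evaluate(X, y, param_grid, seed=0):
    """Hypothetical helper: 70/30 split, grid search with 5-fold CV, validation metrics."""
    # Hold out 30% of the data as the validation set.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    # Grid search over the hyperparameters, scored by RMSE in 5-fold CV
    # on the training set only.
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid,
        cv=5,
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X_train, y_train)
    # Evaluate the refitted best model on the held-out validation set.
    pred = search.predict(X_val)
    metrics = {
        "RMSE": float(np.sqrt(mean_squared_error(y_val, pred))),
        "MAE": float(mean_absolute_error(y_val, pred)),
        "R2": float(r2_score(y_val, pred)),
    }
    return search.best_params_, metrics


# Synthetic stand-in data (assumption): 200 sites, 10 environmental factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

best_params, metrics = tune_and_evaluate(
    X, y, {"n_estimators": [50, 100], "max_depth": [3, None]}
)
print(best_params, metrics)
```

The same function could be repeated with other regressors to compare algorithms by validation RMSE, which is how the summary describes the final choice of RF.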
Unit:
- Mg content rasters, mg g–1
- Mg density rasters, g m–2
Abbreviations:
- abo, aboveground
- bel, belowground
- tot, total
Multi-model inference
Summary: Multiple linear regression was used to fit the relationship between the Mg content of each organ and environmental factors. Response variables were log-transformed, and explanatory variables were Z-score transformed so that parameter estimates are comparable. Multimodel inference was used for variance decomposition and parameter estimation. The dredge function of the MuMIn package was used to generate a set of candidate models, and all models within 2 units of the lowest Akaike information criterion (ΔAIC < 2) were selected to identify the best predictors of Mg content. When multiple models were selected, they were averaged based on their AIC weights, and the reported R2 was the weighted average over these models (Gross et al., 2017). The absolute value of each parameter estimate was used to assess the effect of that parameter on the response variable.
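The model selection and averaging logic can be illustrated with a small NumPy sketch. It is not a port of Multi-model_inference.R or of MuMIn: the function names `fit_ols` and `model_average` are hypothetical, the AIC is the Gaussian least-squares form up to an additive constant, the intercept-only model is skipped, and absent terms are counted as zero when averaging (one of several averaging conventions).

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """Ordinary least squares; returns coefficients (intercept first) and AIC."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus the error variance
    aic = n * np.log(rss / n) + 2 * k  # constant terms cancel in AIC differences
    return beta, aic


def model_average(X, y, delta_max=2.0):
    """Fit all predictor subsets, keep those with delta AIC < delta_max, average by weight."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # Z-score the predictors
    log_y = np.log(y)  # log-transform the response
    fits = []
    for r in range(1, p + 1):
        for subset in combinations(range(p), r):
            beta, aic = fit_ols(Z[:, list(subset)], log_y)
            fits.append((subset, beta[1:], aic))
    aics = np.array([f[2] for f in fits])
    delta = aics - aics.min()
    keep = np.flatnonzero(delta < delta_max)
    weights = np.exp(-0.5 * delta[keep])
    weights /= weights.sum()  # Akaike weights within the selected set
    averaged = np.zeros(p)
    for w, i in zip(weights, keep):
        subset, coefs, _ = fits[i]
        averaged[list(subset)] += w * coefs
    return averaged, [fits[i][0] for i in keep], weights


# Synthetic stand-in data (assumption): only the first predictor matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.exp(0.5 * X[:, 0] + rng.normal(scale=0.1, size=100))

averaged, subsets, weights = model_average(X, y)
print(np.abs(averaged), subsets, weights)
```

Because the predictors are Z-scored, the absolute values of the averaged coefficients can be compared directly as effect sizes.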
df and data
Summary: Example data for Machine_learning.py and Multi-model_inference.R, respectively; both files contain response variables and explanatory variables. To run the code, place each file in the working directory of the corresponding Python or R session so that the scripts can read it.
References
- Gross, N., Le Bagousse-Pinguet, Y., Liancourt, P., Berdugo, M., Gotelli, N. J., & Maestre, F. T. (2017). Functional trait diversity maximizes ecosystem multifunctionality. Nature Ecology & Evolution, 1(5).
- Hou, X. (2019). 1:1 million vegetation map of China.
- Yan, P., Fernández-Martínez, M., Van Meerbeek, K., Yu, G., Migliavacca, M., & He, N. (2023). The essential role of biodiversity in the key axes of ecosystem function. Global Change Biology, 29(16), 4569-4585.
Methods
To investigate community structure, we set up 3 or 4 sample plots for trees (20 m × 20 m), 6 for shrubs (5 m × 5 m), and 8 for herbs (1 m × 1 m). Height and diameter at breast height were measured for trees, and height, basal diameter, and crown width were measured for shrubs; these measurements were used to calculate biomass with allometric growth equations. The aboveground portion of herbs was collected by species, then dried and weighed to obtain the aboveground biomass, whereas the belowground biomass was calculated using the root-crown ratio.
Healthy and fully expanded leaves and top branches (diameter < 1 cm) were collected from trees and shrubs, along with core samples at breast height (trees only). All aboveground portions of the herbs were collected. Intact fine roots (diameter < 2 mm) were excavated along lateral roots. Based on the results of the community structure investigation, samples from different species were mixed in proportion to determine the Mg content of the plant communities. Soil samples from a depth of 0–10 cm were collected in each plot to determine soil properties.
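As a worked illustration of how the two quantities in this dataset relate, the sketch below converts Mg content (mg g–1) and biomass (g m–2) into Mg density (g m–2) and sums organs into aboveground (abo), belowground (bel), and total (tot) values. The formula is an assumption inferred from the units (content × biomass ÷ 1000), not a statement of the authors' exact procedure, and the organ values are invented for the example.

```python
def mg_density(content_mg_per_g, biomass_g_per_m2):
    """Assumed conversion: mg g-1 times g m-2 gives mg m-2; divide by 1000 for g m-2."""
    return content_mg_per_g * biomass_g_per_m2 / 1000.0


# Hypothetical community values: (Mg content in mg g-1, biomass in g m-2).
organs = {
    "leaf": (2.0, 500.0),
    "branch": (1.0, 1500.0),
    "trunk": (0.5, 8000.0),
    "root": (1.5, 2000.0),
}

density = {name: mg_density(c, b) for name, (c, b) in organs.items()}
abo = density["leaf"] + density["branch"] + density["trunk"]  # aboveground
bel = density["root"]  # belowground
tot = abo + bel  # total
print(density, abo, bel, tot)
```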
After thorough washing with distilled water, plant samples were placed in a drying oven at 60°C until a constant mass was reached. For soil samples, plant roots and gravel were first removed, and the soil was then sieved through a 2 mm mesh and air-dried. Subsequently, all plant and soil samples were ground into a fine powder using an agate mortar (RM200, Retsch, Haan, Germany) and a ball mill (RM200, Retsch). The powdered samples were digested in a microwave system (Mars X press, CEM, Matthews, NC, USA) and then analyzed for Mg content (mg g–1) using an inductively coupled plasma optical emission spectrometer (ICP-OES, Optima 5300 DV, Perkin Elmer, Waltham, MA, USA).
Ten environmental factors were chosen for machine learning and for predicting Mg content and density across China: five climate factors (MAT, TWMax, TCMin, MAP, and AI), two vegetation factors (PAR and NDVI), and three soil factors (SpH, CEC, and ExcMg). Data for mean annual temperature (MAT), maximum temperature of the warmest month (TWMax, °C), minimum temperature of the coldest month (TCMin, °C), and mean annual precipitation (MAP) were obtained from the WorldClim database (https://worldclim.org/). The aridity index (AI) was sourced from the CGIAR-CSI database (https://csidotinfo.wordpress.com/). Photosynthetically active radiation (PAR, W m–2) from 2000 to 2018 was collected from the National Tibetan Plateau Data Center (https://data.tpdc.ac.cn/) and interpolated to 1 km using the Kriging method. Data on the normalized difference vegetation index (NDVI) from 2000 to 2018 were extracted from the Resource and Environment Science and Data Center (https://www.resdc.cn/). Soil pH (SpH), cation exchange capacity (CEC, cmol kg–1), and soil exchangeable Mg content (ExcMg, me 100g–1) were extracted from the Big Earth Data for Three Poles (https://poles.tpdc.ac.cn/). The maximum rate of Rubisco carboxylation (VCmax, μmol m–2 s–1) was extracted from the National Ecosystem Science Data Center (https://nesdc.org.cn/) to verify the relationship between leaf photosynthetic capacity and Mg content. In addition, vegetation types (broadleaf forest, needleleaf and broadleaf mixed forest, needleleaf forest, shrub, meadow, steppe, tussock, and desert) extracted from the vegetation map of China were used to predict Mg content and density.
Random forest (RF), boosted regression trees (BRT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) algorithms were used to develop models that predict the Mg content and density in the leaves, branches, trunks, and roots, as well as in the aboveground, belowground, and total parts of vegetation across China. Before training the models for Mg density, the data were split into a training set (70%) and a validation set (30%). For each model, the optimal combination of hyperparameters was identified through a grid search with 5-fold cross-validation, and the predictive performance of the model under the optimal hyperparameters was then evaluated on the validation set. The evaluation metrics were the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). A lower RMSE and MAE and a higher R2 indicate better predictive performance. To obtain more robust predictions and to assess the uncertainty of the machine learning predictions, we trained the models on bootstrapped datasets. For datasets with more than 100 samples, each bootstrapped dataset had the same size as the original dataset; for datasets with fewer than 100 samples, each bootstrapped dataset contained 100 samples. We generated 100 bootstrapped datasets from each original dataset, and the machine learning models were trained and tuned on these new datasets. Performance was assessed as the average over the 100 models. The uncertainty of the predictions was evaluated using the coefficient of variation, and the final prediction was the average of the 100 models. For each dataset, the algorithm with the lowest RMSE was selected for prediction. RF consistently achieved the lowest RMSE across all datasets and was therefore chosen for the final predictions.
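The bootstrapping scheme can be sketched as below. This is an illustration under stated assumptions rather than the authors' script: the function name `bootstrap_predict` and the synthetic data are hypothetical, per-replicate hyperparameter tuning is omitted for brevity (a fixed small forest is used instead), and the coefficient of variation is taken as the standard deviation of the replicate predictions divided by their mean. It assumes scikit-learn is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def bootstrap_predict(X, y, X_new, n_boot=100, min_size=100, seed=0):
    """Train one RF per bootstrap replicate; return mean prediction and CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Replicates match the original size, but never fall below 100 samples.
    size = max(n, min_size)
    preds = np.empty((n_boot, len(X_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=size)  # sample rows with replacement
        model = RandomForestRegressor(n_estimators=20, random_state=b)
        model.fit(X[idx], y[idx])
        preds[b] = model.predict(X_new)
    mean = preds.mean(axis=0)  # final prediction: average over replicates
    cv = preds.std(axis=0) / mean  # uncertainty: coefficient of variation
    return mean, cv


# Synthetic stand-in data (assumption): 60 sites, so each replicate has 100 rows.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 10))
y = 10.0 + X[:, 0] + rng.normal(scale=0.2, size=60)
X_new = rng.normal(size=(5, 10))

mean, cv = bootstrap_predict(X, y, X_new)
print(mean, cv)
```

Applied to every grid cell instead of `X_new`, the two returned arrays would correspond to the mean and cv rasters described in the README.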