Environmental stress gradients mediate plastic trade-offs between growth and carbon storage in dominant desert shrubs
Data files
Feb 09, 2026 version files, 321.53 MB
- Code_for_NSC.R (34.98 KB)
- Code_for_UMAP.py (4.49 KB)
- NSC_dis.zip (204.53 MB)
- README.md (21.18 KB)
- Shrub_data.xlsx (2.31 MB)
- Shrub_dis.zip (114.62 MB)
- Unet_model.py (11.22 KB)
Mar 27, 2026 version files, 321.63 MB
- 20260215_Source_data_file.xlsx (2.41 MB)
- Code_for_NSC.R (34.98 KB)
- Code_for_UMAP.py (4.49 KB)
- NSC_dis.zip (204.53 MB)
- README.md (23.37 KB)
- Shrub_dis.zip (114.62 MB)
- Unet_model.py (11.22 KB)
Abstract
Dominant shrub species in temperate deserts exhibit specialized adaptations and divergent evolutionary strategies in response to varying and extreme environmental stresses. However, it remains unclear how shrub species balance growth and carbon storage to cope with abiotic combined stresses across extensive spatiotemporal gradients. Guided by the Soil-Plant-Atmosphere Continuum (SPAC) theory and combined with a 20-year monitoring of non-structural carbohydrates (NSC), we conducted extensive field surveys across a representative temperate desert area. Using ensemble learning on 60 integrated environmental variables, the region was automatically classified into four SPAC systems that reflect gradients of combined temperature, precipitation, radiation, soil properties, and other factors. Results revealed divergent trade-offs between growth and carbon storage of shrubs mediated by intensity and combination of stresses. Shrubs in the Qinghai-Tibet Plateau faced severe temperature-water stresses, with growth limited by carbon storage. In contrast, shrubs in the Ningxia-Shanxi region tended to promote growth in minimal water stress. NSC mobilization and internal transport capacity were key determinants of shrub resilience to extreme climate events. These findings suggest that long-term evolutionary processes have shaped flexible carbon allocation strategies along environmental gradients. Therefore, understanding these adaptive strategies is crucial for predicting vegetation dynamics and ecosystem resilience under future climate scenarios.
Dataset DOI: 10.5061/dryad.ffbg79d61
Description of the data and file structure
This collection includes the following contents:
- NSC data, shrub volume data, 60 environmental variables, and VPD data for each sampling site.
- Potential distribution maps of dominant desert shrubs.
- Spatial distribution maps of NSC content in dominant desert shrubs from 2003 to 2022.
- R code for analyzing NSC data.
- Python code for SPAC-UMAP.
- Python code for the U-Net model.
Files and variables
File: Source_data_file.xlsx
Description:
The dataset includes NSC measurements of desert shrubs, shrub volume data, 60 environmental variables (encompassing soil properties, climatic factors, topographic features, and other ecological indicators), Vapor Pressure Deficit (VPD) data, and GOSIF (Global OCO-2 Solar-Induced Chlorophyll Fluorescence) data, which serve as a proxy for photosynthetic activity. In addition, the dataset contains the information needed to construct ecological network relationships. The names, abbreviations, sources, and units of all variables in the dataset are listed in the table below.
| Variables | Abbreviation | Data source | Unit |
|---|---|---|---|
| Shrub | |||
| Normalized leaf nonstructural carbohydrate content | LNSC | Field (sheet 1) | Unitless |
| Normalized leaf soluble sugar content | Lss | Field (sheet 1) | Unitless |
| Normalized leaf starch content | Lstarch | Field (sheet 1) | Unitless |
| Normalized branch nonstructural carbohydrate content | BNSC | Field (sheet 2) | Unitless |
| Normalized branch soluble sugar content | Bss | Field (sheet 2) | Unitless |
| Normalized branch starch content | Bstarch | Field (sheet 2) | Unitless |
| Shrub volume standardized to a range of 0–1 | Normalized_volume | Field (sheet 3) | Unitless |
| Taxonomic genus of the sampled shrub | Genera | Field (sheet 6) | Unitless |
| Geographical information | |||
| Longitude of the sampling site | Longitude | Field (sheet 1) | Degree |
| Latitude of the sampling site | Latitude | Field (sheet 1) | Degree |
| Region of the sampling site | Area | Field (sheet 3) | Unitless |
| Assimilation | |||
| Global OCO-2 solar-induced chlorophyll fluorescence index | GOSIF / SIF | GOSIF (sheet 3) | W m⁻² µm⁻¹ sr⁻¹ |
| Water stress | |||
| SPAC sub-regions | Area | Calculated (sheet 4) | Unitless |
| Vapor pressure deficit | VPD | TerraClimate (sheet 4) | kPa |
| Soil water content | SW | ERA5 (sheet 4) | m³ m⁻³ |
| The ratio of normalized vapor pressure deficit to normalized soil water content | VPD/SW | Calculated (sheet 4) | Unitless |
| Others | |||
| Feature importance in machine learning models | Importance | Calculated (sheet 8) | % |
| Precipitation | |||
| Total annual precipitation | AP | ERA5 | mm |
| Total summer precipitation | SP | ERA5 | mm |
| Total precipitation for the wettest month | APmax | ERA5 | mm |
| Total precipitation in the driest month | APmin | ERA5 | mm |
| Annual soil water content of 0-7 cm | ASW1 | ERA5 | m³ m⁻³ |
| Annual soil water content of 7-28 cm | ASW2 | ERA5 | m³ m⁻³ |
| Annual soil water content of 28-100 cm | ASW3 | ERA5 | m³ m⁻³ |
| Annual soil water content of 100-289 cm | ASW4 | ERA5 | m³ m⁻³ |
| Summer soil water content of 0-7 cm | SSW1 | ERA5 | m³ m⁻³ |
| Summer soil water content of 7-28 cm | SSW2 | ERA5 | m³ m⁻³ |
| Summer soil water content of 28-100 cm | SSW3 | ERA5 | m³ m⁻³ |
| Summer soil water content of 100-289 cm | SSW4 | ERA5 | m³ m⁻³ |
| Winter snowfall | WSF | ERA5 | mm |
| Winter snow cover | WSC | ERA5 | % |
| Winter snow melt | WSM | ERA5 | mm |
| Winter snow density | WSD | ERA5 | kg m⁻³ |
| Temperature | |||
| Annual mean temperature | AT | ERA5 | °C |
| Temperature annual range | ATR | ERA5 | °C |
| Summer mean temperature | ST | ERA5 | °C |
| Average temperature of the hottest month | ATmax | ERA5 | °C |
| Average temperature of the coldest month | ATmin | ERA5 | °C |
| Maximum monthly average temperature in summer | STmax | ERA5 | °C |
| Minimum monthly average temperature in summer | STmin | ERA5 | °C |
| Mean annual soil temperature at 0-7 cm | AST1 | ERA5 | °C |
| Mean annual soil temperature at 7-28 cm | AST2 | ERA5 | °C |
| Summer mean soil temperature at 0-7 cm | SST1 | ERA5 | °C |
| Summer mean soil temperature at 7-28 cm | SST2 | ERA5 | °C |
| Evaporation and radiation | |||
| Total annual evapotranspiration | AE | ERA5 | mm |
| Total summer evapotranspiration | SE | ERA5 | mm |
| Standardized precipitation-evapotranspiration index for 12 months | SPEI12 | SPEI | Unitless |
| Standardized precipitation-evapotranspiration index for summer | SPEI3 | SPEI | Unitless |
| Annual surface net solar radiation | ASSR | ERA5 | J m⁻² |
| Annual surface net solar radiation downwards | ASSRD | ERA5 | J m⁻² |
| Summer surface net solar radiation | SSSR | ERA5 | J m⁻² |
| Summer surface net solar radiation downwards | SSSRD | ERA5 | J m⁻² |
| Soil | |||
| Total soil Nitrogen content | SN | SoilGrids | g kg⁻¹ |
| Soil pH | SPH | SoilGrids | Unitless |
| Soil organic carbon content | SOC | SoilGrids | g kg⁻¹ |
| Organic carbon stocks | OCS | SoilGrids | t ha⁻¹ |
| Organic carbon density | OCD | SoilGrids | kg m⁻³ |
| Bulk density of the fine earth fraction | BDOD | SoilGrids | kg dm⁻³ |
| Cation Exchange Capacity of the soil | CEC | SoilGrids | cmol(c) kg⁻¹ |
| Volumetric fraction of coarse fragments | CFVO | SoilGrids | % |
| Proportion of clay particles | Sclay | SoilGrids | % |
| Proportion of sand particles | Ssand | SoilGrids | % |
| Proportion of silt particles | Ssilt | SoilGrids | % |
| Spectral signature | |||
| Normalized Difference Vegetation Index | NDVI | Landsat | Unitless |
| Soil adjusted vegetation index | SAVI | Landsat | Unitless |
| Carbohydrate index | CAI | Landsat | Unitless |
| Wetness index | WI | Landsat | Unitless |
| Brightness index | BI | Landsat | Unitless |
| Salinity index A | SIA | Landsat | Unitless |
| Salinity index B | SIB | Landsat | Unitless |
| Topography and wind | |||
| Elevation | Elevation | SRTM | m |
| Slope | Slope | SRTM | ° |
| Roughness | Roughness | SRTM | m |
| Annual eastward component of the 10m wind | AUwind | ERA5 | m s⁻¹ |
| Annual northward component of the 10m wind | AVwind | ERA5 | m s⁻¹ |
| Summer eastward component of the 10m wind | SUwind | ERA5 | m s⁻¹ |
| Summer northward component of the 10m wind | SVwind | ERA5 | m s⁻¹ |
File: Code_for_NSC.R
Description:
Core code supporting data processing and visualization throughout our entire analytical workflow.
File: Code_for_UMAP.py
Description:
We utilized this code to perform dimensionality reduction on the integrated environmental variable dataset, with the aim of better understanding and validating the delineation of distinct SPAC (Soil-Plant-Atmosphere Continuum) systems within the study region. Based on prior ecological and geographical knowledge, we initially divided the study area into several subregions. Within each subregion, 2,000 spatial points were randomly selected, and for each point, a total of 60 environmental variables were extracted. These variables included soil properties, topographic features, climatic factors, and other ecological indicators, collectively capturing the environmental heterogeneity across the region.
To reduce the complexity of this high-dimensional dataset, we applied the Uniform Manifold Approximation and Projection (UMAP) algorithm to compress the 60-dimensional data into two dimensions, referred to as UMAP coordinates. This transformation enabled a more intuitive visualization and assessment of environmental differences among subregions. After validating the subregional divisions using the randomly sampled points, we applied the UMAP algorithm to gridded maps covering the entire study area. This produced two raster maps representing the spatial distribution of UMAP Dimension 1 and Dimension 2 values across the region.
File: Unet_model.py
Description:
This code first identifies and segments desert shrubs at each sampling site using high-resolution satellite imagery with a spatial resolution of 0.5 meters. Through image processing and classification techniques, individual shrub canopies were delineated from the background. Subsequently, the proportion of foreground pixels—those representing desert shrub cover—relative to the total number of pixels within each sampling site's extent was calculated. This proportion was then used as a quantitative metric to represent the spatial density of desert shrubs at each site.
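The cover metric described above reduces to a pixel ratio once a binary segmentation mask exists. The following is a minimal sketch, not the code in Unet_model.py: the function name and the 4 x 4 example mask are hypothetical, and the mask is assumed to use 1 for shrub pixels and 0 for background.

```python
def shrub_cover_fraction(mask):
    """Return the share of foreground (shrub) pixels in a binary mask.

    `mask` is a list of rows; each pixel is 1 (shrub) or 0 (background).
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    foreground = sum(1 for row in mask for pixel in row if pixel == 1)
    return foreground / total


# Hypothetical 4 x 4 segmentation output for one sampling site.
example_mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
cover = shrub_cover_fraction(example_mask)  # 5 shrub pixels out of 16
```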
File: Shrub_dis.zip
Description:
Potential distribution maps of dominant desert shrub species.
File: NSC_dis.zip
Description:
Spatial distribution maps of LNSC and BNSC from 2003 to 2022 generated using the XGBoost model, including annual maps as well as a map representing the 20-year average distribution pattern.
Code/software
Analytical workflow
The analytical workflow was designed to test our central hypotheses regarding the community-level carbon allocation strategies of desert shrubs across a spatially heterogeneous landscape. The workflow proceeds through four integrated stages: defining the ecological domain, predictive spatial modeling of NSC, inferential attribution of drivers, and hypothesis testing of functional zonation.
First, based on a comprehensive review of the literature, we established a dataset of environmental variables covering precipitation, temperature, evaporation and radiation, soil, spectral signature, and topography and wind. This dataset of 60 variables (Table S1) is designed to encompass all relevant factors potentially affecting NSC content, enabling a multifaceted analysis of their impacts. The MaxEnt model, combined with the environmental dataset, was used to predict the potential distribution of dominant desert shrubs, which defined the spatial domain for all subsequent analyses. The dataset was split into training (70%) and testing (30%) subsets, with a 10-fold cross-validation process performed to ensure reliability. Model accuracy was evaluated using the area under the receiver operating characteristic curve (AUC). High-resolution satellite imagery, analyzed with the U-net model, confirmed the presence of shrub vegetation at the sampling sites.
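For readers unfamiliar with AUC as an accuracy measure, it can be read as the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen background site. The sketch below illustrates that definition only; it is not MaxEnt's own evaluation routine, and the function name and scores are made up for the example.

```python
def auc_from_scores(presence_scores, background_scores):
    """AUC as the probability that a presence site outscores a background site.

    Ties between a presence and a background score count as half a win.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores on a held-out test subset.
presence = [0.9, 0.7, 0.4]
background = [0.8, 0.3, 0.2, 0.4]
auc = auc_from_scores(presence, background)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of presence from background sites.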
Then, based on the distribution area of desert shrubs (with suitability probabilities > 0.3), we employed our previously developed and validated "SAFESCAN" platform to predict the spatial distribution of NSC content. The core of this platform is a fine-tuned XGBoost ensemble learning model, whose temporal stability and robustness have been rigorously confirmed through multi-year validation and external data checks in our prior work. For its application in this study, we specifically optimized the hyperparameters for the desert shrub communities through a 10-fold cross-validation process to ensure robust model performance evaluation. The dataset was randomly divided into 10 equal subsets: 9 subsets were used for training, while the remaining subset served as the validation set. This procedure was repeated 10 times so that each subset served as the validation set once. Hyperparameter tuning involved an extensive random grid search across a predefined parameter space, which included nrounds (from 5000 to 30000), max_depth (3-10), eta (0.001-0.3), and subsample (0.5-1.0), among others. The configuration yielding the lowest root mean square error (RMSE) across folds from 100 sampled combinations was selected as the optimal model setup. Additionally, the average R² and RMSE across all folds were calculated to provide a robust measure of the model’s predictive accuracy and generalization capability (Fig. S3). This fine-tuned XGBoost ensemble learning model has been applied to estimate and predict the spatial dynamics of NSC content across the entire study area. Using this model alongside historical environmental datasets, we have predicted NSC spatial distribution maps from 2003 to 2022 to examine the temporal trends in simulated NSC within desert shrubs. To discern the persistent, long-term spatial patterns from simulated inter-annual variability, we calculated the 20-year average of these simulated annual NSC maps, which constitute a primary result of this study.
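The tuning loop described above (10 folds, 100 randomly sampled configurations, lowest mean RMSE wins) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' R implementation: uniform sampling within each range is assumed because the text gives ranges only, `cv_rmse` is a hypothetical callback that would fit XGBoost on the training indices, and `placeholder_rmse` is a stand-in score so the sketch runs without a model.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def sample_hyperparameters(rng):
    """Draw one candidate from the ranges quoted in the text.

    Uniform sampling is an assumption; the source states ranges only.
    """
    return {
        "nrounds": rng.randint(5000, 30000),
        "max_depth": rng.randint(3, 10),
        "eta": rng.uniform(0.001, 0.3),
        "subsample": rng.uniform(0.5, 1.0),
    }


def random_search(n_samples, cv_rmse, n_candidates=100, k=10, seed=0):
    """Return (mean RMSE, params) for the candidate with the lowest mean fold RMSE.

    `cv_rmse(params, train_idx, val_idx)` is a hypothetical callback that
    would fit a model on train_idx and return its RMSE on val_idx.
    """
    rng = random.Random(seed)
    folds = k_fold_indices(n_samples, k, seed)
    best = None
    for _ in range(n_candidates):
        params = sample_hyperparameters(rng)
        fold_scores = []
        for i, val_idx in enumerate(folds):
            # Every fold except fold i is used for training.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            fold_scores.append(cv_rmse(params, train_idx, val_idx))
        mean_rmse = sum(fold_scores) / k
        if best is None or mean_rmse < best[0]:
            best = (mean_rmse, params)
    return best


def placeholder_rmse(params, train_idx, val_idx):
    """Stand-in score so the sketch runs without a model; not a real RMSE."""
    return abs(params["eta"] - 0.05) + 0.01 * params["max_depth"]


folds = k_fold_indices(94)  # 94 is used here only as an example sample count
best_rmse, best_params = random_search(94, placeholder_rmse)
```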
Third, for the inferential attribution of environmental drivers on NSC content, we employed SHapley Additive exPlanations (SHAP) values and Generalized Linear Mixed Models (GLMMs). SHAP values, derived from the predictive model, quantified the influence of all 60 environmental variables. The GLMMs were then used to formally test the statistical significance of key drivers on the observed NSC, which served as the dependent variable. The environmental principal components were used as independent variables, and "genus" was included as a random effect to explicitly account for inter-specific variation. The models were fitted using the "lme4" package in R (see SR9 for full model structure and results).
Finally, we tested our primary hypothesis regarding the existence of distinct functional eco-climatic zones. This was structured as a semi-supervised learning problem. Based on extensive field expertise, we first posited an a priori division of the region into four distinct systems. To test this hypothesis, we applied the UMAP algorithm to the 60 environmental variables for each year from 2003 to 2022, generating annual sets of UMAP coordinates. We then averaged these annual coordinates to produce a stable, 20-year mean representation of the landscape's functional geometry. The clear separation of our a priori zones within this mean UMAP space provided robust, data-driven validation of our initial hypothesis. Within these validated SPAC systems, we subsequently analyzed the divergent growth-carbon storage trade-off strategies, including the use of correlational network analysis to compare the structural complexity of plant-environment interactions.
Access information
Other publicly accessible locations of the data:
- None.
Data was derived from the following sources:
- The sources for environmental data used in this study are as follows: ERA5 meteorological data from https://cds.climate.copernicus.eu. SPEI values from https://spei.csic.es. VPD data from www.climatologylab.org. Landsat 8 OLI and Landsat 7 ETM spectral data from https://earthexplorer.usgs.gov. Soil attributes from SoilGrids at https://www.soilgrids.org. DEM data from the SRTM at https://www2.jpl.nasa.gov/srtm. GOSIF data from https://climatedataguide.ucar.edu/. High-resolution satellite image data from https://www.cpeos.org.cn.
Study area
The field sampling regions are mainly situated in the desert region of central Asia, spanning 37 degrees of longitude and 20 degrees of latitude, stretching from 78° E, 32° N to 115° E, 52° N. This expansive area encompasses several typical temperate deserts, arranged from west to east: Gurbantunggut desert, Taklamakan desert, Kumtag desert, Qaidam desert, Badain Jaran desert, Tengger desert, Ulan Buh desert, Kubuqi desert, Mu Us sandy land and Hunshandake sandy land. The orientation of the sample zone ran from northwest to southeast, essentially perpendicular to the precipitation contour in the northwest desert region of China. Within this region, the average annual precipitation exhibited considerable variation, ranging from 40 mm to 394 mm. The average annual temperature spanned from 4.32 ℃ to 9.79 ℃.
Crucially, the remote and largely uninhabited nature of these deserts makes them an unparalleled natural laboratory. Having grown quietly for millennia across these diverse landscapes with minimal direct human interference, the shrub communities here exhibit true self-adaptive strategies honed by natural selection. They are therefore ideal subjects for investigating the fundamental principles of plant physiological responses to environmental variation.
Field sampling
In July 2022, during the growing season of most desert shrubs, field surveys and sample collections were conducted in the designated study area. Sampling sites were chosen based on their ability to accurately represent the local plant diversity. Additionally, areas with stable topography were prioritized, while those influenced by human activities or environmental disturbances were excluded. At each site, three large plots measuring 30 m × 30 m were established, with a minimum distance of 1 km between them. Within each large plot, smaller subplots of 5 m × 5 m were positioned diagonally to focus on shrub sampling. In these subplots, only mature, healthy shrubs with uniform growth were selected and marked for study, with their length, width, and height recorded. For each shrub, 2–3 vigorously growing branches were pruned, specifically from the southern side and mid-height of the individual. Leaves (or assimilative branches, collectively referred to as leaf in this study) were removed from the branches, which were then cut into segments approximately 3 cm in length. The collected leaves were cleaned of surface dust, sorted, labeled, and immediately placed in a refrigerated container at 0–4°C with ice packs for transportation to the laboratory. Due to the limited number of leaves on some branches, the study ultimately established 94 sampling sites, 282 subplots, and collected a total of 1,356 leaf samples and 1,476 branch samples.
Measurement of non-structural carbohydrate
To halt all enzymatic activity as quickly as possible, all fresh samples were processed on the day of collection (within 12 hours) by microwaving at 800 W for 90 seconds. After transport to the laboratory, all samples were placed in a constant-temperature oven at 60°C and dried to constant weight. This is an established, standard protocol for ensuring consistency in large-scale ecological studies. The dried samples were then ground into fine powder using a ball mill (5100 Mixer Mill, Metuchen, NJ, USA). Given that the sum of total soluble sugars and starch typically accounts for over 90% of total NSC content, this sum was used as an estimate of total NSC content. The NSC content was measured using the anthrone-sulfuric acid method. In brief, 0.05 g of powdered sample was extracted with an 80% ethanol solution for 12 hours, and the soluble sugar content was determined using the supernatant obtained after two rounds of centrifugation. The remaining precipitate was boiled in distilled water and hydrolyzed thoroughly with hydrochloric acid, and the resulting solution was centrifuged again to measure starch content. The absorbance of soluble sugars and starch was measured at a wavelength of 620 nm on a multifunctional microplate reader (HR 7000; Hamilton, Reno, NE, USA). Their concentrations were quantified based on a standard curve and expressed in mg/g.
Data acquisition
The ERA5 global reanalysis dataset provides comprehensive meteorological data, offering optimal climate-related variables crucial for modeling NSC content. SPEI values were obtained from the Standardized Precipitation-Evapotranspiration Index database. Spectral characteristics within the study area were derived from various bands of Landsat 8 OLI and Landsat 7 ETM satellite imagery. Soil properties were sourced from the SoilGrids dataset, while parameters derived from the Digital Elevation Model (DEM) were calculated using data from the Shuttle Radar Topography Mission (SRTM). Vapor Pressure Deficit (VPD) data were obtained from the TerraClimate dataset, which provides high-resolution (∼4 km) monthly climate and water balance data from 1958 to the present, integrating multiple observational and reanalysis products to support ecological and hydrological research. The Global OCO-2 Solar-Induced Chlorophyll Fluorescence (GOSIF) dataset, which exhibits moderate to high performance in open shrublands, serves as a robust proxy for ecosystem-scale gross primary productivity and was used in this study to approximate shrub photosynthetic assimilation capacity. Finally, the high-resolution images that were used to calculate the field coverage of desert shrubs were derived from satellite data from the Chinese high resolution earth observation system. To ensure a unified analytical framework, all datasets were resampled to a common resolution of 1000 meters using the nearest neighbor method.
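To make the nearest neighbor resampling step concrete, the sketch below maps each output cell center to the input cell that contains it. It is a simplified illustration on plain nested lists that assumes both grids cover the same extent; it ignores georeferencing and projections, which the "terra" workflow used in this study handles, and the function name is hypothetical.

```python
def resample_nearest(grid, out_rows, out_cols):
    """Resample a 2D grid to a new shape by nearest neighbor lookup.

    Each output cell takes the value of the input cell containing its center.
    Assumes the input and output grids cover the same spatial extent.
    """
    in_rows, in_cols = len(grid), len(grid[0])
    result = []
    for r in range(out_rows):
        # Center of output row r, expressed in input-row coordinates.
        src_r = min(in_rows - 1, int((r + 0.5) * in_rows / out_rows))
        row = []
        for c in range(out_cols):
            src_c = min(in_cols - 1, int((c + 0.5) * in_cols / out_cols))
            row.append(grid[src_r][src_c])
        result.append(row)
    return result


# Hypothetical coarse 2 x 2 layer brought onto a finer 4 x 4 grid.
coarse = [[1, 2], [3, 4]]
fine = resample_nearest(coarse, 4, 4)
```

Nearest neighbor lookup copies existing cell values rather than interpolating, so categorical layers and value ranges are preserved.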
With rapid advances in sensor technology, researchers have developed various metrics based on the relationships among soil, vegetation, and spectral data, and have applied them to assess soil and vegetation cover in specific regions. After a thorough literature review and integrating findings from previous studies, this study selected seven widely used spectral signature metrics as environmental variables to investigate NSC content in dominant desert shrubs: Normalized Difference Vegetation Index (NDVI), Soil adjusted vegetation index (SAVI), Carbonate index (CAI), Wetness index (WI), Brightness index (BI), Salinity index A (SIA), and Salinity index B (SIB) [60]. The formulas are as follows:
SAVI = (1 + L) * (NIR - R) / (NIR + R + L)
CAI = (SWIR1 - NIR) / (SWIR1 + NIR)
SI = (NIR * R) * 0.5
BI = sqrt(R^2 + NIR^2)
SIA = G / NIR
SIB = (G - NIR) / (G + NIR)
where B, G, R, NIR, SWIR1, and SWIR2 correspond to reflectance in the blue, green, red, near infrared, shortwave infrared 1, and shortwave infrared 2 bands, respectively, derived from Landsat 7 ETM and Landsat 8 OLI satellite data.
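As a quick check of the band arithmetic, the indices whose formulas are fully specified above can be written as small functions. This is a sketch under assumptions: the soil adjustment factor L = 0.5 is a commonly used default and is not stated in this document, BI is taken as the square root of the sum of squared red and near infrared reflectances, and the reflectance values are invented for the example.

```python
import math


def savi(nir, red, soil_factor=0.5):
    """Soil adjusted vegetation index; soil_factor (L) = 0.5 is an assumed default."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def cai(swir1, nir):
    """CAI as a normalized difference of SWIR1 and NIR reflectance."""
    return (swir1 - nir) / (swir1 + nir)


def brightness_index(red, nir):
    """BI, assuming the formula means squared red and NIR reflectances."""
    return math.sqrt(red ** 2 + nir ** 2)


def salinity_index_a(green, nir):
    """SIA as the ratio of green to NIR reflectance."""
    return green / nir


def salinity_index_b(green, nir):
    """SIB as a normalized difference of green and NIR reflectance."""
    return (green - nir) / (green + nir)


# Hypothetical surface reflectances for a single pixel.
green, red, nir, swir1 = 0.2, 0.3, 0.4, 0.5
indices = {
    "SAVI": savi(nir, red),
    "CAI": cai(swir1, nir),
    "BI": brightness_index(red, nir),
    "SIA": salinity_index_a(green, nir),
    "SIB": salinity_index_b(green, nir),
}
```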
Statistical analysis
To analyze NSC content in desert shrub communities, we normalized NSC levels across genera. First, we normalized data for each genus at different sampling sites, then calculated a weighted average at each site based on sample counts. The formula for community-level NSC content at each sampling point is:
NSC_community = Sum(NSC_i * Weight_i) / Sum(Weight_i)
where NSC_community is the community-level NSC content at the sampling site, Sum denotes summation over i = 1 to n (n is the total number of genera at the site), NSC_i is the standardized NSC content for genus i, and Weight_i is the number of samples collected for genus i (N_i), which serves as the weighting factor.
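A worked instance of the weighted average may help. The sketch below is illustrative only: the function name and the three genus entries are hypothetical, and each entry pairs a normalized NSC value with its sample count.

```python
def community_nsc(genus_values):
    """Sample-count-weighted mean of normalized NSC across genera at one site.

    `genus_values` is a list of (normalized_nsc, n_samples) pairs.
    """
    total_weight = sum(n for _, n in genus_values)
    if total_weight <= 0:
        raise ValueError("total sample count must be positive")
    return sum(value * n for value, n in genus_values) / total_weight


# Hypothetical site with three genera: (normalized NSC, number of samples).
site = [(0.2, 6), (0.5, 3), (0.8, 1)]
site_nsc = community_nsc(site)  # (1.2 + 1.5 + 0.8) / 10
```

Genera with more samples pull the site value toward their own NSC level, so a sparsely sampled genus does not dominate the community estimate.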
The volume of an individual shrub was estimated based on the recorded dimensions of length, width, and height during sampling. The formula is as follows:
Volume = (pi / 6) * L * W * H
where Volume is the estimated volume of the shrub, and L, W, and H represent the recorded length, width, and height, respectively.
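The volume formula treats each shrub as an ellipsoid whose three axes are the recorded dimensions. The sketch below computes it and then rescales volumes to 0-1; min-max scaling is an assumption about how the Normalized_volume column was derived, and the shrub dimensions are invented.

```python
import math


def shrub_volume(length, width, height):
    """Ellipsoid volume from three full axis lengths: (pi / 6) * L * W * H."""
    return (math.pi / 6) * length * width * height


def min_max_normalize(values):
    """Rescale values to the 0-1 range (assumed normalization, not confirmed)."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("values must not all be equal")
    return [(v - low) / (high - low) for v in values]


# Hypothetical shrub dimensions in meters: (length, width, height).
shrubs = [(1.2, 1.0, 0.8), (2.0, 2.0, 2.0), (0.6, 0.5, 0.4)]
volumes = [shrub_volume(*dims) for dims in shrubs]
normalized = min_max_normalize(volumes)
```

With L = W = H = 2 m the result equals the volume of a sphere of radius 1 m, which is a convenient sanity check on the pi / 6 factor.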
SHapley Additive exPlanations (SHAP) values elucidate the decision-making process of a model, transforming it from an opaque "black box" into an interpretable system. Their primary advantage lies in detailing how each variable influences individual predictions, thereby enhancing the visualization and comprehensibility of the model's operations. For each prediction, the model assigns a SHAP value to each feature, quantifying its impact on the outcome. Compared to GAIN, SHAP provides a more consistent and theoretically grounded approach based on cooperative game theory, ensuring fair attribution of feature contributions across all possible feature combinations. This method not only increases model transparency but also enables precise explanations of how features drive predictions. The formula for the effect of SHAP value on the model is as follows:
y_hat_i = y_m + f(x_i1) + f(x_i2) + ... + f(x_ij)
where y_hat_i is the model prediction for sample i, y_m is the mean prediction (the baseline), and f(x_ij) is the SHAP value of feature j for sample i, with positive values increasing and negative values decreasing the prediction.
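The additive relationship can be verified on a toy model by computing exact Shapley values by brute force. This sketch is not how "shapviz" or XGBoost computes SHAP values (tree models use a faster algorithm); the model, the background rows, and the sample are all hypothetical, and absent features are filled in from the background rows.

```python
from itertools import permutations


def shapley_values(model, x, background):
    """Exact Shapley values for one sample by enumerating feature orderings.

    Returns (phi, base): per-feature contributions and the mean prediction
    over the background rows. Only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Mean prediction when coalition features take x's values and the
        # remaining features come from each background row in turn.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        coalition = set()
        previous = value(coalition)
        for j in order:
            coalition.add(j)
            current = value(coalition)
            phi[j] += current - previous  # marginal contribution of feature j
            previous = current
    return [p / len(orderings) for p in phi], value(set())


def toy_model(features):
    """Hypothetical model with one interaction term."""
    return 2.0 * features[0] - 1.0 * features[1] + 0.5 * features[0] * features[2]


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0]]
x = [1.0, 2.0, 3.0]
phi, base = shapley_values(toy_model, x, background)
reconstructed = base + sum(phi)  # should match toy_model(x)
```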
Moreover, we used the box-and-whisker plot method to identify the extreme values of the 60 environmental variables over 20 years for SPAC systems 1 through 4. Extreme values are defined as those that fall below the lower bound or above the upper bound derived from the interquartile range (IQR). The bounds are calculated as follows:
Lower bound = Q1 - 1.5 * IQR
Upper bound = Q3 + 1.5 * IQR
where Q1 is the first quartile (25th percentile), Q3 is the third quartile (75th percentile), and IQR is the interquartile range, defined as Q3 - Q1.
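The bounds can be computed with Python's standard library as a quick illustration. The quartile convention is an assumption (the "inclusive" method of statistics.quantiles); box plots in other software may use a slightly different convention, and the example series is invented.

```python
import statistics


def iqr_bounds(values):
    """Return (lower, upper) bounds at 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def extreme_values(values):
    """Return the values that fall outside the IQR-based bounds."""
    lower, upper = iqr_bounds(values)
    return [v for v in values if v < lower or v > upper]


# Hypothetical series for one environmental variable.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
bounds = iqr_bounds(series)
extremes = extreme_values(series)
```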
The network graphs effectively highlight the differences in environmental relationships across different areas and the varying associations among NSC content and environmental factors. We randomly sampled 2,000 pixels from each of the four sub-regions within the study area and extracted 60 environmental parameters, as well as LNSC, BNSC, and GOSIF values for these pixels. First, we conducted Pearson correlation analysis on these environmental, NSC, and SIF parameters using the "LinkET" package in R version 4.3.3. After calculating the correlation coefficients and significance values for all pairs of variables, only the significant results were retained. Next, we performed Principal Component Analysis (PCA) on each category of environmental variables using the "FactoMineR" package. We then conducted the same correlation analysis between the first principal component values and NSC content, as well as SIF values. Finally, we visualized the networks for different regions using Cytoscape software version 3.10.
Data conforming to a normal distribution were analyzed using one-way ANOVA and Tukey's HSD test, while those not conforming were analyzed using Dunn's test following a Kruskal–Wallis test. The dimensions and spatial resolution of environmental variables were aligned using the "terra" package. The "xgboost" package was employed to develop XGBoost models, with the search for the optimal hyperparameter combinations conducted via the "caret" package. SHAP values were calculated using the "shapviz" package. MaxEnt version 3.4.4 was used to predict the potential distribution (suitable zone) of dominant desert shrubs. The U-net model was constructed using the "tensorflow" and "keras" libraries in Python 3.11, with image processing operations performed using the "opencv", "pillow", and "numpy" libraries. Spatial mapping and analysis were carried out using ArcGIS Pro version 3.3.
Changes after Feb 9, 2026: Renamed the "Shrub_data.xlsx" file and included additional data (Source_data_file.xlsx).
