Data from: Environmental modulation of plant mycorrhizal traits in the global flora
Data files
Versions: Sep 05, 2023 and Mar 07, 2024 (identical file lists, 1.45 GB each)
- 10_Phy_env_eigenvectors.R
- 11_RF_modelling.zip
- 12_Feature_importance.zip
- 13_50kmGrid_traits.zip
- 14_stack50.tif
- 2_Figure_2.zip
- 3_Figure_3.zip
- 4_Figure_4.zip
- 5_11770.csv
- 6_Sensitivity_test.zip
- 7_Myco.plant.supertree.output_v3.rds
- 8_Phylogenetic_signals.zip
- 9_50km_46957grids.csv
- README.md
Abstract
Mycorrhizal symbioses are known to strongly influence plant performance, structure plant communities and shape ecosystem dynamics. Plant mycorrhizal traits, such as those characterizing mycorrhizal type (arbuscular (AM), ecto-, ericoid, or orchid mycorrhiza) and status (obligately (OM), facultatively (FM), or non-mycorrhizal) offer valuable insight into plant belowground functionality. Here, we compile available plant mycorrhizal trait information and global occurrence data (~100 million records) for 11,770 vascular plant species. Using a plant phylogenetic mega-tree and high-resolution climatic and edaphic data layers, we assess phylogenetic and environmental correlates of plant mycorrhizal traits. We find that plant mycorrhizal type is more phylogenetically conserved than plant mycorrhizal status, while environmental variables (both climatic and edaphic; notably soil texture) explain more variation in mycorrhizal status, especially FM. The previously underestimated role of environmental conditions has far-reaching implications for our understanding of ecosystem functioning under changing climatic and soil conditions.
README: Data from: Environmental modulation of plant mycorrhizal traits in the global flora
Authors: Yiming Meng¹, John Davison¹, John T. Clarke²,³,⁴,⁵, Martin Zobel¹, Maret Gerz¹, Mari Moora¹, Maarja Öpik¹, C. Guillermo Bueno¹,⁶
Affiliations:
¹ Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
² GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
³ Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
⁴ Department of Ecology and Biogeography, Nicolaus Copernicus University in Toruń, Toruń, Poland
⁵ Department of Zoology, Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
⁶ Pyrenean Institute of Ecology, IPE-CSIC, Jaca, Huesca, Spain
Data Availability
We compiled four data sources: 1) plant species mycorrhizal trait data (see Appendix S2, covering 14,722 plant species); 2) plant species occurrence data (see the first item under Note); 3) global climatic and soil environmental data (see the second item under Note); and 4) plant phylogenetic information ("7_Myco.plant.supertree.output_v3.rds").
The plant occurrence and environmental data were used to estimate plant species environmental associations. The environmental association data and phylogenetic information were then used to model variations in plant mycorrhizal trait expression.
Note
- The plant species occurrence records were obtained from the Global Biodiversity Information Facility (GBIF). Details on data querying and filtering are available in Appendix S1 of the associated paper.
- The original environment layers are available at the following sources (also shown in Appendix S1 Table 1):
  - WorldClim - Climatic data
  - CGIAR-CSI - Climatic data
  - CHELSA - Climatic data
  - Soilgrids-1km - Edaphic data
  - Globalchange - Edaphic data
Data layers were harmonized and resampled using ArcMap 10.6 to ensure consistent projection, extent, and grid alignment. A Raster Stack was then created from these aligned maps. A sensitivity test, detailed in Appendix S1 and in the file "6_Sensitivity_test.zip," showed that sampling from 20 grid cells yielded representative environmental estimates. This led us to exclude species with incomplete environmental data or occurrences in fewer than 20 cells, resulting in a final dataset of 11,770 species. Statistical measures like means and standard deviations were calculated for 53 continuous environmental variables. For the World Reference Base (WRB) soil map, we determined the percentage distribution of each species across different soil types. This aggregated information is available in the file "5_11770.csv."
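The per-species aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' script: the dictionary keys `elevation` and `wrb`, the function name `summarise_species`, the use of the sample standard deviation, and the rule of dropping a species when any occupied cell has a missing value are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, stdev

MIN_CELLS = 20  # threshold reported in this README


def summarise_species(cells, min_cells=MIN_CELLS):
    """Summarise one species from its occupied 1 km grid cells.

    `cells` is a list of dicts, one per occupied cell, with a continuous
    variable under 'elevation' and a WRB soil type under 'wrb'
    (hypothetical keys chosen for this sketch).
    Returns None when the species would be excluded.
    """
    # Assumption: a species with any missing environmental value is excluded.
    if any(c["elevation"] is None or c["wrb"] is None for c in cells):
        return None
    # Species occurring in fewer than `min_cells` cells are excluded.
    if len(cells) < min_cells:
        return None

    values = [c["elevation"] for c in cells]
    soil_counts = Counter(c["wrb"] for c in cells)
    return {
        "Cells": len(cells),
        "elevation": mean(values),
        "elevation_std": stdev(values),  # sample SD, an assumption
        # Percentage of the species' cells falling in each soil type.
        "wrb_pct": {s: 100.0 * n / len(cells) for s, n in soil_counts.items()},
    }
```

The same pattern would repeat for each of the 53 continuous variables; only one is shown to keep the sketch short.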
Folder Structure
- /2_Figure_2.zip: Data and scripts for generating Figure 2.
- /3_Figure_3.zip: Data and scripts for generating Figure 3.
- /4_Figure_4.zip: Data and scripts for generating Figure 4.
- /5_11770.csv: Mycorrhizal traits, environmental variables, and taxonomy for 11,770 plant species.
- /6_Sensitivity_test.zip: Files for assessing sample size adequacy in estimating plant-environment correlations.
- /7_Myco.plant.supertree.output_v3.rds: R data file of a phylogenetic supertree for studied plants.
- /8_Phylogenetic_signals.zip: Measurement of phylogenetic signals of plant mycorrhizal traits.
- /9_50km_46957grids.csv: Global plant mycorrhizal traits mapped at a 50km x 50km scale; environmental variables (means and standard deviations) are also included.
- /10_Phy_env_eigenvectors.R: R script for computing eigenvectors related to phylogenetics and environmental data.
- /11_RF_modelling.zip: Random Forest models for analyzing variables related to plant mycorrhizal traits.
- /12_Feature_importance.zip: Output files detailing feature importance in Random Forest models.
- /13_50kmGrid_traits.zip: Plant mycorrhizal trait distribution for each studied species, mapped onto a 50km x 50km grid.
- /14_stack50.tif: Multi-layer raster file with aggregated environmental variables, aligned to a 50km x 50km grid.
File Descriptions
5. 11770.csv
This CSV file provides an exhaustive profile of 11,770 individual plant species at a high-resolution scale of 1x1 km. The dataset includes:
- Detailed environmental conditions specific to each plant species
- Plant taxonomic classifications
- Plant Mycorrhizal traits
The columns include (descriptions for environmental variables are also available in Appendix S1):
- Species: (String) Scientific name of the plant species.
- Obs: (Integer) The number of observations recorded in the GBIF database.
- Cells: (Integer) Number of 1km x 1km grid cells where the species was observed. Only species appearing in 20 or more cells are included.
- aridity_index: (Float) Quantifies the dryness of the environment.
- available.water: (Float, %) Available soil water capacity up to the wilting point.
- base.satu: (Float, %) Percentage of base saturation in the soil.
- BDRICM_M_1km_ll: (Float, cm) Depth to bedrock (R horizon) up to 200 cm.
- BDTICM_M_1km_ll: (Float, cm) Absolute depth to bedrock.
- bulk.ds: (Float, kg/m³) Bulk density of the fine earth.
- CaCO3: (Float, % of weight, scale factor 0.01) Percentage of calcium carbonate in the soil.
- CaSO4: (Float, % of weight, scale factor 0.01) Percentage of gypsum in the soil.
- CEC.soilgrids: (Float, cmol/kg) Cation exchange capacity of the soil.
- CHELSA_bio10_01 to CHELSA_bio10_19: (Float) Climate variables; temperature in ℃ (scale factor 0.1) and precipitation in mm. See CHELSA for a detailed description of each variable.
- clay: (Float, %) Weight percentage of clay particles (<0.002 mm) in the soil.
- coarse: (Float, %) Volumetric percentage of coarse fragments (>2 mm) in the soil.
- EC: (Float, dS/m, scale factor 0.01) Electrical conductivity of the soil.
- elevation: (Float, m) Elevation above sea level.
- Evapotranspiration: (Float, mm/year) Annual rate of evapotranspiration.
- ex.acidity: (Float, cmol/kg, scale factor 0.01) Exchangeable acidity in soil.
- ex.Al: (Float, cmol/kg, scale factor 0.01) Exchangeable aluminum in soil.
- ex.Ca: (Float, cmol/kg, scale factor 0.01) Exchangeable calcium in soil.
- ex.K: (Float, cmol/kg, scale factor 0.01) Exchangeable potassium in soil.
- ex.Mg: (Float, cmol/kg, scale factor 0.01) Exchangeable magnesium in soil.
- ex.Na: (Float, cmol/kg, scale factor 0.01) Exchangeable sodium in soil.
- pH.KCl: (Float, scale factor 0.1) Soil pH measured in a KCl solution.
- sand: (Float, %) Weight percentage of sand particles (0.05–2 mm) in the soil.
- silt: (Float, %) Weight percentage of silt particles (0.002–0.05 mm) in the soil.
- total.C: (Float, % of weight, scale factor 0.01) Total carbon content in soil.
- total.N: (Float, % of weight, scale factor 0.01) Total nitrogen content in soil.
- total.P: (Float, % of weight, scale factor 0.0001) Total phosphorus content in soil.
- total.S: (Float, % of weight, scale factor 0.01) Total sulfur content in soil.
- aswc1 to aswc3: (Float, %) Available soil water capacity at different field capacities.
- occont: (Float, ‰) Organic carbon content in soil.
- srad: (Float, kJ m-2 day-1) Solar radiation exposure.
- vapr: (Float, kPa) Water vapor pressure.
- wind: (Float, m/s) Wind speed.

Columns ending with _std indicate standard deviations of the measurements, representing environmental variability.
Columns named after soil types (e.g., Chernozems, Phaeozems) give the percentage of each species' occurrences that fall in that soil type, based on the World Reference Base (WRB) soil classification.
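Several columns above list a scale factor. A minimal sketch of converting stored values to the stated units is given below; it assumes that the physical value equals the stored value multiplied by the scale factor, and that the matching _std column shares the same factor. Neither assumption is confirmed in this README, and the helper name `to_physical_units` is hypothetical.

```python
# Scale factors copied from the column descriptions above (subset only).
SCALE_FACTORS = {
    "pH.KCl": 0.1,
    "CaCO3": 0.01,
    "EC": 0.01,
    "total.P": 0.0001,
}


def to_physical_units(row, scale_factors=SCALE_FACTORS):
    """Return a copy of `row` with scaled columns converted.

    Assumption: physical value = stored value * scale factor, applied
    to both the mean column and its `_std` counterpart.
    """
    out = dict(row)
    for col, factor in scale_factors.items():
        for name in (col, col + "_std"):
            if out.get(name) is not None:
                out[name] = float(out[name]) * factor
    return out


# Made-up stored values for one species, for illustration only.
example = {"Species": "Equisetum fluviatile", "pH.KCl": 65, "pH.KCl_std": 5}
converted = to_physical_units(example)
```

Columns without a listed scale factor (for example, elevation) pass through unchanged.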
The dataset also includes taxonomic classifications and details on the plant's mycorrhizal associations:
- Species_TPL: (String) The accepted species name according to The Plant List (TPL).
- Species_acpt.syno: (String) A possible synonym for the species name according to reliable resources.
- species_corr: (String) The adopted species name used in this study. The space between the genus and species names has been replaced by an underscore ("_").
- species_corrNo_: (String) The adopted species name used in this study, without any special characters.
- Genus: (String) The taxonomic genus to which the plant species belongs.
- Family: (String) The taxonomic family to which the plant species belongs.
- Mycorrhizal_type: (String) Specifies the type of mycorrhizal association the species can form, such as AM (Arbuscular Mycorrhizal) or ECM (Ectomycorrhizal).
- Mycorrhizal_status: (String) Specifies the nature of the mycorrhizal association, such as 'obligate' or 'facultative'.
- nr_types: (Integer) Number of different types of mycorrhizal associations the species can form according to our database.

The last seven columns (FM, NM, OM, AM, ECM, ERM, ORM) are binary variables generated from the Mycorrhizal_type and Mycorrhizal_status columns; they are used for Random Forest modeling.
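As an illustration of how the seven binary columns relate to the two label columns, the sketch below builds 0/1 indicators from one type label and one status label. It assumes each species carries exactly one type code and one status code spelled like the column names; the actual label spellings in the file, and the handling of species with more than one type (nr_types > 1), are not specified here, so treat the function `binary_indicators` as hypothetical.

```python
TYPES = ("AM", "ECM", "ERM", "ORM")
STATUSES = ("FM", "NM", "OM")


def binary_indicators(myc_type, myc_status):
    """Map one type code and one status code to seven 0/1 indicators.

    Assumption: labels are given as the short codes used for the
    column names (e.g. "AM", "FM").
    """
    indicators = {t: int(myc_type == t) for t in TYPES}
    indicators.update({s: int(myc_status == s) for s in STATUSES})
    return indicators


# An obligately mycorrhizal ECM species under the assumed coding.
row = binary_indicators("ECM", "OM")
```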
6. Sensitivity test
- inflection_example.R: Demonstrates how to calculate the second derivative of standard deviation values with respect to sample size, using occurrences of Equisetum fluviatile and 7 randomly selected environmental variables. The environmental variables are characterized by their mean and standard deviation.
- inf.csv: Contains the calculated inflection points for 12 randomly selected species, generated from the associated R script. Each column represents a different species, while rows are designated for environmental variables, which can be found in the "5. 11770.csv" file.
- other csv files: Contain environmental associations for 12 randomly selected species. The columns correspond to environmental variables, as detailed in the "5. 11770.csv" file.
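The idea behind the sensitivity test can be sketched in Python as follows. This is not a port of inflection_example.R: the subsampling scheme, the central finite-difference approximation, and the tolerance-based stopping rule in `first_stable_size` are assumptions chosen to illustrate how a second derivative of standard deviation against sample size might be computed.

```python
import random
from statistics import stdev


def sd_curve(values, sizes, seed=0):
    """Standard deviation of a random subsample at each sample size."""
    rng = random.Random(seed)
    return [stdev(rng.sample(values, n)) for n in sizes]


def second_differences(ys, step):
    """Central finite-difference estimate of the second derivative.

    `ys` must be evaluated on an evenly spaced grid with spacing `step`.
    The result has two fewer entries than `ys` (interior points only).
    """
    return [(ys[i - 1] - 2 * ys[i] + ys[i + 1]) / step ** 2
            for i in range(1, len(ys) - 1)]


def first_stable_size(sizes, ys, tol):
    """Smallest interior sample size whose |second derivative| < tol.

    Hypothetical criterion for illustration; returns None if none qualifies.
    """
    step = sizes[1] - sizes[0]
    for i, d2 in enumerate(second_differences(ys, step)):
        if abs(d2) < tol:
            return sizes[i + 1]
    return None
```

In practice the curve from `sd_curve` is noisy, so averaging over repeated subsamples before differencing would likely be needed.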
8. Phylogenetic signals
- deltaFunction.R: Implements the delta function for calculating phylogenetic signals of categorical variables. The function was developed by Borges et al. (2019).
- Phylogenetic_signals.R: Code to calculate phylogenetic signals. It uses the phylogenetic tree stored in 7_Myco.plant.supertree.output_v3.rds and takes mycorrhizal type, AM and ERM as examples.
9. 50km_46957grids
This CSV file shifts the perspective to a macroscopic view by profiling 46,957 grid cells, each at a 50x50 km scale. The dataset includes:
- Aggregated environmental conditions across these larger grid cells
- Proportions of each mycorrhizal trait in the grid cell
This dataset is constructed by aggregating plant occurrence data from the 1x1 km grids and upscaling environmental variables. Its primary aim is to visualize global patterns of mycorrhizal trait distribution, making it particularly suitable for A4 size printed journal papers. Environmental variables are consistent with those found in 5_11770.csv.
Other columns include:
- Grid_ID: (String) A unique identifier for each 50x50 km grid cell, aiding in spatial indexing and analysis.
- x: (Float, longitude) The longitude coordinate that represents the central point of the grid cell.
- y: (Float, latitude) The latitude coordinate that represents the central point of the grid cell.
- N: (Integer) The number of plant species recorded within that specific grid cell.
- WRB: (String) The World Reference Base for Soil Resources classification code.
- WRB_CAT: (String) The World Reference Base for Soil Resources classification category, providing information on the soil types present in the grid cell.
- na: (Integer) Number of missing environmental values for each grid cell. All values are 0 in this dataset because cells with missing environmental values have been excluded.
- AM: (Float) The proportion of arbuscular mycorrhizal plants within the grid cell.
- ERM: (Float) The proportion of ericoid mycorrhizal plants within the grid cell.
- ECM: (Float) The proportion of ectomycorrhizal plants within the grid cell.
- ORM: (Float) The proportion of orchid mycorrhizal plants within the grid cell.
- FM: (Float) The proportion of facultative mycorrhizal plants within the grid cell.
- NM: (Float) The proportion of non-mycorrhizal plants within the grid cell.
- OM: (Float) The proportion of obligate mycorrhizal plants within the grid cell.
Note: Plant mycorrhizal traits that do not occur in a given grid cell are represented as NA.
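The trait proportions in this file can be thought of as per-cell shares of species. The sketch below shows one way to derive them from (grid cell, species, trait) records; it assumes the denominator is the number of distinct species in the cell (the N column) and uses `None` to stand in for NA. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def trait_proportions(records, traits=("AM", "ECM", "ERM", "ORM")):
    """Share of species with each trait in every grid cell.

    `records` is an iterable of (grid_id, species, trait) tuples.
    Repeated records of the same species in a cell are counted once.
    """
    species_by_cell = defaultdict(dict)
    for grid_id, species, trait in records:
        species_by_cell[grid_id][species] = trait

    result = {}
    for grid_id, species_traits in species_by_cell.items():
        n_species = len(species_traits)
        row = {"N": n_species}
        for trait in traits:
            count = sum(1 for t in species_traits.values() if t == trait)
            # Traits absent from the cell are recorded as NA (None here).
            row[trait] = count / n_species if count else None
        result[grid_id] = row
    return result
```

The same function applies to mycorrhizal status by passing `traits=("FM", "NM", "OM")` with status labels in the records.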
11. RF modelling
- csv files: Eigenvectors generated from 10_Phy_env_eigenvectors.R, related to both phylogenetics and environmental data.
- ipynb files: Jupyter Notebook examples for performing Random Forest modeling. One notebook focuses on binary data using Arbuscular Mycorrhiza (AM) as an example, and another focuses on categorical data representing different mycorrhizal statuses. Both notebooks provide the out-of-bag rate and feature importance.
12. Feature importance
Overview: Feature importance results generated from 11_RF_modelling.
- *.env_imp.csv: Feature importance for raw environmental variables.
- *.env+pvr_imp.csv: Feature importance for raw environmental variables, contextualized by plant phylogeny.

Table columns:
- env_type: Specifies the type of environment, which can either be soil or climate.
- data_type: Indicates whether the data represents the mean or standard deviation values.
- cols: Represents environmental variables, the details of which can be found in the "5. 11770.csv" file.
- imp: Denotes the feature importance as determined by Random Forest models.
- group: Categorized as follows:
  - 1 for climatic mean
  - 2 for climatic standard deviation
  - 3 for soil mean
  - 4 for soil standard deviation
  - 5 for soil order
13. 50kmGrid traits
Plant mycorrhizal type and status distribution for each studied species. Also includes the square ID for each 50km grid and the geometry coordinates for the center of each grid.
Figure 2
Phylogenetic trees were constructed using the 'ggtree' and 'ggtreeExtra' packages developed by Xu et al. (2021, 2022). More information about these packages can be found at https://github.com/YuLab-SMU/ggtree and https://github.com/YuLab-SMU/ggtreeExtra.
- Node_entropy.RData: Contains node entropy calculations derived from the script in 8_Phylogenetic_signals.zip.
- *_bar.R: Scripts for generating phylogenetic trees with annotated bars in Figure 2, using data from 7_Myco.plant.supertree.output_v3.rds.
- Fig2_violinPlot.R: Script for producing the violin plot in Figure 2, based on data in Node_entropy.RData.
Figure 3
Visualization of Global Distribution Patterns of Mycorrhizal Traits
- Overview: The global distribution of plant mycorrhizal traits was visualized by aggregating plant occurrence data into 50 x 50 km grid cells (see "13_50kmGrid_traits.zip") and upscaling environmental variables (see "14_stack50.tif"). Please refer to "9_50km_46957grids.csv" for the resulting data, which contains 4,569,680 records across 46,957 grids.
- Methods: Preliminary maps were generated using Python's 'rasterio' package and further visualized in R using 'ggplot2'.
Sub-files in "Figure_3.zip/map":
- 50base_map.tif: World raster at 50 km scale, each cell includes a unique ID for reference.
- 50km_map.ipynb: Plotting geometry data from /9_50km_46957grids.csv as rasters for each plant mycorrhizal trait.
- *_new.tiff: Output rasters generated from 50km_map.ipynb.
- clipped_*.tif: These are outputs where *_new.tiff files were edited in ArcMap 10.6 to remove occurrences in oceanic regions.
- countries.*: Shapefile containing coastal lines.
- Fig3_map.R: Generates maps for Figure 3 using the clipped_*.tif files.
Furthermore, we presented the relationship between the share of species with specific mycorrhizal traits and edaphic and climatic gradients.
Sub-files in "Figure_3.zip/pca":
- csv files: Includes 25 climatic and 28 soil variables extracted from 9_50km_46957grids.csv. Detailed descriptions of these variables can be found in "9_50km_46957grids.csv" and "5_11770.csv".
- pcaPlot.R: Script used to generate PCA plots in Figure 3. The plots show the share of species with specific mycorrhizal traits in relation to the first two principal components of edaphic and climatic factors.
- *.tiff*: Output PCA plots generated by pcaPlot.R.
Figure 4
- Overview: Shows the feature importance of the top 20 predictors for each plant mycorrhizal trait. The analysis is based on RF Model 4 (see Figure 1), which excluded plant phylogenetic predictors. Data used for these figures are stored in 12_Feature_importance.zip.
- Fig4_bar.R: Script generating the bar plots in Figure 4, in the absence of plant phylogenetic vectors. Plots in the presence of plant phylogeny can be found in Appendix S5.
- *.tiff*: Output bar plots generated by Fig4_bar.R.
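Selecting the top 20 predictors from a feature importance table could look like the sketch below. It relies only on the cols and imp columns described under "12. Feature importance"; the sample rows, their values, and the function name `top_predictors` are made up for illustration and do not come from the dataset.

```python
import csv
import io


def top_predictors(rows, n=20):
    """Return the `n` (variable, importance) pairs with the highest `imp`."""
    ranked = sorted(rows, key=lambda r: float(r["imp"]), reverse=True)
    return [(r["cols"], float(r["imp"])) for r in ranked[:n]]


# Made-up stand-in for one *.env_imp.csv file (values are illustrative).
sample_csv = io.StringIO(
    "env_type,data_type,cols,imp,group\n"
    "soil,mean,clay,0.30,3\n"
    "climate,mean,elevation,0.20,1\n"
    "soil,sd,sand_std,0.05,4\n"
)
rows = list(csv.DictReader(sample_csv))
top_two = top_predictors(rows, n=2)
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` object.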
References
Borges, R., Machado, J. P., Gomes, C., Rocha, A. P., & Antunes, A. (2019). Measuring phylogenetic signal between categorical traits and phylogenies. Bioinformatics, 35(11), 1862–1869.
Xu, S., Dai, Z., Guo, P., Fu, X., Liu, S., Zhou, L., Tang, W., Feng, T., Chen, M., Zhan, L., Wu, T., Hu, E., Jiang, Y., Bo, X., & Yu, G. (2021). ggtreeExtra: Compact visualization of richly annotated phylogenetic data. Molecular Biology and Evolution, 38(9), 4039–4042.
Xu, S., Li, L., Luo, X., Chen, M., Tang, W., Zhan, L., Dai, Z., Lam, T. T., Guan, Y., & Yu, G. (2022). Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. iMeta, 1(4).
Methods
Data collection
In order to identify phylogenetic and environmental correlates of plant mycorrhizal traits, we compiled four data sources: 1) plant species mycorrhizal trait data; 2) plant species occurrence data; 3) global climatic and soil environmental data and 4) plant phylogenetic information. The plant occurrence and environmental data were used to estimate plant species' environmental associations. The environmental association data and phylogenetic information were then used to model variations in plant mycorrhizal trait expression (Figure 1).
First, plant mycorrhizal trait data were obtained from the most up-to-date literature available, including data published by Harley & Harley (1987, 1990), Wang & Qiu (2006), Hempel et al. (2013), Bueno et al. (2017), Gerz et al. (2018) and Soudzilovskaia et al. (2020). We distinguished four plant mycorrhizal types (arbuscular (AM), ecto- (ECM), orchid (ORM), and ericoid (ERM) mycorrhiza, as defined by Smith & Read (2008)) and three plant mycorrhizal statuses (obligately (OM), facultatively (FM) and non-mycorrhizal (NM)). We compiled plant mycorrhizal trait data for 14,722 taxa at the species level (Appendix S2).
Second, plant species occurrence records were retrieved from the Global Biodiversity Information Facility (GBIF; www.gbif.org) for 13,479 species that were present in both the standardised GBIF species list and the mycorrhizal trait species list.
Third, plant occurrence data of 13,479 species were intersected with a 30 arc-seconds (approximately 1 km) global grid, with the presence or absence of each species in each cell recorded. A sensitivity test indicated that sampling 20 grid-cell records produced environmental parameter estimates that were representative of wider distribution areas (Appendix S1); retaining species recorded in ≥ 20 cells left 62,540,387 grid-cell level records for 11,770 species (Appendix S3). Environmental associations were approximated by intersecting the distribution data for each species with a raster stack of 54 environmental data layers (Table S1).
Fourth, a phylogenetic tree containing the 11,770 species in our dataset was compiled, and phylogenetic signal in plant mycorrhizal traits was examined using the δ statistic (Borges et al., 2019). See Appendix S1 for details of datasets and data filtering.
A total of 710 dual mycorrhizal plant species (AM + ECM) were distinguished (Appendix S2), of which 665 were matched with geographic location information in GBIF. Except where stated otherwise, these species were grouped with ECM plant species in further analyses, reflecting ongoing controversy concerning the definition of dual mycorrhizal plant species (Teste et al., 2020; Brundrett, 2021a) and the fact that the niches of dual mycorrhizal plants most closely resemble those of ECM plants (Gerz et al., 2018).
References:
Borges, R., Machado, J. P., Gomes, C., Rocha, A. P., & Antunes, A. (2019). Measuring phylogenetic signal between categorical traits and phylogenies. Bioinformatics, 35(11), 1862–1869.
Brundrett, M. C. (2021). Auditing data resolves systemic errors in databases and confirms mycorrhizal trait consistency for most genera and families of flowering plants. Mycorrhiza, 31(6), 671–683.
Bueno, C. G., Moora, M., Gerz, M., Davison, J., Öpik, M., Pärtel, M., Helm, A., Ronk, A., Kühn, I., & Zobel, M. (2017). Plant mycorrhizal status, but not type, shifts with latitude and elevation in Europe. Global Ecology and Biogeography: A Journal of Macroecology, 26(6), 690–699.
Gerz, M., Guillermo Bueno, C., Ozinga, W. A., Zobel, M., & Moora, M. (2018). Niche differentiation and expansion of plant species are associated with mycorrhizal symbiosis. The Journal of Ecology, 106(1), 254–264.
Harley, J. L., & Harley, E. L. (1987). A check-list of mycorrhiza in the British flora. The New Phytologist, 105(2), 1–102.
Harley, J. L., & Harley, E. L. (1990). A check-list of mycorrhiza in the British flora-second addenda and errata. The New Phytologist, 115(4), 699–711.
Hempel, S., Götzenberger, L., Kühn, I., Michalski, S. G., Rillig, M. C., Zobel, M., & Moora, M. (2013). Mycorrhizas in the Central European flora: relationships with plant life history traits and ecology. Ecology, 94(6), 1389–1399.
Smith, S. E., & Read, D. J. (2008). Mycorrhizal Symbiosis. Academic Press.
Soudzilovskaia, N. A., Vaessen, S., Barcelo, M., He, J., Rahimlou, S., Abarenkov, K., Brundrett, M. C., Gomes, S. I. F., Merckx, V., & Tedersoo, L. (2020). FungalRoot: global online database of plant mycorrhizal associations. The New Phytologist, 227(3), 955–966.
Teste, F. P., Jones, M. D., & Dickie, I. A. (2020). Dual-mycorrhizal plants: their ecology and relevance. The New Phytologist, 225(5), 1835–1851.
Wang, B., & Qiu, Y.-L. (2006). Phylogenetic distribution and evolution of mycorrhizas in land plants. Mycorrhiza, 16(5), 299–363.