
Global biogeography of fungal and bacterial biomass carbon in topsoil

Cite this dataset

He, Liyuan et al. (2020). Global biogeography of fungal and bacterial biomass carbon in topsoil [Dataset]. Dryad.


Bacteria and fungi, representing two major soil microorganism groups, play an important role in global nutrient biogeochemistry. Biogeographic patterns of bacterial and fungal biomass are of fundamental importance for mechanistically understanding nutrient cycling. We synthesized 1323 data points of phospholipid fatty acid-derived fungal biomass C (FBC), bacterial biomass C (BBC), and fungi:bacteria (F:B) ratio in topsoil, spanning 11 major biomes. The FBC, BBC, and F:B ratio display clear biogeographic patterns along latitude and environmental gradients including mean annual temperature, mean annual precipitation, net primary productivity, root C density, soil temperature, soil moisture, and edaphic factors. At the biome level, tundra has the highest FBC and BBC densities at 3684 (95% confidence interval: 1678~8084) mg kg-1 and 428 (237~774) mg kg-1, respectively; desert has the lowest FBC and BBC densities at 16.92 (14.4~19.89) mg kg-1 and 6.83 (6.1~7.65) mg kg-1, respectively. The F:B ratio varies dramatically, ranging from 1.8 (1.6~2.1) in savanna to 8.6 (6.7~11.0) in tundra. An empirical model was developed for the F:B ratio and combined with a global dataset of soil microbial biomass C to produce global maps of FBC and BBC in 0-30 cm topsoil. Across the globe, the highest FBC is found in boreal forest and tundra, while the highest BBC is found in boreal forest and tropical/subtropical forest. The lowest FBC and BBC are found in shrub and desert. Global stocks of living microbial biomass C were estimated to be 12.6 (6.6~16.4) Pg C for FBC and 4.3 (0.5~10.3) Pg C for BBC in topsoil. These findings advance our understanding of the global distribution of fungal and bacterial biomass, which facilitates the incorporation of fungi and bacteria into Earth system models. The global maps of bacterial and fungal biomass serve as a benchmark for validating microbial models in simulating the global C cycle under a changing climate.


2.1 Data Compilation

We used a combination of keywords, “fung*” or “bacteria*”, “ratio”, and “terrestrial” or “soil”, to search peer-reviewed papers in Google Scholar. The papers were selected via the following criteria: 1) either concurrent fungal biomass and bacterial biomass or F:B ratio was clearly reported; 2) the data were extractable from tables (assessing the text) or figures (using Engauge Digitizer Version 10.7); 3) the study sites were not affected by disturbances such as fire, mining, and heavy metal contamination; and 4) the reported data cover 0-30 cm topsoil. Geographical information of the sampling sites was recorded and used to locate the sites on the global map (Fig. 1). We also collected any available data on soil pH, mean annual precipitation (MAP), mean annual temperature (MAT), SOC, total nitrogen (TN) concentration, and soil texture, and then plotted these variables against the extracted data from global datasets to test the consistency (Fig. S1).

We recorded fungal and bacterial biomass C measured using methods such as phospholipid fatty acid (PLFA), direct microscopy (DM), colony forming units (CFU), substrate-induced respiration (SIR), and glucosamine and muramic acid (GMA) from peer-reviewed papers. To examine the potential biases in the measurement of fungal and bacterial biomass, we did a comparison among those methods (Table 1, Table S1). To compare FBC and BBC measured using different methods, we used conversion factors for PLFA (Frostegård and Bååth, 1996; Klamer and Bååth, 2004), SIR (Beare et al., 1990), CFU (Aon et al., 2001), DM (Birkhofer et al., 2008), and GMA (Jost et al., 2011) reported in previous studies. Across biomes, FBC, BBC, and the F:B ratio generally followed a similar pattern among different methods. However, large variations were found in measured FBC and BBC among different methods. Specifically, compared with PLFA, SIR, and GMA, CFU reported dominant fungi over bacteria, while DM estimated a higher dominance of bacteria relative to fungi, suggesting that DM may underestimate FBC while CFU may overestimate FBC. Meanwhile, we found overall higher FBC and BBC measured using GMA, which was largely distinct from the measurements using other methods. Using data generated from multiple methods in one analysis might be problematic. Therefore, we used PLFA data for subsequent analyses. This selection was due to two reasons: 1) the PLFA was the most widely used approach, with the PLFA-derived FBC and BBC measurements accounting for 73% of the whole dataset; 2) the PLFA method has been evaluated and proved to be the most appropriate approach for estimating FBC and BBC simultaneously (Waring et al., 2013).

The final database included the fungal and bacterial biomass data measured using PLFA from publications spanning from the late 1960s to 2018. Collectively, 1323 data points in 11 biomes (i.e., boreal forest, temperate forest, tropical/subtropical forest, grassland, shrub, savanna, tundra, desert, natural wetlands, cropland, and pasture) across the globe were included in the database (Fig. 1). Forest, grassland, and cropland contributed approximately 39%, 22%, and 19% of the dataset, respectively, whereas all other biomes combined accounted for 20% of the dataset. A majority of the field sites are located in North America, Europe, and Asia, and a relatively small number of observations are in South America, Africa, North Asia, Australia, and Antarctica. For data points reported without coordinates, we looked up the geographical coordinates based on the location of the study site, city, state, and country. The geographical information was then used to locate the sampling points on the global map and to extract long-term climate, edaphic property, plant productivity, and soil microclimate data from global datasets.

2.2 Climate, Plant, and Soil Data

MAT and MAP for 1970-2000, at a spatial resolution of 30 seconds, were obtained from the WorldClim database version 2. In addition, monthly mean soil moisture (SM) and soil temperature (ST) during 1979-2014 were obtained from the NCEP/DOE AMIP-II Reanalysis. The global vegetation distribution data were obtained from a spatial map of 11 major biomes: boreal forest, temperate forest, tropical/subtropical forest, mixed forest, grassland, shrub, tundra, desert, natural wetlands, cropland, and pasture, which has been used in our previous publications (Xu et al., 2013; Xu et al., 2017). We also obtained soil pH, sand, silt, clay, and SOC data from the Harmonized World Soil Database (HWSD) at a 0.5° × 0.5° grid resolution. Soil bulk density and TN were extracted from the IGBP-DIS dataset (IGBP) at a spatial resolution of 0.5′ × 0.5′. Since TN in IGBP-DIS is reported for the 0–100 cm soil profile as a whole, we scaled it using a factor calculated from the fraction of SOC in the top 0-30 cm in the HWSD database. Since SOC and soil TN exhibit large spatial heterogeneities, and fine-scale variation in edaphic properties is underrepresented in global datasets, we examined the relationships of FBC, BBC, and F:B ratio with SOC, TN, and C:N ratio using data extracted directly from the literature. Due to the poor correlation between bulk density extracted from HWSD and the reported bulk density values in the literature, we used the same soil bulk density values for the entire top 100 cm soil profile from IGBP, assuming no difference in bulk density between the top 0-30 cm and 30-100 cm soil profiles. Root C density (Croot) data were extracted from a global dataset of 0.5 degree resolution based on observation data (Ruesch and Gibbs, 2008; Song et al., 2017).
Annual net primary productivity (NPP) for the period of 2000-2015 was obtained from the MODIS gridded dataset with a spatial resolution of 30 seconds. These global datasets of varied spatial resolutions were interpolated to 0.5 degree using the “bilinear” method based on the GDAL library (GDAL Development Team, 2018) for generating the global maps of FBC, BBC, and F:B ratio.
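To make the bilinear step concrete, the following is a minimal Python sketch of bilinear interpolation on a regular grid. It is not the GDAL workflow used for the dataset: the function name `bilinear_sample`, the fractional row/column addressing, and the toy grid are assumptions made for illustration only.

```python
import math


def bilinear_sample(grid, row, col):
    """Interpolate a value at fractional (row, col) on a regular 2-D grid.

    `grid` is a list of equal-length rows; (row, col) are fractional
    indices inside the grid. Hypothetical helper, not part of GDAL.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    r0 = min(int(math.floor(row)), n_rows - 1)
    c0 = min(int(math.floor(col)), n_cols - 1)
    r1 = min(r0 + 1, n_rows - 1)  # clamp at the lower edge of the grid
    c1 = min(c0 + 1, n_cols - 1)  # clamp at the right edge of the grid
    fr, fc = row - r0, col - c0   # fractional distances from the top-left cell

    # Interpolate along columns on the two bracketing rows, then between rows.
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


# Toy 2 x 2 grid; the centre point is the average of the four corners.
toy_grid = [[0.0, 10.0], [20.0, 30.0]]
centre_value = bilinear_sample(toy_grid, 0.5, 0.5)
```

In practice a raster library handles georeferencing, nodata values, and the change of resolution; the sketch only shows the weighting of the four neighbouring cells.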

2.3 Model Selection and Validation

For FBC, BBC, and the F:B ratio, we developed generalized linear models considering the interactive roles of climate (MAP and MAT), soil microclimate (ST and SM), plant (NPP and Croot), and edaphic properties (clay, sand, soil pH, bulk density, SOC, and TN) to tease apart the controlling factors on fungal and bacterial distribution. These models explained over 70% of the variation in FBC, BBC, and the F:B ratio, and FBC and BBC were better explained than the F:B ratio (Fig. 2).

Considering the higher proportion of missing data in FBC (14.8%) and BBC (16.3%) relative to the F:B ratio (1.9%), we built an empirical model for the F:B ratio by randomly splitting the dataset with 75% of the data used in training the model. With the generalized linear model of the F:B ratio, we performed the principal component analysis to estimate the number of the important components in explaining the variations in the F:B ratio. Based on the variations explained by each component and the cumulative variation of components, we selected 31 of the most important factors, with 33.0% of the variation in the F:B ratio explained by the empirical model (Fig. S7; Table S2). The selected empirical model had the formula:

log10 (F:B ratio) = 0.6789
  - 0.03402 * MAT - 0.000058 * MAP + 0.003772 * ST + 1.542 * SM
  - 0.00099 * NPP + 0.01553 * Croot + 0.1226 * bulk density + 0.05991 * soil pH
  - 0.03631 * clay - 0.0045 * sand + 0.002878 * SOC - 0.01607 * TN
  + 0.000177 * MAT * ST - 0.03955 * MAT * SM - 0.000015 * MAP * ST - 0.000335 * MAP * SM
  + 0.000005 * MAT * NPP - 0.001615 * MAT * Croot + 0.000001 * MAP * NPP + 0.000007 * MAP * Croot
  + 0.02201 * MAT * bulk density - 0.003794 * MAT * soil pH + 0.002188 * MAT * clay + 0.000137 * MAT * sand
  - 0.000061 * MAT * SOC + 0.00513 * MAT * TN
  - 0.000029 * MAP * soil pH + 0.000001 * MAP * clay + 0.000003 * MAP * sand
  - 0.000001 * MAP * SOC - 0.000043 * MAP * TN
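The formula can be evaluated directly once the 12 predictors are available for a site or grid cell. The Python sketch below transcribes the published coefficients; the function name, the short keys `BD` (bulk density) and `pH` (soil pH), and the dictionary layout are assumptions for illustration. The units of each predictor are not stated in this section, so inputs must match the units used when the model was fitted.

```python
INTERCEPT = 0.6789

# Main-effect coefficients, transcribed from the formula above.
MAIN_EFFECTS = {
    "MAT": -0.03402, "MAP": -0.000058, "ST": 0.003772, "SM": 1.542,
    "NPP": -0.00099, "Croot": 0.01553, "BD": 0.1226, "pH": 0.05991,
    "clay": -0.03631, "sand": -0.0045, "SOC": 0.002878, "TN": -0.01607,
}

# Two-way interaction coefficients, transcribed from the formula above.
INTERACTIONS = {
    ("MAT", "ST"): 0.000177, ("MAT", "SM"): -0.03955,
    ("MAP", "ST"): -0.000015, ("MAP", "SM"): -0.000335,
    ("MAT", "NPP"): 0.000005, ("MAT", "Croot"): -0.001615,
    ("MAP", "NPP"): 0.000001, ("MAP", "Croot"): 0.000007,
    ("MAT", "BD"): 0.02201, ("MAT", "pH"): -0.003794,
    ("MAT", "clay"): 0.002188, ("MAT", "sand"): 0.000137,
    ("MAT", "SOC"): -0.000061, ("MAT", "TN"): 0.00513,
    ("MAP", "pH"): -0.000029, ("MAP", "clay"): 0.000001,
    ("MAP", "sand"): 0.000003, ("MAP", "SOC"): -0.000001,
    ("MAP", "TN"): -0.000043,
}


def predict_log10_fb(predictors):
    """Return log10(F:B ratio) for a dict holding all 12 predictor values."""
    value = INTERCEPT
    for name, coef in MAIN_EFFECTS.items():
        value += coef * predictors[name]
    for (a, b), coef in INTERACTIONS.items():
        value += coef * predictors[a] * predictors[b]
    return value


def predict_fb_ratio(predictors):
    """Back-transform the prediction to the F:B ratio itself."""
    return 10 ** predict_log10_fb(predictors)
```

The 12 main effects and 19 interactions add up to the 31 factors mentioned in the text, which is a quick check that no term was dropped in transcription.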

After the model was developed, we used 25% of the data that were not used in model development to validate the model, and we found a high consistency between model prediction and observed data (Fig. S8a). We then investigated the F:B ratio model performance by comparing the model simulated values and observed data in each biome (Fig. S9). We found good consistency between the simulated and observed log-transformed F:B ratio in all biomes except desert. Given the much lower BBC and FBC in deserts, this inconsistency does not introduce a large bias to the large-scale estimation of BBC and FBC. Additionally, we found some overestimation of the F:B ratio in croplands and pastures, indicating large uncertainties in managed systems.

2.4 Mapping Global Soil Bacterial and Fungal Biomass Carbon

We compared the soil microbial biomass C reported in Xu et al. (2013) and the sum of FBC and BBC in this study and found a strong agreement in these estimates (Fig. S8b; R2 = 0.91). This indicated that the sum of FBC and BBC constituted a constant proportion of microbial biomass, which provided a feasible way to estimate FBC and BBC. Based on the microbial biomass C dataset in Xu et al. (2013) and the global map of the F:B ratio generated in this study, we produced the global maps and estimated global storage of FBC and BBC. The auxiliary data used included global vegetation distribution (Xu et al., 2013) and a global land area database supplied by the surface data map generated by the Community Land Model 4.0.
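One way to read this step is as a simple partition of microbial biomass C by the F:B ratio. The Python sketch below shows that arithmetic under stated assumptions: the function name and the `fraction` parameter (the share of microbial biomass C attributed to fungi plus bacteria) are hypothetical, and the exact procedure used for the published maps is not spelled out in this section.

```python
def partition_biomass(mbc, fb_ratio, fraction=1.0):
    """Split microbial biomass C into fungal and bacterial parts.

    mbc      : microbial biomass C for one grid cell (any consistent unit)
    fb_ratio : fungi:bacteria ratio for the same cell (FBC / BBC)
    fraction : assumed constant share of mbc that is FBC + BBC (hypothetical)
    """
    if fb_ratio < 0:
        raise ValueError("F:B ratio must be non-negative")
    total = mbc * fraction
    bbc = total / (1.0 + fb_ratio)  # from FBC = fb_ratio * BBC and FBC + BBC = total
    fbc = total - bbc
    return fbc, bbc


# Illustrative values only: 100 units of biomass C with an F:B ratio of 3.
example_fbc, example_bbc = partition_biomass(100.0, 3.0)
```

Applied cell by cell to a microbial biomass C map and an F:B ratio map on the same grid, this yields FBC and BBC maps whose ratio reproduces the F:B map.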

2.5 Uncertainty Analysis

To estimate the parameter-induced uncertainties in fungal and bacterial biomass distribution and storage, we used an improved Latin Hypercube Sampling (LHS) approach to estimate variation in F:B ratio. The LHS approach is able to randomly produce an ensemble of parameter combinations with a high efficiency. This approach has been widely used to estimate uncertainties in model output (Haefner, 2005; Xu, 2010; Xu et al., 2014). Specifically, we assumed that all parameters followed a normal distribution. Then, we used LHS to randomly select an ensemble of 3000 parameter sets using the function of “improvedLHS” in the R package “lhs” (Carnell and Carnell, 2019) (Table S2). Finally, we calculated the 95% confidence interval of fungal and bacterial biomass C density and storage for reporting (Table 2).
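The sampling idea can be sketched in a few lines of Python. This is a plain Latin Hypercube design, not the “improvedLHS” algorithm from the R package, which additionally optimizes the spacing of points. The parameter means and standard deviations in the example are placeholders (the fitted values are in Table S2, which is not reproduced here), and all function names are assumptions.

```python
import random
from statistics import NormalDist


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)  # decouple the strata across dimensions
        columns.append(points)
    return [list(row) for row in zip(*columns)]


def sample_normal_parameters(means, sds, n_samples, seed=0):
    """Map unit-cube LHS points to normally distributed parameter sets."""
    rng = random.Random(seed)
    eps = 1e-12  # keep probabilities strictly inside (0, 1) for inv_cdf
    unit = latin_hypercube(n_samples, len(means), rng)
    return [
        [NormalDist(m, s).inv_cdf(min(max(u, eps), 1 - eps))
         for u, m, s in zip(row, means, sds)]
        for row in unit
    ]


def percentile_interval(values, lower=0.025, upper=0.975):
    """Empirical 95% interval from an ensemble of model outputs."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Placeholder means and standard deviations for two parameters (hypothetical).
ensemble = sample_normal_parameters([0.6789, -0.03402], [0.1, 0.005], n_samples=3000)
```

Each parameter set in `ensemble` would be passed through the F:B model, and `percentile_interval` applied to the resulting outputs gives the reported interval.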

2.6 Statistical Analysis

We first tested the normality of data distribution using the “shapiro.test” function in the R package “stats” (R Core Team, 2013). We found that FBC, BBC, and F:B ratio in our dataset did not follow a normal distribution. Therefore, these variables were log-transformed for subsequent statistical analysis. The mean and 95% confidence boundaries of FBC, BBC, and F:B ratio were transformed back to the original values for reporting. We constructed a generalized linear model using the “glm” function in the R package “stats” (R Core Team, 2013) to investigate relationships between FBC, BBC, and the F:B ratio and long-term climate (MAP and MAT), soil microclimate (ST and SM), plant (NPP and Croot), and edaphic properties (clay, sand, soil pH, bulk density, SOC, and TN). We used the Akaike information criterion (AIC) as the model selection criterion. Before fitting the generalized linear model, we tested for multicollinearity among the variables within and among each variable group, i.e., climate, soil microclimate, edaphic properties, and plant, and we found no significant multicollinearity (VIF < 5). All statistical analyses were performed and relevant figures were plotted using the “agricolae” (de Mendiburu and de Mendiburu, 2019), “multcomp” (Hothorn et al., 2016), “soiltexture” (Moeys, 2018), “VennDiagram” (Chen and Boutros, 2011), “ggplot2” (Wickham et al., 2016), and “basicTrendline” (Mei et al., 2018) packages in R version 3.5.3 for Mac OS X. Fig. 1 and Fig. 3 were produced with NCAR Command Language (version 6.3.0) and ArcGIS (version 10.5), respectively.
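The log-transform and back-transform of means and confidence boundaries can be illustrated with a short Python sketch. The analysis itself was done in R; here the function name is hypothetical, and the interval uses a normal approximation (z = 1.96), which is an assumption because the text does not state how the boundaries were computed.

```python
import math
import statistics


def back_transformed_mean_ci(values, z=1.96):
    """Mean and approximate 95% bounds computed on log10 values, then back-transformed.

    `values` must be positive. The back-transformed mean is the geometric
    mean, and the bounds are asymmetric on the original scale.
    """
    logs = [math.log10(v) for v in values]
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    return (
        10 ** mean_log,
        10 ** (mean_log - z * se_log),
        10 ** (mean_log + z * se_log),
    )


# Illustrative values only, not data from the database.
gm, lower, upper = back_transformed_mean_ci([10.0, 100.0, 1000.0])
```

This asymmetry is why the intervals reported for biome-level densities are not centred on the mean, for example 3684 (1678~8084) mg kg-1 for tundra FBC.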


San Diego State University

United States Department of Energy, Award: Office of Science NGEE Arctic project

United States Department of Energy, Award: Office of Science SPRUCE project

National Natural Science Foundation of China, Award: 41125001

Dutch Research Council, Award: VIDI grant 016.161.318

Northeast Institute of Geography and Agroecology

CSU Program for Education & Research in Biotechnology
