Data from: Large, climate-sensitive soil carbon stocks mapped with pedology-informed machine learning in the North Pacific coastal temperate rainforest
Data files
Nov 19, 2018 version files 60.38 MB
- FluxProject_SOCmap.7z (58.45 MB)
- McNicoletal-2018-NPCTR-Pedon-SOC-Database.xlsx (1.94 MB)
Oct 10, 2024 version files 60.14 MB
Abstract
Accurate soil organic carbon (SOC) maps are needed to predict the terrestrial SOC feedback to climate change, one of the largest remaining uncertainties in Earth system modeling. Over the last decade, global scale models have produced varied predictions of the size and distribution of SOC stocks, ranging from 1,000 to > 3,000 Pg of C within the top 1 m. Regional assessments may help validate or improve global maps because they can examine landscape controls on SOC stocks and offer a tractable means to retain regionally-specific information, such as soil taxonomy, during database creation and modeling. We compile a new transboundary SOC stock database for coastal watersheds of the North Pacific coastal temperate rainforest, using soil classification data to guide gap-filling and machine learning approaches used to explore spatial controls on SOC and predict regional stocks. Precipitation and topographic attributes controlling soil wetness were found to be the dominant controls of SOC, underscoring the dependence of C accumulation on high soil moisture. The random forest model predicted stocks of 4.5 Pg C (to 1 m) for the study region, 22% of which was stored in organic soil layers. Calculated stocks of 228 ± 111 Mg C ha-1 fell within ranges of several past regional studies and indicate 11-33 Pg C may be stored across temperate rainforest soils globally. Predictions compared very favorably to regionalized estimates from two spatially explicit global products (Pearson's correlation: ρ = 0.73 vs. 0.34). Notably, SoilGrids250m was an outlier for estimates of total SOC, predicting 4-fold higher stocks (18 Pg C) and indicating bias in this global product for the soils of the temperate rainforest. In sum, our study demonstrates that coastal temperate rainforest (CTR) ecosystems represent a moisture-dependent hotspot for SOC storage at mid-latitudes.
README: North Pacific Coastal Temperate Rainforest (NPCTR) Pedon and Soil Carbon Database
Access this dataset on Dryad: https://doi.org/10.5061/dryad.5jf6j1r
This database compiles pedon data and soil organic carbon stock data (ca. 1300 soil profile descriptions) from various sources across coastal British Columbia and southeast Alaska.
Description of the data and file structure
The file entitled McNicoletal-2024-NPCTR-Pedon-SOC-Database.xlsx contains the data for all of the soil pedons and corresponding soil organic carbon stock data. The file has four tables: a master table with all the data, a pedon table with pedon-specific data, a horizon table with horizon-specific data, and a summary table.
McNicoletal-2024-NPCTR-Pedon-SOC-Database.xlsx contains the following columns:
- source: source reference (see Source References tab) for the pedon data. In most cases, these are published database data (e.g., Shaw et al. 2018), or published manuscripts, but include one thesis and unpublished data from the Hakai Institute.
- pedon_id: this is the identifier extracted from the source reference. In many cases, these are named pedon locations, but sometimes they are pedon codes (e.g. NRCS data) or numeric identifiers (e.g. Shaw et al. 2018).
- order: this is the soil order using the fullest taxonomic classification available in the source reference. It has not yet been simplified for aggregation, down to the singular order designations (e.g., HISTOSOL).
- lat: the most accurate latitude value reported for the pedon location in decimal degrees.
- lon: the most accurate longitude value reported for the pedon location in decimal degrees.
- latlon_q: the quality flag for the LAT and LON values based upon criteria described in the manuscript (doi: 10.1088/1748-9326/aaed52). Generally, coordinates with too few decimal places (low precision), obvious inaccuracy, or pre-GPS sampling were flagged LOW.
- horizon: the detailed horizon designation from the source reference with as many suffixes (e.g., Bh…) as was reported.
- horizon_number: indicates the order of horizons within the master table. A horizon can be uniquely identified using its pedon id and horizon number.
- horizon_type: organic or mineral horizon.
- depth2: the depth of the top of the soil horizon in centimeters (cm).
- depth1: the depth of the bottom of the soil horizon in centimeters (cm).
- depth: the thickness of the soil horizon (bottom depth minus top depth) in centimeters (cm).
- bulk_density: the measured or estimated/assigned (in beige) dry bulk density value in grams per cubic centimeter (g cm-3). The Supplementary Information provides a breakdown of steps to estimate bulk density. Most values are taken from Shaw et al. 2015 (Table 8).
- bd_method: whether the assigned value was measured or estimated. 0 indicates that the value is measured. 1 indicates that the value is estimated using a lookup table. This procedure was replicated to fill data gaps in multiple datasets. (More information in manuscript supplement).
- cf: the mineral coarse fragment content in percent (% volume). Generally, these values are reported, but where filled, they are highlighted and the Supplementary Information explains how.
- cf_method: whether the assigned value was measured or estimated. 0 indicates that the value is measured. 1 indicates that the value is estimated using a lookup table or other methods. This procedure was replicated to fill data gaps in multiple datasets. 2 indicates that the cf was originally null and 0 was assumed for calculation purposes. (More information in manuscript supplement).
- cconc: the reported or estimated horizon carbon concentration in percent (% mass). Where estimated, these values are highlighted in color, and source reference-specific methods are described in Supplementary Info.
- cconc_method: whether the assigned value was measured or estimated. 0 indicates that the value is measured. 1 indicates that the value is estimated using a lookup table or other methods. This procedure was replicated to fill data gaps in multiple datasets. 2 indicates that the cconc was estimated using linear regression. (More information in manuscript supplement).
- mineral_d: the deepest depth of the subsurface mineral horizons in centimeters (cm) (maximum value 100 cm).
- ff_d: the total depth of forest floor organic horizons or Histosol depth in centimeters (cm) (no maximum value).
- total_d: the total depth of soil accounted for in SOC stock estimate in centimeters (cm) (Mineral_D + FF_D).
- ccontent: calculated carbon content in grams of carbon per square meter (gC m-2) for the horizon.
- total_c: summed carbon content across all reported horizons in grams of carbon per square meter (gC m-2).
- ccontent_1m: calculated carbon content in grams of carbon per square meter (gC m-2) for the horizon. Horizons below 100 cm in the subsurface mineral soil are assigned zero, while horizons that traverse this threshold are reduced proportionally by the fraction of the horizon below it.
- total_c_1m: summed carbon content across all reported horizons down to 1 m in the subsurface mineral soils and 1 m in histosols in megagrams of carbon per hectare (Mg C ha-1).
- pedon_start: a boolean value which, if true, indicates that the row contains pedon-specific data and is the master row for that pedon.
The value "NA" corresponds to any missing information in columns of type object. For columns that are float64 or int, any empty cells represent missing information.
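The relationship between the horizon columns and the carbon content columns can be sketched as follows. This is a minimal illustration and not the authors' script: it assumes the standard stock equation (thickness × bulk density × fine-earth fraction × carbon concentration), assumes depths are measured downward from the top of the mineral soil when the 100 cm cutoff is applied, and uses hypothetical function and parameter names (`horizon_carbon`, `fraction_above`, `top_cm`, `bottom_cm`).

```python
def horizon_carbon(top_cm, bottom_cm, bulk_density, cf_pct, cconc_pct):
    """Carbon content of one horizon in g C m-2 (assumed standard formula).

    bulk_density is in g cm-3, cf_pct is coarse fragments in % volume,
    cconc_pct is carbon concentration in % mass.
    """
    thickness_cm = bottom_cm - top_cm
    fine_fraction = 1.0 - cf_pct / 100.0
    # g cm-3 * cm gives g of soil per cm2; 1e4 cm2 per m2 converts to g m-2.
    return thickness_cm * bulk_density * fine_fraction * (cconc_pct / 100.0) * 1e4


def fraction_above(top_cm, bottom_cm, limit_cm=100.0):
    """Fraction of a horizon's thickness that lies above limit_cm."""
    thickness = bottom_cm - top_cm
    if thickness <= 0:
        return 0.0
    kept = min(bottom_cm, limit_cm) - min(top_cm, limit_cm)
    return max(0.0, kept) / thickness


def g_m2_to_mg_ha(value_g_m2):
    """Convert g C m-2 to Mg C ha-1 (1 g m-2 = 0.01 Mg ha-1)."""
    return value_g_m2 * 0.01


# A mineral horizon spanning 90-110 cm keeps half of its carbon in the 1 m total.
full = horizon_carbon(90, 110, bulk_density=1.2, cf_pct=10, cconc_pct=2)
to_1m = full * fraction_above(90, 110)
```

Under these assumptions, a horizon that straddles 100 cm is reduced in proportion to the part of its thickness below the cutoff, and a horizon entirely below 100 cm contributes zero.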
Soil Organic Carbon Stock Map
This raster [.tif] is the predicted soil organic carbon stock for the North Pacific coastal temperate rainforest. Content is displayed in megagrams of carbon per hectare (Mg C ha-1) to 1 m in mineral soil, plus overlying organic horizons. Map values are the output of a random forest machine learning algorithm trained on pedon data from within British Columbia and southeast Alaska only; confidence is therefore low for predictions south of the US-Canada border, and predictions in that region have not been validated. Lakes, glaciers, and ice fields have also not been masked from the map. More information on the map can be found in the associated manuscript.
FluxProject_SOCmap.7z
N Pacific coastal temperate rainforest pedon and soil carbon database
Version changes
10-oct-2024: The original database was updated and cleaned using Python (pandas) to create a standardized database that combines all data sources into one. Along with all of the original data characteristics, the database now denotes how missing data were gap-filled and includes additional columns to make it easier to use. The database includes four tables: a master table, a pedon-specific table, a horizon-specific table, and a summary table. References, acknowledgments, and field descriptors can be found within the McNicoletal-2024-NPCTR-Pedon-SOC-Database.xlsx and README.md files. The original data and the script used to clean the data can be found on GitHub (see below).
Sharing/Access information
Links to other publicly accessible locations of the data.
Raw and cleaned data and code can be found on GitHub:
Sources from which the data was derived can be found in McNicoletal-2024-NPCTR-Pedon-SOC-Database.xlsx and the primary article:
- Primary article: https://doi.org/10.1088/1748-9326/aaed52
Methods
Transboundary SOC Database
We compiled a transboundary database of > 1300 soil profile descriptions (pedons) across southeast Alaska (SEAK) and British Columbia (BC) from published and archive data sources. For each pedon, we calculated SOC stocks for the top 1 m of mineral soil plus surface organic horizons using data harmonization and gap-filling procedures that are detailed in the supplementary information (supplementary tables 1–5). In brief, US soil classification was converted to Canadian where necessary, and gaps were filled with published values or modeled estimates grouped by soil class, horizon, and lithology. In contrast to some other regional and global C assessments, this approach avoided the use of generalized empirical relationships between soil properties and missing variables, such as between soil C and soil bulk density, or soil C and depth.
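The grouped gap-filling described above can be illustrated with a small lookup keyed by soil class, horizon, and lithology. This is a hedged sketch only: the lookup values below are invented placeholders, not values from the supplementary tables, and the names `BD_LOOKUP` and `fill_bulk_density` and the record keys are hypothetical. The returned flag follows the 0 (measured) and 1 (lookup) convention used in the database's method columns.

```python
# Placeholder lookup keyed by (soil class, horizon, lithology).
# These numbers are illustrative only and are NOT the published values.
BD_LOOKUP = {
    ("PODZOL", "B", "granitic"): 1.0,
    ("PODZOL", "O", "granitic"): 0.1,
}


def fill_bulk_density(record, lookup=BD_LOOKUP):
    """Return (bulk_density, method_flag) for one horizon record.

    method_flag is 0 for a measured value and 1 for a lookup-table value.
    (None, None) is returned when no group value is available.
    """
    measured = record.get("bulk_density")
    if measured is not None:
        return measured, 0
    key = (record["soil_class"], record["horizon"], record["lithology"])
    if key in lookup:
        return lookup[key], 1
    return None, None


measured_row = {"soil_class": "PODZOL", "horizon": "B",
                "lithology": "granitic", "bulk_density": 0.85}
missing_row = {"soil_class": "PODZOL", "horizon": "B",
               "lithology": "granitic", "bulk_density": None}
```

Keeping the flag alongside the filled value makes it possible to rerun analyses on measured values only, which is the kind of check a gap-filling sensitivity analysis needs.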
Environmental covariates
Environmental covariates were selected (supplementary table 6) to predict SOC stock due to their relationship with soil-forming factors (climate, organisms, relief, parent material, and time; Jenny 1994). Covariate data were extracted from the rasters at the pedon coordinates and appended to the final SOC stocks (in supplementary material) to use in all further analyses. Further details of the 12 selected environmental covariates along with justification for inclusion and pre-processing steps are listed in supplementary table 6. Briefly, only high-quality and spatially continuous data products were used. Curating covariates based on knowledge of regional soil development facilitates clearer interpretation and reduces the risk of autocorrelation between variables.
Random forest model
A random forest model was trained to predict stocks of SOC across the NPCTR in R (v.3.4; R Core Team 2018 (www.R-project.org)) using the R package randomForest (4.6; Liaw and Wiener 2002). Random forests grow a large number of regression trees (Breiman et al 1984) from different random subsets of training data and predictor variables, thereby reducing variance relative to single trees and greatly reducing the risk of over-fitting model predictions and non-optimal solutions, though at the cost of interpretability (Breiman 2001). The transboundary database SOC stocks and associated covariates were first split into training (80%) and testing (20%) data and the model was parameterized to grow 5000 trees. For each tree, a subsample equivalent to ¼ of the total sample size was utilized (with replacement). Node size was set at 4 to minimize the out-of-bag error based on preliminary testing. Model performance was measured from goodness-of-fit, distributions of residuals, and predictions of test SOC stocks. Confidence intervals were computed using an infinitesimal jack-knife procedure (Wager et al 2013). Predictions were made across the NPCTR study extent using the R package raster (v2.6; Hijmans 2017), which produced a SOC map at 90.5 m resolution. All lakes >10 ha were clipped from the final map (HydroLakes, Messager et al 2016), and the glacier area was clipped using the Randolph Glacier Inventory 5.0 (GLIMS, Raup et al 2007) database. Final SOC stocks were adjusted for topography by scaling the SOC map with actual land surface area calculated from cell slope values. The random forest model was re-run for the three gap-filling sensitivity analyses. Soil organic carbon maps were exported as .tif files.
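The topographic adjustment and the conversion from a per-hectare map to a regional total can be sketched as below. The text states only that the map was scaled by land surface area derived from cell slope; this sketch assumes the common correction of dividing planimetric cell area by cos(slope), which may differ from the authors' calculation. The function names and the list-of-tuples input are hypothetical.

```python
import math


def cell_stock_mg(soc_mg_ha, slope_deg, cell_size_m=90.5):
    """SOC held in one raster cell (Mg C), scaled by sloped surface area.

    Assumes surface area = planimetric area / cos(slope).
    """
    planimetric_ha = cell_size_m ** 2 / 10_000.0  # m2 -> ha
    surface_ha = planimetric_ha / math.cos(math.radians(slope_deg))
    return soc_mg_ha * surface_ha


def regional_stock_pg(cells, cell_size_m=90.5):
    """Sum (soc_mg_ha, slope_deg) cells to a total in Pg C.

    Cells with soc_mg_ha of None (e.g. clipped lakes or glaciers) are skipped.
    """
    total_mg = sum(
        cell_stock_mg(soc, slope, cell_size_m)
        for soc, slope in cells
        if soc is not None
    )
    return total_mg / 1e9  # 1 Pg = 1e9 Mg


# Two land cells and one masked cell, using made-up example values.
example_cells = [(228.0, 0.0), (228.0, 30.0), (None, 0.0)]
total_pg = regional_stock_pg(example_cells)
```

With this correction, a steeper cell holds more carbon than a flat cell with the same per-hectare stock, because its true surface area is larger than its map footprint.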