Satellite-derived trait data slightly improves tropical forest biomass, NPP, and GPP predictions
Data files
Mar 06, 2024 version files 53.99 MB
- biomass_trait_GEDI.xlsx
- CombspecDBH.mat
- Finalcode_GEDIbiomass_Doughty2024.m
- processGEDIdata.R
- README.md
- soilclimdata.mat
- Tamtreeheight.mat
- traitcompare.mat
- traitgedidat.mat
Abstract
Improving tropical forest biomass predictions can help to accurately value tropical forests for their ecosystem services. Recently, the Global Ecosystem Dynamics Investigation (GEDI) lidar was activated on the International Space Station (ISS) to improve biomass predictions by providing detailed 3D forest structure and height data. However, there is still debate on how best to predict tropical forest biomass using GEDI data. Here we compare GEDI-predicted biomass to 2,102 tropical forest biomass plots and find that adding a remotely sensed (RS) trait map of LMA (leaf mass per area) significantly (P<0.001) improves field biomass predictions, but by only a small amount (r2=0.01). However, it may also help reduce the bias of the residuals because, for instance, there was a negative relationship between both LMA (r2 of 0.34) and %P (r2=0.31) and residuals. This improvement in predictability corresponds with measurements from 523 individual trees where LMA predicts diameter at breast height (DBH) (the critical measurement underlying plot biomass) with an r2=0.04, and spectroscopy (400-1075 nm) predicts DBH with an r2=0.01. Adding environmental datasets may offer further improvements: maximum temperature (Tmax) predicts Amazonian biomass residuals with an r2 of 0.76 (N=66). Finally, for a network of net primary production (NPP) and gross primary production (GPP) plots (N=21), RS traits are better at predicting fluxes than structure variables like tree height or Height Of Median Energy (HOME). Overall, trait maps, especially future improved ones produced by Surface Biology and Geology (SBG), may improve biomass and carbon flux predictions by a small but significant amount.
README: Satellite-derived trait data slightly improves tropical forest biomass, NPP, and GPP predictions
https://doi.org/10.5061/dryad.ttdz08m5n
The dataset contains leaf trait and spectral data to create Figures 1 and 2. It contains plot biomass data and satellite-derived leaf trait and structure data to create Figures 3-6. It contains plot NPP, GPP, and satellite-derived leaf trait and structure data to create Figures 7-8.
Description of the data and file structure, including the associated Code/Software
The Matlab code Finalcode_GEDIbiomass_Doughty2024.m contains all the code and data to create all the figures in the paper. The code has several sections that can be run independently.
The first section starting on line 1 uses the dataset traitcompare.mat to create Figure 1. This dataset contains two tables of leaf trait data called carnegiechem and merged. Units and column names are contained within the table.
The second section starting on line 62 uses the dataset traitgedidat.mat to create Figures 3-5 and Figures S1 and S2. This dataset contains a table called agball with plot biomass and coordinates. It also has the trait and GEDI data for these plots in nested structures. Units and descriptions are given in the code.
The third section starting on line 443 uses the datasets traitgedidat.mat and soilclimdata.mat to create Figure 6. The dataset traitgedidat.mat is the same as described above and soilclimdata.mat contains 0.1 by 0.1 degree gridded data for climate variables like Tmax (C) or VPD (Pa) or soil chemistry like CEC. Units and description are given in the code.
The fourth section starting on line 536 uses the dataset Tamtreeheight.mat to create Figures 7 and 8. The dataset gedivsplot.mat contains table data with plot data, and nearby trait and GEDI data for several GEM plots. Units and description are given in the code and the tables.
The fifth section starting on line 791 uses the dataset CombspecDBH.mat to create Figure 2. The dataset has the variables specallz, which is the leaf spectral data from 350-1075 nm for each leaf, and dbhz1, which is the corresponding tree DBH (cm). It also has LMAz, which is the LMA data, with datz as the corresponding spectral data.
To estimate spatial autocorrelation and the best model by AIC to create Table 1 and Figures S3 and S4, we used the R code processGEDIdata.R and the dataset biomass_trait_GEDI.xlsx. This dataset contains a table with latitude, longitude, field biomass and remotely sensed biomass (Mg ha-1), and the traits LMA (g m-2), phosphorus (%), tree height (m), HOME (m) and % one peak (unitless).
Sharing/Access information
Original GEDI data are available from the USGS.
Methods
Field leaf trait and spectroscopy data – We used leaf trait and spectral data from an extensive field campaign along an elevation gradient (from 3500 m to 220 m elevation) in the Peruvian Amazon, where leaf traits for 60-80% of the basal area of trees >10 cm DBH were measured within a well-studied 1 ha plot network from April to November 2013 (Enquist et al., 2017). In each 1 ha plot (N=10 plots), we sampled the most abundant species as determined through basal area weighting (generally enough species to cover ~80% of the plot’s basal area). For each species, we sampled the five (three in the lowlands) largest trees (based on diameter at breast height (DBH)) and sampled one sun and one shade branch. On each of these branches, leaf chemistry and leaf mass per area (LMA) were measured with the methodology detailed in Asner et al. (2014). On five randomly selected leaves from each branch, we measured hemispherical reflectance with an ASD Fieldspec Handheld 2 with fiber optic cable, a contact probe that has its own calibrated light source, and a leaf clip (Analytical Spectral Devices High-Intensity Contact Probe and Leaf Clip, Boulder, Colorado, USA) following Doughty et al. (2017). We measured leaf spectroscopy (400-1075 nm) on the same branches where the leaf traits were collected. Both LMA and chlorophyll a had previously been shown with this dataset to correlate with leaf spectroscopy (Doughty et al., 2017). However, we had not previously compared leaf spectral data with DBH directly.
Plot data –
Aboveground biomass - We used 2,102 of 19,160 total AGB field plots between +30° and -30° latitude that were classified as broadleaf evergreen trees by MODIS PFT, using public data from Duncanson et al. (2022) that were organized and made publicly available through ORNL DAAC as an RDS (R data serialization) file. Distribution plots are shown in Fig S1 (AGB) and S2 (residuals).
NPP and GPP - We also used 21 one-hectare plots where NPP and sometimes GPP were measured following the GEM protocol (Malhi et al., 2021). We focused on two regions: a Peruvian elevation transect with both NPP and GPP (n = 10; RAINFOR plot codes ALP11, ALP30, SPD02, SPD01, TRU03, TRU08, TRU07, ESP01, WAY01, ACJ01; Malhi et al., 2017) and a Bornean logging transect with only NPP (n = 11; RAINFOR plot codes DAN-04, DAN-05, LAM-01, LAM-02, MLA-01, MLA-02, SAF-01, SAF-02, SAF-03, SAF-04, SAF-05; Riutta et al., 2018). These plots were chosen because there are large changes in NPP/GPP across the elevation or logging gradient.
GEDI data – We used the vertical forest structure (L2A and L2B, Version 2) and biomass (L4A) products from the GEDI instrument (R. Dubayah et al., 2020) from April 2019 to December 2022 for tropical forest regions (R. O. Dubayah et al., 2023). We used a quality filtering recipe developed in collaboration with GEDI Science Team members from the University of Maryland and NASA Goddard to identify the highest quality GEDI vegetation shots (R. Dubayah et al., 2022). A data layer that this iterative local outlier detection algorithm uses to exclude data is publicly available at R. O. Dubayah et al. (2023). The key data filters we applied were: degrade flags of 0, 3, 8, 10, 13, 18, 20, 23, 28, 30, 33, 38, 40, 43, 48, 60, 63, or 68; L2A and L2B quality flags = 1 (only the highest quality data); and sensitivity >= 0.98. From the GEDI data, we used canopy height, the height of median energy (HOME), and the number of canopy layers following Doughty et al. (2023).
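The listed filters can be sketched as a simple per-shot predicate. This is a minimal illustration in Python, not the authors' code: the dictionary keys (`degrade_flag`, `l2a_quality_flag`, `l2b_quality_flag`, `sensitivity`) are hypothetical names chosen for readability, and the iterative local outlier detection step is not reproduced.

```python
# Hypothetical field names; the thresholds and flag values come from the text above.
ALLOWED_DEGRADE_FLAGS = {0, 3, 8, 10, 13, 18, 20, 23, 28, 30,
                         33, 38, 40, 43, 48, 60, 63, 68}
MIN_SENSITIVITY = 0.98


def passes_quality_filter(shot):
    """Return True if a GEDI shot (a dict) meets all of the listed criteria."""
    return (
        shot["degrade_flag"] in ALLOWED_DEGRADE_FLAGS
        and shot["l2a_quality_flag"] == 1
        and shot["l2b_quality_flag"] == 1
        and shot["sensitivity"] >= MIN_SENSITIVITY
    )


# Made-up example shots, one per failure mode plus one that passes.
shots = [
    {"degrade_flag": 0, "l2a_quality_flag": 1, "l2b_quality_flag": 1, "sensitivity": 0.99},
    {"degrade_flag": 5, "l2a_quality_flag": 1, "l2b_quality_flag": 1, "sensitivity": 0.99},   # degrade flag not allowed
    {"degrade_flag": 3, "l2a_quality_flag": 1, "l2b_quality_flag": 1, "sensitivity": 0.95},   # sensitivity too low
    {"degrade_flag": 68, "l2a_quality_flag": 1, "l2b_quality_flag": 0, "sensitivity": 0.985}, # L2B quality flag not 1
]

kept = [s for s in shots if passes_quality_filter(s)]
```

Only the first example shot survives, because each of the others fails exactly one criterion.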
Across all tropical forests, we created 300 by 300 m pixels containing the mean of all GEDI data between 2019 and 2022. Using the centroid coordinates of each of the 2,102 plots, we found the 300 by 300 m averaged GEDI pixel that encompassed the plot. If the plot was not encompassed by the GEDI data, we searched a wider area by averaging over a gradually increasing area of 1, 3, 5, and 10 pixels. In other words, if no 300 by 300 m pixel encompassed the plot, then we averaged all GEDI data in an area one pixel out (4 by 4 = 1200 by 1200 m, 6 by 6 = 1800 by 1800 m, 11 by 11 = 3300 by 3300 m), gradually increasing the square until it encompassed an area with GEDI data. For the NPP/GPP plots, we compared RS trait and GEDI data for individual footprints within a 0.03 km radius of the plot coordinates.
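The gridding and expanding-window search can be sketched as follows. This is an illustration under stated assumptions rather than the authors' Matlab code: coordinates are taken to be projected metres, the search window at step r is assumed to be a square of (2r + 1) pixels on a side centred on the plot's pixel (the window sizes actually used may differ), and the window value is the mean of the pixel means. All function names are hypothetical.

```python
import math
from collections import defaultdict

PIXEL_SIZE_M = 300.0
# Assumed search steps in pixels; 0 means "the enclosing pixel only".
SEARCH_RADII = (0, 1, 3, 5, 10)


def pixel_index(x_m, y_m):
    """Map projected coordinates (metres) to a (column, row) pixel index."""
    return (math.floor(x_m / PIXEL_SIZE_M), math.floor(y_m / PIXEL_SIZE_M))


def grid_mean(shots):
    """Average (x_m, y_m, value) shots into 300 m pixels: {pixel: mean value}."""
    sums = defaultdict(lambda: [0.0, 0])
    for x_m, y_m, value in shots:
        cell = sums[pixel_index(x_m, y_m)]
        cell[0] += value
        cell[1] += 1
    return {pixel: total / count for pixel, (total, count) in sums.items()}


def value_for_plot(grid, x_m, y_m):
    """Return (mean value, radius used), widening the window until data are found."""
    col, row = pixel_index(x_m, y_m)
    for radius in SEARCH_RADII:
        values = [
            grid[(c, r)]
            for c in range(col - radius, col + radius + 1)
            for r in range(row - radius, row + radius + 1)
            if (c, r) in grid
        ]
        if values:
            return sum(values) / len(values), radius
    return None, None  # no GEDI data within the largest window


# Made-up shots: two fall in pixel (0, 0), one in pixel (3, 0).
grid = grid_mean([(150.0, 150.0, 20.0), (160.0, 140.0, 30.0), (1000.0, 150.0, 40.0)])
inside = value_for_plot(grid, 100.0, 100.0)    # plot lies in a pixel with data
nearby = value_for_plot(grid, 700.0, 100.0)    # data found one pixel out
isolated = value_for_plot(grid, 5000.0, 5000.0)  # nothing within 10 pixels
```

Returning the radius alongside the value makes it easy to check how many plots needed a widened search.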
Remotely sensed leaf trait data – Based on a broader set of field campaigns, Aguirre-Gutiérrez et al. (2021) used Sentinel-2, climatic, topographic, and soil data to create remotely sensed canopy trait maps for leaf phosphorus concentration (P, %), wood density (WD, g cm-3), and leaf mass per area (LMA, g m-2).
Other data layers – We compared % one peak to several other climate, soil, leaf trait, and ecoregion maps, listed below, for the Amazon basin. Each dataset had its own resolution, which we standardized to 0.1 by 0.1 degrees. We used total cation exchange capacity (CEC) from soil grids (Batjes et al., 2020) from 0-5 cm in units of mmol(c)/kg. We averaged TerraClimate (Abatzoglou et al., 2018) data between 2000 and 2018 for vapor pressure deficit (VPD, in kPa), mean monthly precipitation (MMP, mm/month), potential evapotranspiration (PET), and maximum and minimum temperature (°C).
Statistical analysis – We used the Matlab (Matlab, MathWorks Inc., Natick, MA, USA) function “fitlm” to fit linear models comparing variables such as soil data, environmental data, leaf trait data (at 0.1° resolution) and GEDI structure data (300 m and coarser resolution) to field biomass and NPP/GPP estimates. The P values listed are for the t-statistic of the two-sided hypothesis test. In R, we fit linear models and selected the best model, ranked by AIC and parsimony, using the dredge function from the MuMIn library (Bartoń, 2009). We also used the vif function from the car package (Fox & Weisberg, 2019) to test for multicollinearity between variables. To account for spatial autocorrelation, we used Simultaneous Auto-Regressive (SARerr) models (Dormann et al., 2007) using the R library ‘spdep’ (Bivand, Hauke, & Kossowski, 2013). We tested different neighborhood distances from 10 km to 300 km and found that AIC was minimized at 80 km (Fig S3) and that the corresponding correlogram showed reduced spatial autocorrelation (Fig S4). To predict leaf traits from the spectral information, we used Partial Least Squares Regression (PLSR) (Geladi & Kowalski, 1986) with the plsregress function in Matlab (Matlab, MathWorks Inc., Natick, MA, USA). To avoid over-fitting, we chose the number of latent factors that minimized the mean square error under K-fold cross-validation. We used 70% of our data to calibrate the model and the remaining 30% to test its accuracy using r2. Throughout the manuscript we report adjusted r2, which penalizes for the number of predictors relative to the sample size.
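To make the adjusted r2 concrete, the sketch below fits a one-predictor least-squares line and applies the standard adjustment, 1 - (1 - r2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. It is a plain-Python illustration with made-up numbers, not the Matlab fitlm call used in the analysis, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Shrink r2 according to sample size and number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
adj_r2 = adjusted_r_squared(r2, n_obs=len(xs), n_predictors=1)
```

With few observations the adjustment matters: an r2 of 0.5 from 11 points and one predictor shrinks to about 0.44.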