Laboratory-based hyperspectral visible near-infrared reflectance spectral dataset of soil samples across a range of surface orientations
Data files
Mar 19, 2024 version, 2.16 GB
- HSICalLib_20230223_b1-b30_pga_dI_melt.RData
- HSICalLib_20230223_b1-b30_pga_melt.RData
- HSICalLib_20230223_b1-b30_pga_refI_melt.RData
- HSICalLib_20230223_b1-b30_pga.RData
- HSICalLib_20230223_s0-s60_cosIL_all.RData
- HSICalLib_20230310_rutgerssamples_rand1.RData
- HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData
- HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData
- HSICalLib_20230613_globaldI_OCpredict_refI_p.RData
- HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData
- HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData
- HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData
- HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
- HSICalLib_20230613_globaldI_PLSR_log10volC_summarystats_OC.RData
- HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData
- HSICalLib_20230613_globaldI_spectralstats_dIc_1.RData
- HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
- HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData
- HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData
- HSICalLib_20230613_pga_slm_cosIL.RData
- HSICalLib_20230613_spectralstats_ccor.RData
- HSICalLib_20230613_spectralstats_coscor.RData
- HSICalLib_b1_b30_p.RData
- HSICalLib_b1-b30_20230223_p_gold.RData
- HSICalLib_wavevec.RData
- README.md
Abstract
A custom-designed, 3-D printed sample array was used to present 681 homogenized soil samples packed into sample wells to a laboratory-based visible near-infrared (VNIR) hyperspectral imaging (HSI) reflectance spectrometer in a total of 91 different configurations of slope and aspect. Hyperspectral imaging was performed with a high-sensitivity sCMOS VNIR hyperspectral camera (MSV 500, Middleton Spectral Vision, Middleton, WI). Raw reflectance data were collected with FastFrame data acquisition software (Middleton Spectral Vision, Middleton, WI). After data were collected using FastFrame, data processing and analysis were performed in R using the R Scripts found on GitHub at github.com/aduro005/HSITopographicCorrectionRScripts. The design for this sample array can be found on GitHub at github.com/aduro005/HSITopographicCorrectionSampleWellArray.
README: Laboratory-based hyperspectral visible near-infrared reflectance spectral dataset of soil samples across a range of surface orientations
This README file was generated on 03/02/2024 by Alyssa Duro
This README file describes the R scripts available on GitHub at https://github.com/aduro005/HSITopographicCorrectionRscripts and the .RData files available on the Dryad Data Repository (UCR) at https://doi.org/10.6086/D15091. These R scripts and .RData files are associated with the HSI topographic correction method described in the article titled “Topographic correction of visible near-infrared reflectance spectra for horizon-scale soil organic carbon mapping”, available at https://doi.org/10.1002/saj2.20612.
# ----------
# Some notes on this dataset:
This project began with the intention of calibrating several empirical regression equations to predict soil chemical properties from visible near-infrared (VNIR) reflectance spectra of soil surfaces positioned at different angles relative to a VNIR hyperspectral imaging (HSI) reflectance spectrometer. This is why the prefix “HSICalLib” is used in file names and why this project is referred to in some places as the “HSI Calibration Library”. Later, the focus of the project narrowed, and the goal became developing a method for removing the influence of surface orientation from VNIR reflectance spectra. As a result, this project is also sometimes referred to as the “HSI Topographic Correction”.
The FastFrame software used to operate the HSI camera and scan stage outputs 2 data files after each scan. Together, these two files are sometimes referred to as the “raw data” for each scan. These two files have the same name, but one has the extension .hdr and the other is .raw. The file name and output location are input into the FastFrame software before a scan is performed. The .hdr file can be opened using Notepad. The .raw file is a 3-dimensional matrix with dimensions 471 (number of wavebands) x number of pixels in the lateral spatial dimension (also called “columns” and “samples”) x number of pixels in the direction of the scan stage movement (also called “rows” and “lines”).
The raw (unprocessed) data for this project is not included in this dataset, but the authors are happy to share it upon request. This raw dataset consists of 3,125 hyperspectral images (2 files per image) and is about 2 TB in size.
The least processed version of the soil VNIR reflectance spectra (observed, uncorrected) obtained at each slope and aspect configuration, along with selected soil properties, is HSICalLib_b1_b30_p.RData, which can be found on the UCR Dryad Data Repository. The soil reflectance spectra included here are averaged across all the pixels within each sample well. This file is output by Step 7 of HSI Data Processing, the R script HSICalLib_4_intmean_to_masterintmean_to_p.R (also available on GitHub). Starting with this file, you can follow Steps 8-14 of HSI Data Processing (also on GitHub) to obtain every input file needed for the entire HSI Data Analysis workflow (Steps 1-4 and Final Plots), which was used for the HSI Topographic Correction. In addition, every file used during the HSI Topographic Correction that is output after HSICalLib_b1_b30_p.RData in the HSI Data Processing and HSI Data Analysis workflows is included in the UCR Dryad Data Repository dataset. You can therefore open any R script after Step 7 of HSI Data Processing and find its necessary input files in the UCR Dryad Data Repository dataset.
# ----------
# File folder structure:
Data Analysis
> Data
> Output Files
> Plots
> R scripts
# ---------------------------------------------------------------------------
# Description of the data
# ---------------------------------------------------------------------------
Hyperspectral imaging of 1178 homogenized soil samples was performed with a high-sensitivity sCMOS VNIR hyperspectral camera (MSV 500, Middleton Spectral Vision, Middleton, WI) in the Department of Environmental Sciences at University of California, Riverside.
Each soil sample was packed into sample wells, positioned under the hyperspectral camera, and imaged at 98 orientations (i.e., 7 slope x 14 aspect angles) using a custom designed and 3-D printed sample array. The sample array was designed such that sample wells could be presented to the spectrometer at 7 slope angles (0°, 10°, 20°, 30°, 40°, 50°, 60°). The design for this sample array is available on GitHub at https://github.com/aduro005/HSITopographicCorrectionSampleWellArray.
The sample array was aligned to an arbitrarily defined 0° N aspect and rotated at 15° intervals from 0° N to 90° E and from 180° S to 270° W using a protractor affixed to the scan stage under the spectrometer. The 195° to 270° W aspect values were converted to 165° to 90° E aspects during data processing (see HSICalLib_6a_aspect_correction.R), so that aspect angles only took on values from 0° to 180° in the development of the topographic correction method.
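The aspect conversion described above amounts to mirroring west-facing aspects across the N-S axis. The following is a minimal sketch of that arithmetic in Python, not the code in HSICalLib_6a_aspect_correction.R (the project's own scripts are in R). The function name `fold_aspect` is hypothetical, and the list of imaged aspects assumes 15° steps from 0° to 90° and from 180° to 270° (7 + 7 = 14 aspect angles).

```python
def fold_aspect(aspect_deg):
    """Mirror an aspect greater than 180 degrees onto the 0-180 degree range."""
    if not 0 <= aspect_deg <= 360:
        raise ValueError("aspect must be between 0 and 360 degrees")
    # e.g. 195 -> 165, 270 -> 90; aspects of 180 degrees or less are unchanged
    return 360 - aspect_deg if aspect_deg > 180 else aspect_deg


# Assumed imaging positions: 0-90 and 180-270 degrees in 15 degree steps
imaged_aspects = list(range(0, 91, 15)) + list(range(180, 271, 15))
folded_aspects = sorted({fold_aspect(a) for a in imaged_aspects})

print(len(imaged_aspects), len(folded_aspects))
```

Under these assumptions, the 14 imaged aspects collapse to 13 distinct values (270° W folds onto 90° E), which would be consistent with the 91 orientations (7 slopes * 13 aspects) reported for HSICalLib_20230223_s0-s60_cosIL_all.RData.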
574 soil samples were provided by the NEON Initial Characterization Soils Archive at the University of Michigan Biological Station Sample Archive Facility in Ehlers (UMBS-SAFE) (https://mfield.umich.edu/soil_archive_request) and accompanying soil properties data were obtained from the NEON Data Portal (https://data.neonscience.org/).
450 soil samples were collected from Duke Farms in Hillsborough Township, New Jersey (https://www.dukefarms.org/) and accompanying soil properties data were provided by the Department of Environmental Sciences at Rutgers University.
57 soil samples were collected from locations in the Santa Ana Mountains, California that were affected by wildfire, and accompanying soil properties data were provided by the Department of Environmental Sciences at University of California, Riverside (http://www.thegraylab.org/).
97 soil samples were a laboratory standard soil from the Pedology Laboratory in the Department of Environmental Sciences at University of California, Riverside.
# ----------
# List of files in the Output Data folder:
HSICalLib_wavevec.RData
HSICalLib_b1_b30_p.RData
HSICalLib_b1-b30_20230223_p_gold.RData
HSICalLib_20230223_s0-s60_cosIL_all.RData
HSICalLib_20230223_b1-b30_pga.RData
HSICalLib_20230223_b1-b30_pga_melt.RData
HSICalLib_20230223_b1-b30_pga_refI_melt.RData
HSICalLib_20230223_b1-b30_pga_dI_melt.RData
HSICalLib_20230310_rutgerssamples_rand1.RData
HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData
HSICalLib_20230613_globaldI_spectralstats_dIc_1.RData
HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData
HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData
HSICalLib_20230613_pga_slm_cosIL.RData
HSICalLib_20230613_spectralstats_coscor.RData
HSICalLib_20230613_spectralstats_ccor.RData
HSICalLib_20230613_globaldI_OCpredict_refI_p.RData
HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData
HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_summarystats_OC.RData
# ----------
# List of abbreviations:
obsI = observed, uncorrected reflectance intensity
refI = reference reflectance intensity, mean of all reflectance spectra observed for each soil sample across all aspects at zero slope, represents a soil sample’s expected reflectance spectrum when the effect of surface orientation on reflectance is absent
dI = delta intensity, obsI - refI
dIp = predicted dI, this value is predicted by a multiple linear regression model trained to predict dI from slope, aspect, wavelength, and their interaction terms
dIc = dI-corrected VNIR reflectance intensities, obsI - dIp
coscorI = cosine corrected reflectance intensities
ccorI = C corrected reflectance intensities
# ----------
# Specific information for data file:
HSICalLib_wavevec.RData
# Name and type of R object: wavevec (vector, numeric)
# Number of observations (rows): 471
# Description of observations:
Each observation is a wavelength (λ) in nm. The wavelengths correspond to the reflectance intensity observations (measurements) made at each waveband for reflectance spectra collected using the high-sensitivity sCMOS VNIR hyperspectral camera (MSV 500, Middleton Spectral Vision, Middleton, WI) in the Department of Environmental Sciences at University of California, Riverside. This spectrometer measures reflectance intensities at 471 wavebands between 400 and 1000 nm. The 471 observations in this vector (“wavevec”) are the wavelengths corresponding to each waveband.
# Missing data values (NA): None
# Number of variables (columns): 1
# Description of variables:
wavelength (nm)
# Related data files:
Any .hdr file resulting from a scan with this spectrometer contains the information contained in HSICalLib_wavevec.RData. However, the .hdr files also contain other information, so this vector was made in R by Alyssa Duro so the wavelength values corresponding to each waveband were accessible on their own in a .RData file.
# R script that outputs this file:
HSICalLib_3_intensities_to_intmean_intsd.R
# ----------
# Specific information for data file:
HSICalLib_b1_b30_p.RData
# Name and type of R object: p (data frame)
# Number of observations (rows): 115,444 reflectance spectra
# Description of observations:
Mean reflectance spectra for each soil sample at each orientation along with selected soil properties data and sample identifiers. This is the least processed form of the data included in this dataset. 1178 soil samples * 98 configurations = 115,444
# Missing data values (NA): Some soil properties data are not available for all soil samples resulting in NA’s.
# Number of variables (columns): 486
# Description of variables:
obsI*[wavelength]* (numeric): observed (uncorrected) reflectance intensities measured at 471 wavebands, together, these 471 values represent the average reflectance spectrum for a single soil sample at a single orientation
slope (integer, degrees): angle between the scan stage and the soil surface
aspect (integer, degrees): angle clockwise from N
batch (integer, 1-30): soil samples were imaged in groups of 40 at a time
well (integer, 1-40): indexed location of the soil sample in the sample well array
HSInumber (integer, 1-1141): unique soil sample identifier
HSIPackedDensity (numeric, g/cm3): mass soil sample per volume sample well
sandTotal (numeric, %): sand (only available for samples from the NEON archive)
siltTotal (numeric, %): silt (only available for samples from the NEON archive)
clayTotal (numeric, %): clay (only available for samples from the NEON archive)
OC (numeric, %): soil organic carbon (by weight)
archive (character): source of the soil sample and soil properties data
adod (numeric, unitless): air dried soil mass / oven dried soil mass
volC (numeric, %): soil organic carbon (by volume)
log10volC (numeric): log10(volC)
batchwellID (character): unique reflectance spectra identifier
# R script that outputs this file:
HSICalLib_4_intmean_to_masterintmean_to_p.R
# ----------
# Specific information for data file:
HSICalLib_b1-b30_20230223_p_gold.RData
# Name and type of R object: p_gold (data frame)
# Number of observations (rows): 107,486 reflectance spectra
# Description of observations:
Same as HSICalLib_b1_b30_p.RData (output from R script 4) except the reflectance spectra (rows) with unusually large or small obsI OR dI values have been identified and removed as imaging errors.
# Missing data values (NA): Missing values occur when soil properties data are not available for some soil samples. There is spectral data for all soil samples, but some soil properties were not measured for all soil samples.
# Number of variables (columns): 486
# Description of variables:
Same as HSICalLib_b1_b30_p.RData (output from R script 4)
# Related data files:
HSICalLib_b1_b30_p.RData
# R script that outputs this file:
HSICalLib_5b_p_dI_cleaning.R
# ----------
# Specific information for data file:
HSICalLib_20230223_s0-s60_cosIL_all.RData
# Name and type of R object: cosIL (numeric, data frame)
# Number of observations (rows): 91 orientations (7 slopes * 13 aspects)
# Description of observations:
Each row contains the constants needed to perform the cosine correction and C correction for 1 of 91 possible combinations of slope and aspect.
# Missing data values (NA): None
# Number of variables (columns): 16
# Description of variables:
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
z1: zenith angle (degrees) between light bank 1 and the HSI camera, varies with slope, light bank 1 = N = 0 azimuth
z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera, varies with slope, light bank 2 = S = 180 azimuth
meanz: average of z1 and z2 (this is the one used for the paper)
cosz1: cosine of z1
cosz2: cosine of z2
cosmeanz: cosine of (meanz)
meancosz: average of cos(z1) and cos(z2)
cosIL1: cos( illumination angle light bank 1 (IL1) ) = cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)
cosIL2: cos( illumination angle light bank 2 (IL2) ) = cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)
meancosIL: average of cosIL1 and cosIL2
r1: cos(z1) / cosIL1
r2: cos(z2) / cosIL2
rmeans: ( cos(meanz) ) / (meancosIL)
rcosmeans: (meancosz) / (meancosIL) (this is the one used for the paper)
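The formulas listed above can be sketched as follows. This is an illustrative Python version of the arithmetic only, not the code in HSICalLib_6b_cosIL_calculation.R. The function names and the zenith angles used in the example (30° for both light banks) are hypothetical; the actual z1 and z2 values are stored in the data file. Light bank azimuths of 0° (N) and 180° (S) follow the variable descriptions.

```python
import math


def cos_illumination(zenith_deg, azimuth_deg, slope_deg, aspect_deg):
    """cos(IL) = cos(z)*cos(slope) + sin(z)*sin(slope)*cos(azimuth - aspect)."""
    z, az, s, a = (math.radians(v) for v in
                   (zenith_deg, azimuth_deg, slope_deg, aspect_deg))
    return math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(az - a)


def correction_constants(z1_deg, z2_deg, slope_deg, aspect_deg):
    """Return the ratio columns (r1, r2, rmeans, rcosmeans) for one orientation."""
    cosz1 = math.cos(math.radians(z1_deg))
    cosz2 = math.cos(math.radians(z2_deg))
    cosIL1 = cos_illumination(z1_deg, 0.0, slope_deg, aspect_deg)    # light bank 1 (N)
    cosIL2 = cos_illumination(z2_deg, 180.0, slope_deg, aspect_deg)  # light bank 2 (S)
    meancosz = (cosz1 + cosz2) / 2
    meancosIL = (cosIL1 + cosIL2) / 2
    cosmeanz = math.cos(math.radians((z1_deg + z2_deg) / 2))
    # Note: a ratio is undefined if the corresponding cos(IL) is exactly zero
    return {
        "r1": cosz1 / cosIL1,
        "r2": cosz2 / cosIL2,
        "rmeans": cosmeanz / meancosIL,
        "rcosmeans": meancosz / meancosIL,
    }


# Hypothetical zenith angles of 30 degrees for both light banks
flat = correction_constants(30.0, 30.0, slope_deg=0.0, aspect_deg=45.0)
tilted = correction_constants(30.0, 30.0, slope_deg=20.0, aspect_deg=45.0)
print(flat["rcosmeans"], tilted["rcosmeans"])
```

At zero slope every ratio equals 1, so no correction is applied to a level surface. With two light banks at equal zenith angles and opposite azimuths, the aspect terms cancel in meancosIL and rcosmeans reduces to 1 / cos(slope); this holds only under the symmetric example values assumed here.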
# Related data files:
HSICalLib_20230613_pga_slm_cosIL.RData
# R script that outputs this file:
HSICalLib_6b_cosIL_calculation.R
# ----------
# Specific information for data file:
HSICalLib_20230223_b1-b30_pga.RData
# Name and type of R object: pga (data frame)
# Number of observations (rows): 99,537 reflectance spectra
# Description of observations:
Same as HSICalLib_b1_b30_p.RData (output from R script 4) except the number of spectra was reduced during the “aspect correction” (HSICalLib_6a_aspect_correction.R)
# Missing data values (NA): None
# Number of variables (columns): 486
# Description of variables:
Same as HSICalLib_b1_b30_p.RData
# Related data files:
HSICalLib_b1_b30_p.RData
# R script that outputs this file:
HSICalLib_7a_observedI
# ----------
# Specific information for data file:
HSICalLib_20230223_b1-b30_pga_melt.RData
# Name and type of R object: pga_melt (data frame)
# Number of observations (rows): 46,881,927 reflectance intensities
# Description of observations:
“Long” version of HSICalLib_20230223_b1-b30_pga.RData where wavelength is a variable. 99,537 reflectance spectra (from HSICalLib_20230223_b1-b30_pga.RData) * 471 wavebands = 46,881,927 reflectance intensities
# Missing data values (NA): None
# Number of variables (columns): 5
# Description of variables:
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
wavelength: see HSICalLib_wavevec.RData
obsI: Same values reported in HSICalLib_b1_b30_p.RData except now wavelength is a variable. These values have not been “corrected”.
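The relationship between the “wide” and “long” layouts can be sketched as follows. This is a minimal Python illustration of the reshaping step, not the code used in the R scripts. The column names `obsI_400` and `obsI_401`, the intensity values, and the function name `melt_spectra` are hypothetical; the real wide data frame has 471 obsI columns per spectrum.

```python
ID_COLS = ("slope", "aspect", "HSInumber")

# Hypothetical wide rows: one row per spectrum, one column per waveband
wide_rows = [
    {"slope": 10, "aspect": 0, "HSInumber": 1, "obsI_400": 0.12, "obsI_401": 0.13},
    {"slope": 10, "aspect": 15, "HSInumber": 1, "obsI_400": 0.11, "obsI_401": 0.12},
]


def melt_spectra(rows, id_cols=ID_COLS, prefix="obsI_"):
    """Turn one row per spectrum into one row per (spectrum, wavelength) pair."""
    long_rows = []
    for row in rows:
        ids = {name: row[name] for name in id_cols}
        for col, value in row.items():
            if col.startswith(prefix):
                wavelength = float(col[len(prefix):])
                long_rows.append({**ids, "wavelength": wavelength, "obsI": value})
    return long_rows


long_rows = melt_spectra(wide_rows)
print(len(long_rows), sorted(long_rows[0]))
```

Each long row carries the five variables listed above, and the row count is the number of spectra multiplied by the number of wavebands (2 * 2 = 4 in this toy example, 99,537 * 471 = 46,881,927 in the data file).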
# Related data files:
HSICalLib_20230223_b1-b30_pga.RData
# R script that outputs this file:
HSICalLib_7a_observedI
# ----------
# Specific information for data file:
HSICalLib_20230223_b1-b30_pga_refI_melt.RData
# Name and type of R object: pga_refI_melt (data frame)
# Number of observations (rows): 46,881,927 reflectance intensities
# Description of observations:
Same as HSICalLib_20230223_b1-b30_pga_melt.RData except reference reflectance intensities (refI) are reported instead of obsI.
# Missing data values (NA): None
# Number of variables (columns): 5
# Description of variables:
Same as HSICalLib_20230223_b1-b30_pga_melt.RData except refI replaces obsI.
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
wavelength: see HSICalLib_wavevec.RData
refI: These values represent the average reflectance spectrum measured for each soil sample across all aspect positions at zero slope. There is only 1 reference spectrum per soil sample.
# Related data files:
HSICalLib_20230223_b1-b30_pga_melt.RData
# R script that outputs this file:
HSICalLib_7b_referenceI
# ----------
# Specific information for data file:
HSICalLib_20230223_b1-b30_pga_dI_melt.RData
# Name and type of R object: pga_dI_melt (data frame)
# Number of observations (rows): 46,881,927 reflectance intensities
# Description of observations:
Same as HSICalLib_20230223_b1-b30_pga_melt.RData except dI values are reported instead of obsI
# Missing data values (NA): None
# Number of variables (columns): 5
# Description of variables:
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
wavelength: see HSICalLib_wavevec.RData
dI: actual (measured) delta (“change in”) reflectance intensity = obsI - refI
# Related data files:
HSICalLib_20230223_b1-b30_pga_melt.RData
HSICalLib_20230223_b1-b30_pga_refI_melt.RData
# R script that outputs this file:
HSICalLib_7c_dI_calculation
# ----------
# Specific information for data file:
HSICalLib_20230310_rutgerssamples_rand1.RData
# Name and type of R object: rand1 (numeric, vector)
# Number of observations (rows): 50
# Description of observations:
The soil samples collected from Duke Farms all have very similar properties and would otherwise make up a large portion of the training data, so only a randomly chosen subset of 50 (out of the 450 Duke Farms samples imaged using HSI) was included in the topographic correction study. This vector contains the HSI numbers for these 50 randomly chosen soil samples (all from the Rutgers archive).
# Missing data values (NA): None
# Number of variables (columns): 1
# Description of variables:
HSInumber: see HSICalLib_b1_b30_p.RData
# Related data files:
HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData
# R script that outputs this file:
HSICalLib_8b_dI_predict_global_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData
# Name and type of R object: pga_dI_refI_melt_slm_dIp_dIc_2 (data frame)
# Number of observations (rows): 22,678,179
# Description of observations:
Each row is a reflectance intensity for a single soil sample at a single orientation at a single wavelength. Same as HSICalLib_20230223_b1-b30_pga_melt.RData except the number of observations was reduced by selecting ONLY the HSInumbers (spectra) for the 681 soil samples included in this study. This data frame is output AFTER training and evaluating the dI+ correction wherein a multiple linear regression model was trained to predict dI using slope, aspect, wavelength, and their interactions as predictor variables. This model was evaluated to get predicted dI (dIp), then dIp was used to adjust (aka “correct”) obsI resulting in dI-corrected intensities (dIc). If the model was a perfect predictor, then dIp would equal dI AND dIc would equal refI.
# Missing data values (NA): None
# Number of variables (columns): 9
# Description of variables:
Same as HSICalLib_20230223_b1-b30_pga_melt.RData except dI, refI, dIp, and dIc columns have been added.
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
wavelength: see HSICalLib_wavevec.RData
obsI: see HSICalLib_20230223_b1-b30_pga_melt.RData
dI: see HSICalLib_20230223_b1-b30_pga_dI_melt.RData
refI: see HSICalLib_20230223_b1-b30_pga_refI_melt.RData
dIp: predicted dI resulting from evaluation of the dI+ multiple linear regression model, if the model was a perfect predictor, dIp would equal dI
dIc: dI-corrected reflectance intensities = obsI - dIp, if the dI correction was successful, dIc should equal refI
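The arithmetic linking obsI, refI, dI, dIp, and dIc can be sketched as follows. This is a minimal Python illustration with made-up intensity values for three wavebands. The dIp values are hypothetical stand-ins for model predictions; the multiple linear regression model itself is not reproduced here.

```python
def delta_intensity(obsI, refI):
    """dI = obsI - refI, waveband by waveband."""
    return [o - r for o, r in zip(obsI, refI)]


def dI_correct(obsI, dIp):
    """dIc = obsI - dIp, waveband by waveband."""
    return [o - p for o, p in zip(obsI, dIp)]


# Made-up values for one spectrum at three wavebands
obsI = [0.20, 0.25, 0.31]
refI = [0.22, 0.28, 0.35]
dI = delta_intensity(obsI, refI)

dIp = [-0.018, -0.031, -0.037]  # hypothetical model predictions of dI
dIc = dI_correct(obsI, dIp)

# What remains after correction is the prediction error of the model
residual = [c - r for c, r in zip(dIc, refI)]
print(dI, dIc, residual)
```

Because dIc - refI = dI - dIp, the corrected spectrum matches the reference spectrum exactly only where the model predicts dI exactly.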
# Related data files:
HSICalLib_20230223_b1-b30_pga_melt.RData
# R script that outputs this file:
HSICalLib_8b_dI_predict_global_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
# Name and type of R object: spectralstats_dIc_2 (data frame)
# Number of observations (rows): 48,149
# Description of observations:
Each row corresponds to one spectrum (one soil sample at one orientation), so the number of rows is the number of spectra compared. RMSE, NSE, and KGE are calculated across all wavelengths for every soil sample at every configuration. These dIc values were obtained by evaluating the dI+ multiple linear regression model (i.e., the one that includes slope, aspect, wavelength, and all their interactions as predictor variables).
# Missing data values (NA): None
# Number of variables (columns): 15
# Description of variables:
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
RMSE_obs: Root mean squared error (obsI vs refI) tells how far obsI is from refI, RMSE=0 suggests surface orientation has no effect
RMSE_dIc: Root mean squared error (dIc vs refI) tells how far dIc is from refI, RMSE=0 suggests dI correction was successful
NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI) tells if obsI is closer to mean refI or refI, NSE=1 suggests surface orientation has no effect
NSE_dIc: Nash-Sutcliffe efficiency (dIc vs refI) tells if dIc is closer to mean refI or refI, NSE=1 suggests dI correction was successful
KGE_obs: Kling-Gupta efficiency (obsI vs refI), interpreted like NSE, KGE=1 suggests surface orientation has no effect
KGE_obs_r: Pearson correlation coefficient (obsI vs refI), component of KGE
KGE_obs_beta: mean obsI / mean refI (obsI vs refI), component of KGE, ratio of the means of obsI and refI, beta=1 suggests surface orientation has no effect
KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI), component of KGE, ratio of the standard deviations of obsI and refI, alpha=1 suggests surface orientation has no effect
KGE_dIc: Kling-Gupta efficiency (dIc vs refI), interpreted like NSE, KGE=1 suggests dI correction was successful
KGE_dIc_r: Pearson correlation coefficient (dIc vs refI), component of KGE
KGE_dIc_beta: mean dIc / mean refI (dIc vs refI), component of KGE, ratio of the means of dIc and refI, beta=1 suggests dI correction was successful
KGE_dIc_alpha: standard deviation(dIc) / standard deviation(refI), component of KGE, ratio of the standard deviations of dIc and refI, alpha=1 suggests dI correction was successful
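The objective functions described above can be sketched for a single pair of spectra as follows. This is an illustrative Python version, not the code in HSICalLib_8b_dI_predict_global_final.R. It assumes the 2009 formulation of the Kling-Gupta efficiency, KGE = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (alpha - 1)^2); whether the R scripts use this exact variant is an assumption. The function name `spectral_stats` and the intensity values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _sd(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def spectral_stats(candidate, reference):
    """Compare a candidate spectrum (obsI or dIc) against refI across wavebands."""
    n = len(reference)
    sq_err = sum((c - r) ** 2 for c, r in zip(candidate, reference))
    mean_c, mean_r = _mean(candidate), _mean(reference)
    sd_c, sd_r = _sd(candidate), _sd(reference)
    cov = sum((c - mean_c) * (r - mean_r) for c, r in zip(candidate, reference)) / n
    r = cov / (sd_c * sd_r)   # Pearson correlation; undefined for a flat spectrum
    beta = mean_c / mean_r    # ratio of the means
    alpha = sd_c / sd_r       # ratio of the standard deviations
    return {
        "RMSE": math.sqrt(sq_err / n),
        "NSE": 1.0 - sq_err / sum((v - mean_r) ** 2 for v in reference),
        "KGE": 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (alpha - 1) ** 2),
        "r": r,
        "beta": beta,
        "alpha": alpha,
    }


# Made-up example: the observed spectrum is the reference shifted up by 0.05
refI = [0.10, 0.20, 0.30, 0.40]
obsI = [v + 0.05 for v in refI]
stats_obs = spectral_stats(obsI, refI)
stats_perfect = spectral_stats(refI, refI)
print(stats_obs)
```

In this example the uniform offset leaves r and alpha at 1 but raises beta to 1.2, so the KGE components show that the disagreement is a bias in mean brightness rather than a change in spectral shape.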
# Related data files:
HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData
HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData
HSICalLib_20230613_spectralstats_coscor.RData
HSICalLib_20230613_spectralstats_ccor.RData
# R script that outputs this file:
HSICalLib_8b_dI_predict_global_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_spectralstats_dIc_1.RData
# Name and type of R object: spectralstats_dIc_1 (numeric, data frame)
# Number of observations (rows): 48,149
# Description of observations:
Each row corresponds to one spectrum (one soil sample at one orientation), so the number of rows is the number of spectra compared. RMSE, NSE, and KGE are calculated across all wavelengths for every soil sample at every configuration. These dIc values were obtained by evaluating the dI multiple linear regression model (i.e., the one that includes ONLY slope, aspect, and wavelength as predictor variables).
# Missing data values (NA): None
# Number of variables (columns): 15
# Description of variables:
Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
# Related data files:
HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
# R script that outputs this file:
HSICalLib_8b_dI_predict_global_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData
# Name and type of R object: spectralstats_dIc_w_s_2 (data frame)
# Number of observations (rows): 2,826
# Description of observations:
Each row corresponds to one group of reflectance intensities (one wavelength at one slope), so the number of rows is the number of groups compared. RMSE, NSE, and KGE are calculated across all aspects and soil samples at each wavelength and slope combination (471 wavelengths * 6 slopes = 2,826 results for each objective function). Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData except reflectance intensities were grouped differently before calculating objective functions.
# Missing data values (NA): None
# Number of variables (columns): 14
# Description of variables:
slope: see HSICalLib_b1_b30_p.RData
wavelength: see HSICalLib_wavevec.RData
RMSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
RMSE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
NSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
NSE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_dIc_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_dIc_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_dIc_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
# Related data files:
HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData
# R script that outputs this file:
HSICalLib_8b_dI_predict_global_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData
# Name and type of R object: spectralstats_dIc_w_a_2 (data frame)
# Number of observations (rows): 6,123
# Description of observations:
Each row corresponds to one group of reflectance intensities (one wavelength at one aspect), so the number of rows is the number of groups compared. RMSE, NSE, and KGE are calculated across all slopes and soil samples at each wavelength and aspect combination (471 wavelengths * 13 aspects = 6,123 results for each objective function). Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData except reflectance intensities were grouped differently before calculating objective functions.
# Missing data values (NA): None
# Number of variables (columns): 14
# Description of variables:
aspect: see HSICalLib_b1_b30_p.RData
wavelength: see HSICalLib_wavevec.RData
RMSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
RMSE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
NSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
NSE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_dIc_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_dIc_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_dIc_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
# Related data files:
HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData
# R script that outputs this file:
HSICalLib_8b_dI_predict_global_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_pga_slm_cosIL.RData
# Name and type of R object: pga_slm_cosIL (data frame)
# Number of observations (rows): obsI spectra = 48,149
# Description of observations:
Same as HSICalLib_20230223_b1-b30_pga.RData except the number of spectra was reduced by selecting spectra (using HSInumbers) from ONLY the 681 soil samples used in this study AND spectra collected at non-zero slopes.
# Missing data values (NA): None
# Number of variables (columns): 500
# Description of variables:
See HSICalLib_b1_b30_p.RData (486 columns) and HSICalLib_20230223_s0-s60_cosIL_all.RData (16 columns, slope and aspect are redundant). HSICalLib_20230223_b1-b30_pga.RData was subset by soil sample and slope, then merged with HSICalLib_20230223_s0-s60_cosIL_all.RData resulting in a “wide” data frame with all the same variables as HSICalLib_b1_b30_p.RData AND the constants needed for the cosine correction (from HSICalLib_20230223_s0-s60_cosIL_all.RData).
# Related data files:
HSICalLib_20230223_b1-b30_pga.RData
HSICalLib_20230223_s0-s60_cosIL_all.RData
# R script that outputs this file:
HSICalLib_9a_cosine_correction_final
# ----------
# Specific information for data file:
HSICalLib_20230613_spectralstats_coscor.RData
# Name and type of R object: spectralstats_coscor (numeric, data frame)
# Number of observations (rows): 48,149
# Description of observations:
Each row corresponds to one spectrum (one soil sample at one orientation), so the number of rows is the number of spectra compared. RMSE, NSE, and KGE are calculated across all wavelengths for every soil sample at every configuration. Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData except corrected spectra result from cosine correction instead of dI correction.
# Missing data values (NA): None
# Number of variables (columns): 15
# Description of variables:
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
RMSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
RMSE_coscor: Root mean squared error (coscorI vs refI) tells how far coscorI is from refI, RMSE=0 suggests cosine correction was successful
NSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
NSE_coscor: Nash-Sutcliffe efficiency (coscorI vs refI) tells whether coscorI matches refI better than the mean of refI does, NSE=1 suggests cosine correction was successful
KGE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_coscor: Kling-Gupta efficiency (coscorI vs refI), interpreted like NSE, KGE=1 suggests cosine correction was successful
KGE_coscor_r: Pearson correlation coefficient (coscorI vs refI), component of KGE
KGE_coscor_beta: mean coscorI / mean refI (coscorI vs refI), component of KGE, ratio of the means of coscorI and refI, beta=1 suggests cosine correction was successful
KGE_coscor_alpha: standard deviation(coscorI) / standard deviation(refI), component of KGE, ratio of the standard deviations of coscorI and refI, alpha=1 suggests cosine correction was successful
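As a rough illustration of how the objective functions listed above relate to each other, the sketch below computes RMSE, NSE, and KGE with its three components for a single pair of spectra in base R. The function name `spectral_stats` and the toy intensities are hypothetical. The KGE form shown (1 minus the Euclidean distance of r, beta, and alpha from 1) is the standard definition that matches the components described above; it is not copied from the authors' script.

```r
# Hypothetical helper: objective functions for one corrected spectrum (corI)
# compared against its reference spectrum (refI), across all wavelengths.
spectral_stats <- function(corI, refI) {
  rmse  <- sqrt(mean((corI - refI)^2))
  nse   <- 1 - sum((corI - refI)^2) / sum((refI - mean(refI))^2)
  r     <- cor(corI, refI)            # Pearson correlation coefficient
  beta  <- mean(corI) / mean(refI)    # ratio of the means
  alpha <- sd(corI) / sd(refI)        # ratio of the standard deviations
  kge   <- 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (alpha - 1)^2)
  c(RMSE = rmse, NSE = nse, KGE = kge,
    KGE_r = r, KGE_beta = beta, KGE_alpha = alpha)
}

# Toy reference spectrum (made-up values, 5 wavebands instead of 471).
refI <- c(0.10, 0.15, 0.22, 0.30, 0.35)

# A perfectly corrected spectrum reproduces the reference exactly.
perfect <- spectral_stats(refI, refI)

# A spectrum that is uniformly 20% too dark keeps r = 1 but has
# beta = alpha = 0.8, which lowers KGE.
dimmed <- spectral_stats(0.8 * refI, refI)
```

Comparing `perfect` and `dimmed` shows why the KGE components are reported separately: a uniform brightness offset leaves the correlation untouched and only shows up in beta and alpha.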
# Related data files:
HSICalLib_20230613_spectralstats_ccor.RData
# R script that outputs this file:
HSICalLib_9a_cosine_correction_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_spectralstats_ccor.RData
# Name and type of R object: spectralstats_ccor (numeric, data frame)
# Number of observations (rows): 48,149
# Description of observations:
Each row corresponds to one spectrum (one soil sample at one configuration), so the row count is the number of spectra compared when reflectance intensities are grouped this way before calculating objective functions. RMSE, NSE, and KGE are calculated for every soil sample at every configuration across all wavelengths. Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData except corrected spectra result from C correction instead of dI correction.
# Missing data values (NA): None
# Number of variables (columns): 15
# Description of variables:
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
RMSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
RMSE_ccor: Root mean squared error (ccorI vs refI) tells how far ccorI is from refI, RMSE=0 suggests C correction was successful
NSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
NSE_ccor: Nash-Sutcliffe efficiency (ccorI vs refI) tells whether ccorI matches refI better than the mean of refI does, NSE=1 suggests C correction was successful
KGE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_obs_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
KGE_ccor: Kling-Gupta efficiency (ccorI vs refI), interpreted like NSE, KGE=1 suggests C correction was successful
KGE_ccor_r: Pearson correlation coefficient (ccorI vs refI), component of KGE
KGE_ccor_beta: mean ccorI / mean refI (ccorI vs refI), component of KGE, ratio of the means of ccorI and refI, beta=1 suggests C correction was successful
KGE_ccor_alpha: standard deviation(ccorI) / standard deviation(refI), component of KGE, ratio of the standard deviations of ccorI and refI, alpha=1 suggests C correction was successful
# Related data files:
HSICalLib_20230613_spectralstats_coscor.RData
# R script that outputs this file:
HSICalLib_10a_C_correction_final.R
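For readers unfamiliar with the C correction, the sketch below shows its textbook form in base R for a single waveband. Observed intensity is regressed on the cosine of the illumination incidence angle, and the ratio of intercept to slope (the C parameter) is added to both cosine terms. The function name `c_correct`, its argument names, and all numeric values are assumptions for illustration; the authors' implementation lives in the script named above and may differ.

```r
# Hypothetical helper: textbook C correction at one waveband.
#   obsI    observed intensities of the same material at several orientations
#   cos_i   cosine of the illumination incidence angle at each orientation
#   cos_ref cosine of the incidence angle for the reference (flat) orientation
c_correct <- function(obsI, cos_i, cos_ref) {
  fit   <- lm(obsI ~ cos_i)               # obsI = m * cos_i + b
  b     <- coef(fit)[["(Intercept)"]]
  m     <- coef(fit)[["cos_i"]]
  c_par <- b / m                          # the C parameter
  obsI * (cos_ref + c_par) / (cos_i + c_par)
}

# Toy data (made-up): intensity depends linearly on cos_i.
cos_i <- c(0.95, 0.85, 0.70, 0.55)
obsI  <- 0.30 * cos_i + 0.05

# After correction every orientation collapses to the intensity expected
# at the reference orientation, m * cos_ref + b.
ccorI <- c_correct(obsI, cos_i, cos_ref = 0.98)
```

Setting `c_par` to 0 in the last line of the function reduces it to the plain cosine correction, which is one way to see how the two methods differ.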
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData
# Name and type of R object: obsI_p (“wide” data frame)
# Number of observations (rows): 48,149
# Description of observations:
Same observations as HSICalLib_20230613_pga_slm_cosIL.RData except different columns are included along with the 471 observed (uncorrected) reflectance intensities (obsI) for each soil sample at each orientation.
# Missing data values (NA): None
# Number of variables (columns): 475
# Description of variables:
obsI*[wavelength]*: see HSICalLib_b1_b30_p.RData
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
log10volC: see HSICalLib_b1_b30_p.RData
# Related data files:
HSICalLib_20230613_globaldI_OCpredict_refI_p.RData
HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData
# R script that outputs this file:
HSICalLib_12a_OC_predict_pls_final.R
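The “wide” layout described above (spectral columns followed by a few metadata columns) can be split into a predictor matrix and a metadata table after loading. The sketch below is a minimal base R example that writes a tiny stand-in file to a temporary location so that it runs without the real data. The column names `obsI400` and `obsI401` and all values are hypothetical; check `names()` on the real object before relying on the `^obsI` pattern.

```r
# Tiny stand-in for obsI_p (2 wavebands instead of 471, made-up values).
obsI_p <- data.frame(
  obsI400 = c(0.11, 0.09), obsI401 = c(0.12, 0.10),
  slope = c(10, 20), aspect = c(0, 30),
  HSInumber = c(1, 1), log10volC = c(-1.7, -1.7)
)
tmp <- tempfile(fileext = ".RData")
save(obsI_p, file = tmp)

# load() restores objects under their saved names and returns those names,
# so loading into a fresh environment avoids overwriting workspace objects.
env    <- new.env()
loaded <- load(tmp, envir = env)
df     <- get(loaded[1], envir = env)

# Separate spectral predictors from metadata by column name.
is_spectral <- grepl("^obsI", names(df))
X    <- as.matrix(df[, is_spectral])
meta <- df[, !is_spectral]
```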
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_OCpredict_refI_p.RData
# Name and type of R object: refI_p (“wide” data frame)
# Number of observations (rows): 48,149
# Description of observations:
Same observations as HSICalLib_20230613_pga_slm_cosIL.RData except different columns are included along with the 471 reference reflectance intensities (refI) for each soil sample at each orientation.
# Missing data values (NA): None
# Number of variables (columns): 475
# Description of variables:
refI*[wavelength]*: same as HSICalLib_b1_b30_p.RData except these are reference reflectance intensities rather than obsI
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
log10volC: see HSICalLib_b1_b30_p.RData
# Related data files:
HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData
HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData
# R script that outputs this file:
HSICalLib_12a_OC_predict_pls_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData
# Name and type of R object: dIc_p (“wide” data frame)
# Number of observations (rows): 48,149
# Description of observations:
Same observations as HSICalLib_20230613_pga_slm_cosIL.RData except different columns are included along with the 471 dI corrected reflectance intensities (dIc) for each soil sample at each orientation.
# Missing data values (NA): None
# Number of variables (columns): 475
# Description of variables:
dIc*[wavelength]*: same as HSICalLib_b1_b30_p.RData except these reflectance intensities have been dI corrected using the dI+ multiple linear regression model to predict dI
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
log10volC: see HSICalLib_b1_b30_p.RData
# Related data files:
HSICalLib_20230613_globaldI_OCpredict_refI_p.RData
HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData
# R script that outputs this file:
HSICalLib_12a_OC_predict_pls_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
# Name and type of R object: sstatdf_tr_ref (numeric, data frame)
# Number of observations (rows): 681
# Description of observations:
Each row contains error metrics for log10volC predictions made from the reference spectrum for each soil sample. This partial least squares regression model was trained on these same 681 reference spectra (1 for each soil sample).
# Missing data values (NA): None
# Number of variables (columns): 12
# Description of variables:
predicted: log10volC predicted by the “reference” partial least squares regression model using 471 reference reflectance intensities (refI) as predictor variables
observed: log10volC observed based on laboratory measurements of SOC
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
NOTE: slope and aspect are the same for all samples (rows) in this data frame because refI is the same regardless of orientation
HSInumber: see HSICalLib_b1_b30_p.RData
RMSE: Root mean squared error (observed log10volC vs predicted log10volC) tells how far predicted is from observed, smaller RMSE means better prediction, ideal RMSE=0
NSE: Nash-Sutcliffe efficiency (observed log10volC vs predicted log10volC) tells whether the model predicts observed better than the mean of observed does, NSE=1 means the model is a perfect predictor, NSE<0 means the mean of observed is a better predictor than the model
R2: Coefficient of determination (observed log10volC vs predicted log10volC)
KGE: Kling-Gupta efficiency (observed log10volC vs predicted log10volC), interpreted like NSE, KGE=1 means the model is a perfect predictor
KGE_r: Pearson correlation coefficient (observed log10volC vs predicted log10volC)
KGE_beta: mean predicted / mean observed (observed log10volC vs predicted log10volC), component of KGE, ideal beta=1
KGE_alpha: standard deviation(predicted) / standard deviation(observed), component of KGE, ideal alpha=1
# Related data files:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData
# R script that outputs this file:
HSICalLib_12a_OC_predict_pls_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData
# Name and type of R object: sstatdf_obs (data frame)
# Number of observations (rows): 48,149
# Description of observations:
Rows contain error metrics for log10volC predictions made from all of the observed (uncorrected) spectra collected at non-zero slopes for each soil sample (1 prediction per spectrum means multiple log10volC predictions for each soil sample). This partial least squares regression model was trained on 681 reference spectra (1 for each soil sample).
# Missing data values (NA): None
# Number of variables (columns): 12
# Description of variables:
predicted: log10volC predicted by the “reference” partial least squares regression model using 471 observed (uncorrected) reflectance intensities (obsI) as predictor variables
observed: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
RMSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
NSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
R2: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_r: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_beta: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_alpha: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
# Related data files:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData
# R script that outputs this file:
HSICalLib_12a_OC_predict_pls_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData
# Name and type of R object: sstatdf_tr_dIc (data frame)
# Number of observations (rows): 681
# Description of observations:
Rows contain error metrics for log10volC predictions made from the training dIc spectra (1 prediction and 1 spectrum per sample from a randomly chosen orientation). This partial least squares regression model was trained on the same 681 dIc spectra.
# Missing data values (NA): None
# Number of variables (columns): 12
# Description of variables:
Same as HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except the “corrected” PLSR model was trained and evaluated using dI corrected training spectra.
predicted: log10volC predicted by evaluating the “corrected” partial least squares regression model using 471 dI corrected reflectance intensities (dIc) as predictor variables. Only spectra from the “corrected” PLSR model training set were evaluated.
observed: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
RMSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
NSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
R2: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_r: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_beta: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_alpha: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
# Related data files:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData
# R script that outputs this file:
HSICalLib_12a_OC_predict_pls_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData
# Name and type of R object: sstatdf_dIc (numeric, data frame)
# Number of observations (rows): 48,149
# Description of observations:
Rows contain error metrics for log10volC predictions made from dI corrected reflectance spectra from all orientations for all soil samples using the “corrected” partial least squares regression model that was trained on 681 dI corrected spectra (1 per soil sample from a randomly chosen orientation). Multiple predictions are made for each soil sample since more than 1 spectrum per sample is evaluated.
# Missing data values (NA): None
# Number of variables (columns): 12
# Description of variables:
Same as HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData except predictions come from the “corrected” PLSR model, which was trained and evaluated using dI corrected spectra.
predicted: log10volC predicted by evaluating the “corrected” partial least squares regression model using 471 dI corrected reflectance intensities (dIc) from all soil samples at all orientations as predictor variables. More than 1 spectrum per soil sample is evaluated so multiple predictions are made for each soil sample.
observed: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
HSInumber: see HSICalLib_b1_b30_p.RData
RMSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
NSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
R2: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_r: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_beta: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
KGE_alpha: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
# Related data files:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData
# R script that outputs this file:
HSICalLib_12a_OC_predict_pls_final.R
# ----------
# Specific information for data file:
HSICalLib_20230613_globaldI_PLSR_log10volC_summarystats_OC.RData
# Name and type of R object: summarystats_OC (data frame, numeric)
# Number of observations (rows): 78
# Description of observations:
Error metrics for all 78 (6 slopes * 13 aspects) non-zero orientations across all predictions made by 1) evaluating the “reference” PLSR model on all observed (obsI) spectra (same results used in HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData), and 2) evaluating the “corrected” PLSR model on all dI corrected spectra (same results used in HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData).
# Missing data values (NA): None
# Number of variables (columns): 16
# Description of variables:
Same as HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except error metrics for this data frame were calculated across all predictions made at each orientation from either 1) all obsI vs training refI spectra using the reference PLSR model, or 2) all dIc vs training dIc spectra using the corrected PLSR model.
slope: see HSICalLib_b1_b30_p.RData
aspect: see HSICalLib_b1_b30_p.RData
RMSE_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra
RMSE_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra
NSE_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra
NSE_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra
R2_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra
R2_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra
KGE_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra
KGE_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra
KGE_r_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra
KGE_r_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra
KGE_beta_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra
KGE_beta_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra
KGE_alpha_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra
KGE_alpha_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra
# Related data files:
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData
# R script that outputs this file:
HSICalLib_12a_OC_predict_pls_final.R
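The per-orientation summary described above (one row of error metrics per slope, aspect combination, calculated across all soil samples) can be sketched in base R by splitting the predictions by orientation. The data frame `preds`, its values, and the helper `rmse` are hypothetical; only RMSE is shown, and the other metrics would be added the same way.

```r
# Toy predictions (made-up): 3 soil samples at each of 2 orientations.
preds <- data.frame(
  slope     = c(10, 10, 10, 20, 20, 20),
  aspect    = c(0, 0, 0, 30, 30, 30),
  observed  = c(1, 2, 3, 1, 2, 3),
  predicted = c(1, 2, 3, 2, 3, 4)
)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# One group per orientation that actually occurs (drop = TRUE removes
# slope, aspect combinations with no rows).
groups <- split(preds, list(preds$slope, preds$aspect), drop = TRUE)

# One summary row per orientation, calculated across all samples in it.
summarystats <- do.call(rbind, lapply(groups, function(g) {
  data.frame(slope = g$slope[1], aspect = g$aspect[1],
             RMSE = rmse(g$observed, g$predicted))
}))
```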
# ---------------------------------------------------------------------------
# Sharing/Access information
# ---------------------------------------------------------------------------
# ----------
Links to other publicly accessible locations of the data:
github.com/aduro005/HSITopographicCorrectionSampleWellArray (custom designed and 3-D printed sample array)
github.com/aduro005/HSITopographicCorrectionRscripts (R scripts used to manipulate the data found on the UCR Dryad Data Repository)
# ----------
Data was derived from the following sources:
data.neonscience.org/home (574 soil samples obtained from NEON Initial Characterization Soils Archive at the University of Michigan Biological Station Sample Archive Facility in Ehlers (UMBS-SAFE) with accompanying soil properties data obtained from the NEON Data Archive)
# ---------------------------------------------------------------------------
# Description of the R scripts
# ---------------------------------------------------------------------------
The titles of the R scripts indicate the order in which they are meant to be used. The only exception to this convention is the HSICalLib_0_FinalPlots.R script which could be used at different points during the workflow, but is intended to be used last.
# ----------
List of files in the R scripts folder:
HSI Data Processing:
HSICalLib_1a_hdr_raw_to_hsi_rgbmat.R
HSICalLib_1b_rgbmat_to_soilindices.R
HSICalLib_1c_rgbmat_soilindices_to_Tsoilindices.R
HSICalLib_1d_rgbmat_Tsoilindices_to_adjustedTsoilindices.R
HSICalLib_2_soilindices_hsi_to_intensities.R
HSICalLib_3_intensities_to_intmean_intsd.R
HSICalLib_4_intmean_to_masterintmean_to_p.R
HSICalLib_5a_p_obsI_cleaning.R
HSICalLib_5b_p_dI_cleaning.R
HSICalLib_6a_aspect_correction.R
HSICalLib_6b_cosIL_calculation.R
HSICalLib_7a_observedI.R
HSICalLib_7b_referenceI.R
HSICalLib_7c_dI_calculation.R
HSI Data Analysis:
HSICalLib_8b_dI_predict_global_final.R
HSICalLib_9a_cosine_correction_final.R
HSICalLib_10a_C_correction_final.R
HSICalLib_12a_OC_predict_pls_final.R
HSICalLib_0_FinalPlots.R
# ----------
# Specific information for R script:
HSICalLib_1a_hdr_raw_to_hsi_rgbmat.R
# Description of script:
Read in raw data, dark calibration, and white calibration files from the Data folder (3 .hdr and 3 .raw files), perform white and dark correction, then output .RData and .tiff files to the Output Files folder for each scan. Output files contain the hyperspectral image of the scan (reflectance intensity for 471 wavebands between 400 - 1000 nm in 2 spatial dimensions), an RGB image of the scan (reflectance intensity for red, green, and blue wavebands for each pixel in the image), and a .tiff of the RGB image (can be opened in a photo viewer).
# Input files:
2 raw data files (.hdr and .raw) per HSI scan (not included in this dataset)
hsi_HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*.hdr
hsi_HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*.raw
2 raw data files (.hdr and .raw) for white calibration (not included in this dataset)
hsi_HSICalLib_b*[batch]_[date]*_white.hdr
hsi_HSICalLib_b*[batch]_[date]*_white.raw
2 raw data files (.hdr and .raw) for dark calibration (not included in this dataset)
hsi_HSICalLib_b*[batch]_[date]*_dark.hdr
hsi_HSICalLib_b*[batch]_[date]*_dark.raw
# Output files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_hsi.RData (not included in this dataset)
Array with the same dimensions as the .raw file, but the raw reflectance intensities have been scaled between 0 (dark, minimum reflectance) and 1 (white, maximum reflectance) using white and dark calibration scan data.
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_rgbmat.RData (not included in this dataset)
Array with the same spatial dimensions as the .raw file but the spectral dimension only contains data for 3 wavebands corresponding to the red, green, and blue color wavelengths. This array is used to generate the RGB.tiff file.
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_mmResolution_RGB.tiff (not included in this dataset)
RGB image of the scan
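The white and dark correction described for this script is commonly written as (raw - dark) / (white - dark), which scales each waveband between 0 and 1. The sketch below applies that formula to a toy array in base R. It assumes one dark value and one white value per waveband, which is a simplification: the actual script reads full calibration scans and may average or align them differently. All object names and values are hypothetical.

```r
# Hypothetical per-waveband calibration values (3 wavebands instead of 471).
dark  <- c(10, 12, 11)
white <- c(110, 212, 511)

# Toy raw scan: 2 x 2 pixels x 3 wavebands.
raw <- array(0, dim = c(2, 2, 3))
raw[1, 1, ] <- dark                          # as dark as the dark scan
raw[1, 2, ] <- (dark + white) / 2            # halfway between dark and white
raw[2, 1, ] <- dark + 0.25 * (white - dark)  # a quarter of the way up
raw[2, 2, ] <- white                         # as bright as the white scan

# Subtract dark and divide by (white - dark) along the spectral dimension.
hsi <- sweep(raw, 3, dark, "-")
hsi <- sweep(hsi, 3, white - dark, "/")
```

After the two `sweep()` calls, the pixel at `[1, 1, ]` is 0 at every waveband and the pixel at `[2, 2, ]` is 1, matching the 0 (dark) to 1 (white) scaling described above.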
# ----------
# Specific information for R script:
HSICalLib_1b_rgbmat_to_soilindices.R
# Description of script:
Make a “template” with the row and column coordinates for every pixel occurring within the 40 sample wells at all 98 slope, aspect configurations. This script can be used to manually identify the location (row, column coordinates) of pixels occurring within sample wells from images. The result of this is a “template” which can be used to automatically identify the location (row, column coordinates) of pixels occurring within sample wells from any image given the slope and aspect of the sample well array in the image AND isolate the reflectance spectra from those pixels.
# Input files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_mmResolution_RGB_rgbmat.RData (not included in this dataset)
Output from R script 1a.
# Output files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_soilindices.RData (not included in this dataset)
List containing 40 elements (1 for each sample well/soil sample). Each element of the list contains a data frame with 2 columns “soilrows” and “soilcols”. These are the row, column coordinates (i.e., the 2 spatial dimensions of “rgbmat” or “hsi” arrays) of pixels occurring within each of the 40 sample wells (i.e., the 40 elements in this list) at all 98 orientations. There is a separate file for each configuration, and the configuration (i.e., slope and aspect) is indicated in the file name. These coordinates are later used to isolate spectra from ONLY the areas of the images that correspond to sample wells AND to match reflectance spectra from sample wells to the correct soil sample and its properties.
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_mmResolution_RGB_soilindices.tiff (not included in this dataset)
RGB image of the scan with pixels corresponding to sample wells turned some color. These images were used to visually check that the row, column coordinates (indicated by HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_soilindices.RData) were correctly aligned with actual locations of sample wells in images.
# ----------
# Specific information for R script:
HSICalLib_1c_rgbmat_soilindices_to_Tsoilindices.R
# Description of script:
Use “soilindices” list (created in R script 1b) along with “rgbmat” arrays to identify pixels occurring in sample wells and turn those pixels a certain color using the row, column indices (pixels) indicated by soilindices.RData or Tsoilindices.RData (i.e., the “soilindices” list). Then output the updated “soilindices” list as “_Tsoilindices.RData” and an image called “_Tsoilindices.tiff” where sample well pixels are turned some color (i.e., certain values are manually assigned to the red, green, and blue wavebands for pixels occurring in sample wells).
# Input files:
HSICalLib_b*[1]_*date[1]_s*[slope]_a[aspect]*_soilindices.RData (not included in this dataset) OR
HSICalLib_b*[4]_*date[4]_s*[slope]_a[aspect]*_Tsoilindices.RData (not included in this dataset)
Output from R script 1b (_soilindices.RData) or R script 1c (_Tsoilindices.RData)
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_rgbmat.RData (not included in this dataset)
Output from R script 1a
# Output files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_mmResolution_RGB_Tsoilindices.tiff (not included in this dataset)
Same as _soilindices.tiff (output from 1b) except the row and column indices occurring in sample wells have been automatically selected based on the “master” templates that were manually created for each orientation using batch 1 and 4 (R scripts not included in this dataset).
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_Tsoilindices.RData (not included in this dataset)
Same as _soilindices.RData (output from 1b) except the row and column indices corresponding to sample wells have been automatically selected based on the “master” templates that were manually created for each orientation using batch 1 and 4 (R scripts not included in this dataset).
# ----------
# Specific information for R script:
HSICalLib_1d_rgbmat_Tsoilindices_to_adjustedTsoilindices.R
# Description of script:
Use “_Tsoilindices.RData” (“soilindices” list) (created in R script 1c) along with “rgbmat” arrays to MANUALLY adjust the location of pixels occurring in sample wells and turn those pixels some color using the row, column indices (pixels) indicated by “_Tsoilindices.RData” AND visual inspection by a user in R. Then output the updated “soilindices” list as “_Tsoilindices.RData” and “_Tsoilindices.tiff”.
# Input files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_rgbmat.RData (not included in this dataset)
Output from R script 1a
HSICalLib_b*[4]_*date[4]_s*[slope]_a[aspect]*_Tsoilindices.RData (not included in this dataset)
Output from R script 1c
# Output files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_Tsoilindices.RData (not included in this dataset)
Same as the “soilindices” list (output from R script 1b and 1c) except the locations of pixels occurring within sample wells have been adjusted based on visual inspection by a user.
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_mmResolution_RGB_Tsoilindices.tiff (not included in this dataset)
Same as the RGB image output from R script 1b and 1c except the locations of pixels occurring within sample wells have been adjusted based on visual inspection by a user.
# ----------
# Specific information for R script:
HSICalLib_2_soilindices_hsi_to_intensities.R
# Description of script:
Use “_Tsoilindices.RData” and “_hsi.RData” to isolate reflectance spectra (i.e., reflectance intensities measured at 471 wavebands) from pixels (i.e., row, column coordinates indicated by “_Tsoilindices.RData”) corresponding to sample wells (soil samples) in “_hsi.RData” (output from R script 1a).
# Input files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_Tsoilindices.RData (not included in this dataset)
Output from R script 1d
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_hsi.RData (not included in this dataset)
Output from R script 1a
# Output files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_hsi_intensities.RData (not included in this dataset)
List containing 40 elements (1 for each sample well/soil sample). Each element of the list contains a data frame where each row is a reflectance spectrum from 1 pixel occurring within a sample well. For example, the first element of the list contains reflectance spectra from all the pixels occurring within the first sample well.
Number of rows = number of pixels occurring within this sample well
Number of columns = 471 reflectance intensities
NOTE: the wavelengths corresponding to these 471 reflectance intensities can be found in wavevec.RData.
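The extraction step described for this script can be sketched in base R as follows: for each sample well, the row and column coordinates in the “soilindices” list select pixels from the “hsi” array, and each selected pixel contributes one row (its spectrum) to that well's data frame. The toy array, the two-well list, and the function name `extract_spectra` are hypothetical; only the column names “soilrows” and “soilcols” come from the description of R script 1b.

```r
# Toy hsi array: 4 rows x 5 columns of pixels, 3 wavebands (not 471).
hsi <- array(seq_len(4 * 5 * 3) / 100, dim = c(4, 5, 3))

# Toy "soilindices" list with 2 wells instead of 40.
soilindices <- list(
  data.frame(soilrows = c(1, 2),    soilcols = c(1, 1)),
  data.frame(soilrows = c(3, 4, 4), soilcols = c(4, 4, 5))
)

# Hypothetical helper: one row per pixel, one column per waveband.
extract_spectra <- function(hsi, idx) {
  spectra <- t(sapply(seq_len(nrow(idx)), function(k) {
    hsi[idx$soilrows[k], idx$soilcols[k], ]
  }))
  as.data.frame(spectra)
}

# List with one data frame of pixel spectra per sample well.
intensities <- lapply(soilindices, extract_spectra, hsi = hsi)
```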
# ----------
# Specific information for R script:
HSICalLib_3_intensities_to_intmean_intsd.R
# Description of script:
Use “_intensities.RData” (output from R script 2) to get the average reflectance spectrum of each soil sample. In other words, get the mean and sd of reflectance intensities measured at each waveband across all pixels occurring within each sample well/soil sample.
# Input files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_hsi_intensities.RData (not included in this dataset)
Output from R script 2
# Output files:
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_intmean.RData (not included in this dataset)
Data frame where each row contains the average reflectance spectrum for one soil sample/sample well for this batch at this configuration (see the _intensities.RData file name for batch, slope, and aspect info). Each file corresponds to a single scan (total number of scans = 30 batches x 98 configurations).
Number of rows/samples = 40
Number of columns/variables = 475
471 reflectance intensities: See Description of the data
slope: See Description of the data
aspect: See Description of the data
batch: See Description of the data
well: See Description of the data
NOTE: Batch and Well were used together as a key to merge soil sample properties data with reflectance data in R script 4. Each soil sample has 98 reflectance spectra (1 obtained at each slope, aspect configuration) but only 1 set of properties data. The chemical and physical properties of a soil sample don’t change as the sample orientation changes, but reflectance does (as shown in this study).
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_intsd.RData (not included in this dataset)
Same as _intmean.RData, but standard deviation of reflectance at each waveband is reported rather than the mean.
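The per-well averaging described above can be sketched as follows. This is a minimal illustration, not the code in R script 3: the object name `intensities`, the column names, and the slope/aspect/batch values are hypothetical stand-ins, and only 3 wavebands are used instead of 471.

```r
# Hypothetical stand-in for an "_intensities.RData" list: one data frame per
# sample well, one row per pixel, one column per waveband.
set.seed(1)
intensities <- lapply(1:4, function(w) {
  as.data.frame(matrix(runif(5 * 3), nrow = 5,
                       dimnames = list(NULL, c("w400", "w401", "w402"))))
})

# Mean and standard deviation at each waveband across all pixels in a well.
intmean <- as.data.frame(do.call(rbind, lapply(intensities, colMeans)))
intsd   <- as.data.frame(do.call(rbind, lapply(intensities,
                                               function(d) sapply(d, sd))))

# Attach scan-level identifiers (placeholder values for illustration).
intmean$slope  <- 10
intmean$aspect <- 45
intmean$batch  <- 1
intmean$well   <- seq_along(intensities)

dim(intmean)  # rows = wells, columns = wavebands + 4 identifier columns
```

With 471 wavebands and 40 wells, the same steps would give the 40 x 475 layout listed above.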
# ----------
# Specific information for R script:
HSICalLib_4_intmean_to_masterintmean_to_p.R
# Description of script:
Bring in soil properties data from 4 separate sources, then merge these data frames, and output a single data frame called HSICalLib_b1_b30_prep_rutgers_neon_fire.RData which contains soil properties data for all soil samples in the study.
Bring in _intmean.RData (output from R script 3) for each scan (98 orientations/scans per batch), then output a single data frame containing the mean reflectance spectra for each soil sample at 98 configurations for ONLY this batch. This file is similar to _intmean.RData except the separate _intmean.RData files for each scan are combined into a single _intmean.RData file for each batch.
Bring in _intmean.RData for each batch (output from this script) and merge into a single data frame called “_masterintmean.RData” containing reflectance spectra for all soil samples at all orientations.
Merge _masterintmean.RData (output from this script) with _prep_rutgers_neon_fire.RData (output from this script) resulting in a data frame called “p” with reflectance spectra from all soil samples at all orientations along with selected soil properties data.
# Input files:
HSICalLib_20230418_SamplePrepData_R.csv (not included in this dataset)
Soil sample properties data provided by the Pedology Lab at UC Riverside
Created in Google Sheets by Alyssa Duro
HSICalLib_20230418_Fire_R.csv (not included in this dataset)
Soil sample properties data provided by the Gray Lab at UC Riverside
Created in Google Sheets by Alyssa Duro
HSICalLib_20230418_NEON_R.csv (not included in this dataset)
Soil sample properties data provided by NEON (NRCS performed lab analysis)
Created in Google Sheets by Alyssa Duro
HSICalLib_20230418_Rutgers_R.csv (not included in this dataset)
Soil sample properties data provided by Rutgers (samples are from Duke Farms)
Created in Google Sheets by Alyssa Duro
HSICalLib_b*[batch]_[date]_s[slope]_a[aspect]*_intmean.RData (not included in this dataset)
Output from R script 3
# Output files:
HSICalLib_b1_b30_prep_rutgers_neon_fire.RData (not included in this dataset)
Merged _SamplePrepData_R.RData, _Fire_R.RData, _NEON_R.RData, and _Rutgers_R.RData into a single data frame with 1180 rows (soil samples) and 58 columns/variables (measured soil properties and sample identifiers). Many soil properties were measured for only some of the soil samples, resulting in 32,727 NA’s.
Number of rows/soil samples = 1180
Number of columns/variables = 58
HSICalLib_b*[batch]_[date]*_intmean.RData (not included in this dataset)
Same as _intmean.RData (output from R script 3) except now each soil sample (i.e., each batch, well combination) is associated with 98 different reflectance spectra, each with a different combination of slope and aspect. Each file corresponds to a single batch (total number of batches = 30).
Number of rows/soil samples/reflectance spectra = 3920
40 sample wells (soil samples) * 98 configurations
Number of columns/variables = 475
Same variables as _intmean.RData (output from R script 3)
HSICalLib_b1_b30_masterintmean.RData (not included in this dataset)
Same as _intmean.RData (output from R script 3) except now each soil sample is associated with 98 reflectance spectra collected at different slope, aspect combinations.
Number of rows/soil samples/reflectance spectra = 117,600
40 soil samples * 30 batches * 98 configurations
Number of columns/variables = 475
Same as _intmean.RData (outputs from R script 3 and 4)
HSICalLib_b1_b30_p.RData
A data frame with mean reflectance spectra for each soil sample at each orientation along with selected soil properties data and sample identifiers. NA’s occur where soil properties data are not available for a soil sample. This is the most raw form of the data included in this dataset.
Number of rows/spectra = 115,444 reflectance spectra
1178 soil samples * 98 configurations
Number of columns/variables = 486
471 observed (uncorrected) reflectance intensities
slope: See Description of the data
aspect: See Description of the data
batch: Soil samples were imaged in groups of 40 at a time
well: Indexed location of the soil sample in the sample well array
HSInumber: unique soil sample identifier
HSIPackedDensity: mass of soil sample per volume of sample well
sandTotal: % sand (only available for samples from the NEON archive)
siltTotal: % silt (only available for samples from the NEON archive)
clayTotal: % clay (only available for samples from the NEON archive)
OC: % soil organic carbon (by weight)
archive: source of the soil sample and soil properties data
adod: air dried soil mass / oven dried soil mass
volC: % soil organic carbon (by volume)
log10volC: log10(volC)
batchwellID: unique reflectance spectra identifier
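The batch/well merge described for this script might look like the sketch below. The two small data frames, the column `obsI400`, and the HSInumber values are hypothetical; only the idea of using batch and well together as the merge key comes from the README.

```r
# Hypothetical miniature reflectance table: 2 batches x 2 wells x 4 orientations.
masterintmean <- expand.grid(well = 1:2, batch = 1:2,
                             aspect = c(0, 90), slope = c(0, 30))
masterintmean$obsI400 <- seq_len(nrow(masterintmean)) / 100

# Hypothetical properties table: one row per soil sample (batch, well pair).
properties <- data.frame(batch = c(1, 1, 2, 2),
                         well = c(1, 2, 1, 2),
                         HSInumber = 1:4,
                         OC = c(1.2, NA, 0.8, 2.5))

# Each sample's single set of properties is repeated for every orientation.
p <- merge(masterintmean, properties, by = c("batch", "well"), all.x = TRUE)

table(p$batch, p$well)  # number of spectra per soil sample
```

Using `all.x = TRUE` keeps spectra whose properties are missing, which is one way NA's can appear in the merged table.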
# ----------
# Specific information for R script:
HSICalLib_5a_p_obsI_cleaning.R
# Description of script:
Remove reflectance spectra that are not truly representative of soil samples based on visual identification of imaging errors, then output as _p_clean.RData (not included in this dataset) OR output this data frame as a “long” version (where wavelength is a variable) called _p_clean_melt.RData (not included in this dataset).
Then remove reflectance spectra that are not truly representative of soil samples by removing spectra containing unusually large or small observed intensities (obsI) and output this data frame as _p_clean_obsI.RData.
# Input files:
HSICalLib_wavevec.RData
HSICalLib_b1_b30_p.RData
(output from R script 4)
# Output files:
HSICalLib_b1-b30_20230223_p_clean.RData (not included in this dataset)
Same as _p.RData (output from R script 4) except some known (visually identified) imaging mistakes (rows/spectra) have been removed. Details are provided as comments in the R script.
Number of rows/spectra = 114,764
Number of columns/variables = 486
Same as _p.RData (output from R script 4)
HSICalLib_20230223_b1-b30_p_clean_melt.RData (not included in this dataset)
“Long” version of _p_clean.RData where wavelength is a variable
Number of rows/observed reflectance intensities (obsI) = 54,053,844
114,764 spectra * 471 wavebands
Number of columns/variables = 8
slope: See Description of the data
aspect: See Description of the data
batch: Soil samples were imaged in groups of 40 at a time
well: Indexed location of the soil sample in the sample well array
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each obsI reflectance intensity
obsI: observed reflectance intensities
batchwellID: unique reflectance spectra identifier
HSICalLib_20230223_b1-b30_obsIoutliers.RData (not included in this dataset)
Character vector containing the “batchwellID” (a sample identifier unique to each reflectance spectrum) for the reflectance spectra identified as imaging errors using the observed intensities (obsI) approach.
Length/number of spectra to be removed based on obsI values = 115
HSICalLib_b1-b30_20230223_p_clean_obsI.RData
Same as _p_clean.RData (output from this script) except image mistakes have been identified (see _obsIoutliers.RData) and removed based on unusually large or small observed intensities (obsI).
Number of rows/obsI spectra = 114,649
Number of columns/variables = 486
Same as _p.RData (output from R script 4)
HSICalLib_20230223_b1-b30_p_clean_obsI_melt.RData (not included in this dataset)
“Long” version of _p_clean_obsI.RData (output from this script) where wavelength is a variable
Number of rows/observed reflectance intensities (obsI) = 53,999,679
114,649 spectra * 471 wavebands
Number of columns/variables = 8
Same as _p_clean_melt.RData (output from this script)
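The wide-to-long ("melt") conversion used for the _melt files can be sketched in base R as shown here. The column names (`obsI400`, ...) and wavelengths are hypothetical, and the actual scripts may use a reshaping package instead.

```r
# Hypothetical wide table: one row per spectrum, one column per waveband.
p_clean <- data.frame(slope = c(0, 10), aspect = c(0, 45), HSInumber = c(1, 2),
                      obsI400 = c(0.21, 0.19),
                      obsI401 = c(0.22, 0.20),
                      obsI402 = c(0.23, 0.21))
wavevec <- c(400, 401, 402)  # hypothetical wavelengths, one per intensity column

id_cols  <- c("slope", "aspect", "HSInumber")
int_cols <- setdiff(names(p_clean), id_cols)

# Long table: one row per (spectrum, wavelength) pair.
p_clean_melt <- data.frame(
  p_clean[rep(seq_len(nrow(p_clean)), times = length(int_cols)), id_cols],
  wavelength = rep(wavevec, each = nrow(p_clean)),
  obsI = unlist(p_clean[int_cols], use.names = FALSE),
  row.names = NULL
)

nrow(p_clean_melt)  # number of spectra * number of wavebands
```

This is why the long files have (number of spectra) * 471 rows.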
# ----------
# Specific information for R script:
HSICalLib_5b_p_dI_cleaning.R
# Description of script:
Remove reflectance spectra that are not truly representative of soil samples (due to imaging errors) if they contain unusually large or small change in intensity (dI or ΔI) values. These dI values are the difference between the observed intensities (obsI) and the reference intensities (refI).
# Input files:
HSICalLib_20230223_b1-b30_p_clean_obsI.RData
(output from R script 5a)
HSICalLib_wavevec.RData
# Output files:
HSICalLib_20230223_b1-b30_p_clean_obsI_dI.RData (not included in this dataset)
Same as _p_clean_obsI.RData (output from R script 5a) except 471 dI values (1 per wavelength) are reported rather than 471 observed reflectance intensities (obsI).
Number of rows/dI spectra = 114,649 (same as _p_clean_obsI.RData)
Number of columns/variables = 486
Same as _p.RData (output from R script 4).
NOTE: These are 471 dI reflectance intensities, NOT obsI.
HSICalLib_20230223_b1-b30_p_clean_obsI_dI_melt.RData (not included in this dataset)
“Long” version of _p_clean_obsI_dI.RData where wavelength is a variable
Number of rows/dI reflectance intensities = 53,999,679
114,649 spectra * 471 wavebands
Number of columns/variables = 8
slope: See Description of the data
aspect: See Description of the data
batch: Soil samples were imaged in groups of 40 at a time
well: Indexed location of the soil sample in the sample well array
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each dI reflectance intensity
dI: the difference between obsI and refI
batchwellID: unique reflectance spectra identifier
HSICalLib_20230223_b1-b30_dIoutliers.RData (not included in this dataset)
Character vector containing the “batchwellID” (a sample identifier unique to each reflectance spectrum) for the reflectance spectra identified as imaging errors using the change/difference in intensities (dI) approach.
Length/number of spectra to be removed based on dI values = 7188
HSICalLib_b1-b30_20230223_p_gold.RData
Same as _p.RData (output from R script 4) except the reflectance spectra with unusually large or small obsI OR dI values have been identified (see _dIoutliers.RData and _obsIoutliers.RData) and removed as imaging errors.
Number of rows/obsI reflectance spectra = 107,486
Number of columns/variables = 486
Same as _p.RData (output from R script 4)
# ----------
# Specific information for R script:
HSICalLib_6a_aspect_correction.R
# Description of script:
Convert aspect (column) values 195, 210, 225, 240, 255, and 270 in _p_gold.RData (output from R script 5b) to 165, 150, 135, 120, 105, and 90 then output as _p_gold_acor.RData (wide version) and _p_gold_acor_melt.RData (long version).
# Input files:
HSICalLib_wavevec.RData
HSICalLib_b1-b30_20230223_p_gold.RData
(output from R script 5b)
# Output files:
HSICalLib_20230223_b1-b30_p_gold_acor.RData
Same as _p_gold.RData (output from R script 5b) except some of the aspect values have been converted.
Number of rows/obsI reflectance spectra = 99,804
Number of columns/variables = 486
Same as _p.RData (output from R script 4)
HSICalLib_20230223_b1-b30_p_gold_acor_melt.RData
“Long” version of _p_gold_acor.RData where wavelength is a variable
Number of rows/obsI reflectance intensities = 47,007,684
99,804 spectra * 471 wavebands
Number of columns/variables = 5
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each obsI reflectance intensity
obsI: observed reflectance intensities
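The aspect conversion listed for this script (195, 210, 225, 240, 255, 270 to 165, 150, 135, 120, 105, 90) can be written as a lookup. This is a sketch with a hypothetical `aspect` vector rather than the code in R script 6a.

```r
# Hypothetical aspect column containing converted and unconverted values.
aspect <- c(0, 90, 180, 195, 210, 225, 240, 255, 270)

# Mapping exactly as listed in the script description.
lookup <- c("195" = 165, "210" = 150, "225" = 135,
            "240" = 120, "255" = 105, "270" = 90)

key <- as.character(aspect)
aspect_cor <- ifelse(key %in% names(lookup), lookup[key], aspect)

aspect_cor
```

For the six listed values the mapping is equivalent to `360 - aspect`, i.e. mirroring the aspect across the 0/180 axis.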
# ----------
# Specific information for R script:
HSICalLib_6b_cosIL_calculation.R
# Description of script:
Calculate values needed for the theoretical “cosine correction” based on measurements of the HSI setup used in this study. Details are provided as comments in the R script.
# Input files:
None
# Output files:
HSICalLib_20230223_s0-s60_cosIL_all.RData
Number of rows/orientations = 91
Number of columns/variables = 16
slope: See Description of the data
aspect: See Description of the data
z1: zenith angle (degrees) between light bank 1 and the HSI camera,
NOTE: zenith varies with slope, light bank 1 = N = 0 azimuth
z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera,
NOTE: zenith varies with slope, light bank 2 = S = 180 azimuth
meanz: average of z1 and z2
cosz1: cos(z1)
cosz2: cos(z2)
cosmeanz: cos(meanz)
meancosz: average of cos(z1) and cos(z2)
cosIL1: cos( illumination angle (IL) ) light bank 1
= cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)
cosIL2: cos( illumination angle (IL) ) light bank 2
= cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)
meancosIL: average of cosIL1 and cosIL2
r1: cos(z1) / cosIL1
r2: cos(z2) / cosIL2
rmeans: cos(meanz) / meancosIL
rcosmeans: meancosz / meancosIL
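The cosIL formula and the ratios above can be sketched as below. The zenith angles `z1` and `z2` here are hypothetical placeholders (the real values come from measurements of the HSI setup), and the helper names `deg2rad` and `cos_il` are not from the original scripts.

```r
deg2rad <- function(deg) deg * pi / 180

# cos(IL) = cos(z)*cos(slope) + sin(z)*sin(slope)*cos(azimuth - aspect),
# with all angles given in degrees.
cos_il <- function(z, slope, azimuth, aspect) {
  cos(deg2rad(z)) * cos(deg2rad(slope)) +
    sin(deg2rad(z)) * sin(deg2rad(slope)) * cos(deg2rad(azimuth - aspect))
}

# Hypothetical geometry for one orientation.
z1 <- 30; z2 <- 35          # placeholder zenith angles for light banks 1 and 2
slope <- 20; aspect <- 45

cosIL1 <- cos_il(z1, slope, azimuth = 0,   aspect = aspect)  # light bank 1 (N)
cosIL2 <- cos_il(z2, slope, azimuth = 180, aspect = aspect)  # light bank 2 (S)

meancosIL <- mean(c(cosIL1, cosIL2))
meancosz  <- mean(c(cos(deg2rad(z1)), cos(deg2rad(z2))))
rcosmeans <- meancosz / meancosIL
```

Two sanity checks follow from the formula: at slope 0 it reduces to cos(z), and a surface tilted directly toward the light (slope = z, aspect = azimuth) gives cos(IL) = 1.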
# ----------
# Specific information for R script:
HSICalLib_7a_observedI.R
# Description of script:
Remove any remaining NA’s introduced during aspect correction in R script 6a, then output a final wide and long version of the obsI spectra for the soil samples used in this study.
# Input files:
HSICalLib_wavevec.RData
HSICalLib_20230223_b1-b30_p_gold_acor.RData
(output from R script 6a)
# Output files:
HSICalLib_20230223_b1-b30_pga.RData
Number of rows/obsI spectra = 99,537
Number of columns/variables = 486
Same as _p.RData (output from R script 4)
HSICalLib_20230223_b1-b30_pga_melt.RData
“Long” version of _pga.RData where wavelength is a variable
Number of rows/obsI reflectance intensities = 46,881,927
99,537 spectra * 471 wavebands
Number of columns/variables = 5
Same as _p_gold_acor_melt.RData (output from R script 6a)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each obsI reflectance intensity
obsI: observed reflectance intensities
# ----------
# Specific information for R script:
HSICalLib_7b_referenceI.R
# Description of script:
Bring in _pga.RData (output from R script 7a), then calculate and output a final wide and long version of the reference intensities (refI) spectra for the soil samples used in this study. NOTE: refI spectra are the same for each configuration.
# Input files:
HSICalLib_wavevec.RData
HSICalLib_20230223_b1-b30_pga.RData
(output from R script 7a)
# Output files:
HSICalLib_20230223_b1-b30_pga_refI.RData
Data frame with the same dimensions as _pga.RData (output from R script 7a) except reference intensities (refI) are reported instead of obsI.
Number of rows/refI spectra = 99,537
Number of columns/variables = 486
Same as _p.RData (output from R script 4)
NOTE: These are 471 refI reflectance intensities, NOT obsI.
HSICalLib_20230223_b1-b30_pga_refI_melt.RData
“Long” version of _pga_refI.RData where wavelength is a variable
Number of rows/refI reflectance intensities = 46,881,927
99,537 spectra * 471 wavebands
Number of columns/variables = 5
Same as _p_gold_acor_melt.RData (output from R script 6a)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each refI reflectance intensity
refI: reference reflectance intensities
# ----------
# Specific information for R script:
HSICalLib_7c_dI_calculation.R
# Description of script:
Bring in _pga.RData (output from R script 7a), then calculate and output a final wide and long version of the delta (aka “change in”) intensities (dI) spectra for the soil samples used in this study.
# Input files:
HSICalLib_wavevec.RData
HSICalLib_20230223_b1-b30_pga.RData
(output from R script 7a)
# Output files:
HSICalLib_20230223_b1-b30_pga_dI.RData
Data frame with the same dimensions as _pga.RData (output from R script 7a) except delta (aka “change in”) intensities (dI) are reported instead of obsI.
Number of rows/dI spectra = 99,537
Number of columns/variables = 486
Same as _p.RData (output from R script 4)
NOTE: These are 471 dI reflectance intensities, NOT obsI.
HSICalLib_20230223_b1-b30_pga_dI_melt.RData
“Long” version of _pga_dI.RData where wavelength is a variable
Number of rows/dI reflectance intensities = 46,881,927
99,537 spectra * 471 wavebands
Number of columns/variables = 5
Same as _p_gold_acor_melt.RData (output from R script 6a)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each dI reflectance intensity
dI: change in (aka “delta”) intensities
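A dI calculation on the long tables could look like the sketch below. Two things are assumptions made only for illustration: that refI is the sample's spectrum at slope 0, and that the sign convention is dI = obsI - refI. The README states only that dI is the difference between obsI and refI and that refI is the same for every configuration.

```r
# Hypothetical long table for one soil sample at two orientations.
pga_melt <- data.frame(
  HSInumber = 1,
  slope = c(0, 0, 30, 30),
  aspect = c(0, 0, 90, 90),
  wavelength = c(400, 401, 400, 401),
  obsI = c(0.20, 0.22, 0.15, 0.18)
)

# ASSUMPTION: use the flat (slope 0) spectrum as the reference spectrum.
ref <- pga_melt[pga_melt$slope == 0, c("HSInumber", "wavelength", "obsI")]
names(ref)[names(ref) == "obsI"] <- "refI"

# Attach the same refI to every orientation of the sample, then difference.
m <- merge(pga_melt, ref, by = c("HSInumber", "wavelength"))
m$dI <- m$obsI - m$refI  # ASSUMPTION: sign convention obsI - refI

m
```

Under these assumptions dI is zero wherever the observed spectrum is itself the reference.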
# ----------
# Specific information for R script:
HSICalLib_8b_dI_predict_global_final.R
# Description of script:
Calibrate and evaluate a multiple linear regression model to predict dI using slope, aspect, and wavelength as predictor variables. Calculate error metrics (RMSE, NSE, and KGE) to quantify whether dI-corrected intensities (dIc) are closer to reference intensities (refI) than observed intensities (obsI).
# Input files:
HSICalLib_wavevec.RData
See Description of the data
HSICalLib_20230223_b1-b30_pga_melt.RData
(output from R script 7a)
HSICalLib_20230223_b1-b30_pga_refI_melt.RData
(output from R script 7b)
HSICalLib_20230223_b1-b30_pga_dI_melt.RData
(output from R script 7c)
# Output files:
HSICalLib_20230224_pga_dI_refI_melt.RData
Merged (long) form of _pga_melt.RData (output from R script 7a), _pga_refI_melt.RData (output from R script 7b), _pga_dI_melt.RData (output from R script 7c).
Number of rows/reflectance intensities = 46,881,927
99,537 spectra * 471 wavebands
Number of columns/variables = 7
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each obsI reflectance intensity
obsI: observed reflectance intensities
refI: reference reflectance intensities
dI: change in (aka “delta”) intensities
HSICalLib_20230310_rutgerssamples_rand1.RData
Vector containing the HSI numbers of the 50 soil samples (all from the Rutgers archive) randomly chosen from the 450 samples collected from Duke Farms and imaged using HSI. Only this subset was ultimately included in the topographic correction study because the Duke Farms samples all have very similar soil properties and would otherwise make up a large portion of the training data.
Length/number of soil samples (HSInumbers) = 50
HSICalLib_20230613_pga_dI_refI_melt_slm.RData
Same as _pga_dI_refI_melt.RData (output from this R script) except it ONLY contains obsI, refI, and dI spectra collected at all non-zero slope orientations for the 681 soil samples used in this study.
Number of rows/reflectance intensities = 22,678,179
22,678,179 intensities / 471 wavebands = 48,149 spectra
Number of columns/variables = 7
Same as _pga_dI_refI_melt.RData (output from this R script)
HSICalLib_20230613_finalHSInums.RData
A vector containing the HSI numbers corresponding to the 681 soil samples used in this study.
Length/number of soil samples (HSInumbers) = 681
HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData
Same as _pga_dI_refI_melt_slm.RData (output from this R script) except the columns dIp and dIc have been added. A multiple linear regression model was trained to predict dI using slope, aspect, wavelength, and their interactions as predictor variables. This model was evaluated to get predicted dI (dIp), then dIp was used to adjust (aka “correct”) obsI, resulting in dI-corrected intensities (dIc). If the model were a perfect predictor, then dIp would equal dI AND dIc would equal refI.
Number of rows/reflectance intensities = 22,678,179
Number of columns/variables = 9
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each obsI reflectance intensity
obsI: observed reflectance intensities
refI: reference reflectance intensities
dI: change in (aka “delta”) intensities
dIp: predicted dI intensities
dIc: dI-corrected intensities
NOTE: These dIc values are compared to obsI and refI to get summary stats that quantify how well the dI correction worked (i.e., how much closer dIc was to refI than obsI was to refI).
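The dI model and correction described above might be sketched as follows, on synthetic data. The assumptions are that dI = obsI - refI (so the correction is dIc = obsI - dIp) and that "their interactions" means the full `slope * aspect * wavelength` formula. Neither is confirmed by this README, and the numbers are made up.

```r
set.seed(42)

# Synthetic long table: every slope/aspect/wavelength combination once.
d <- expand.grid(slope = c(10, 20, 30),
                 aspect = c(0, 45, 90, 135, 180),
                 wavelength = c(400, 500, 600, 700))
d$refI <- 0.1 + 0.0003 * d$wavelength
d$dI   <- -0.002 * d$slope + 0.0001 * d$aspect + rnorm(nrow(d), sd = 0.005)
d$obsI <- d$refI + d$dI          # ASSUMPTION: dI = obsI - refI

# Multiple linear regression with main effects and all interactions.
fit <- lm(dI ~ slope * aspect * wavelength, data = d)

d$dIp <- predict(fit, newdata = d)  # predicted dI
d$dIc <- d$obsI - d$dIp             # dI-corrected intensity

rmse <- function(sim, ref) sqrt(mean((sim - ref)^2))
rmse(d$obsI, d$refI)  # error before correction
rmse(d$dIc,  d$refI)  # error after correction
```

With a perfect model, dIp equals dI and dIc collapses onto refI, which matches the behaviour described for this file.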
HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
Summary stats for dI corrected spectra at each configuration across all wavelengths. Calculate RMSE, NSE, and KGE for every soil sample at every configuration. In other words, compare obsI to refI AND dIc to refI using these RMSE, NSE, and KGE metrics.
Number of rows = 48,149
Number of spectra compared to their reference using this approach
Number of columns/variables = 15
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
RMSE_obs: Root mean squared error (obsI vs refI)
RMSE_dIc: Root mean squared error (dIc vs refI)
NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)
NSE_dIc: Nash-Sutcliffe efficiency (dIc vs refI)
KGE_obs: Kling-Gupta efficiency (obsI vs refI)
KGE_obs_r: Pearson correlation coefficient (obsI vs refI)
KGE_obs_beta: mean obsI / mean refI (obsI vs refI)
KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)
KGE_dIc: Kling-Gupta efficiency (dIc vs refI)
KGE_dIc_r: Pearson correlation coefficient (dIc vs refI)
KGE_dIc_beta: mean dIc / mean refI (dIc vs refI)
KGE_dIc_alpha: standard deviation(dIc) / standard deviation(refI)
HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData
Summary stats for spectra at each wavelength & slope across all aspects. Same as _spectralstats_dIc_2.RData (output from this script) BUT spectra were grouped in a different way before calculating RMSE, NSE, and KGE.
Number of rows = 2,826
Number of intensities compared to their reference using this approach
Number of columns/variables = 14
slope: See Description of the data
wavelength: wavelength corresponding to each reflectance intensity
RMSE_obs: Root mean squared error (obsI vs refI)
RMSE_dIc: Root mean squared error (dIc vs refI)
NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)
NSE_dIc: Nash-Sutcliffe efficiency (dIc vs refI)
KGE_obs: Kling-Gupta efficiency (obsI vs refI)
KGE_obs_r: Pearson correlation coefficient (obsI vs refI)
KGE_obs_beta: mean obsI / mean refI (obsI vs refI)
KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)
KGE_dIc: Kling-Gupta efficiency (dIc vs refI)
KGE_dIc_r: Pearson correlation coefficient (dIc vs refI)
KGE_dIc_beta: mean dIc / mean refI (dIc vs refI)
KGE_dIc_alpha: standard deviation(dIc) / standard deviation(refI)
HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData
Summary stats for spectra at each wavelength & aspect across all slopes. Same as _spectralstats_dIc_2.RData (output from this script) BUT spectra were grouped by wavelength & aspect (analogous to the wavelength & slope grouping in _spectralstats_dIc_w_s_2.RData) before calculating RMSE, NSE, and KGE.
Number of rows = 6,123
Number of intensities compared to their reference using this approach
Number of columns/variables = 14
aspect: See Description of the data
wavelength: wavelength corresponding to each reflectance intensity
RMSE_obs: Root mean squared error (obsI vs refI)
RMSE_dIc: Root mean squared error (dIc vs refI)
NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)
NSE_dIc: Nash-Sutcliffe efficiency (dIc vs refI)
KGE_obs: Kling-Gupta efficiency (obsI vs refI)
KGE_obs_r: Pearson correlation coefficient (obsI vs refI)
KGE_obs_beta: mean obsI / mean refI (obsI vs refI)
KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)
KGE_dIc: Kling-Gupta efficiency (dIc vs refI)
KGE_dIc_r: Pearson correlation coefficient (dIc vs refI)
KGE_dIc_beta: mean dIc / mean refI (dIc vs refI)
KGE_dIc_alpha: standard deviation(dIc) / standard deviation(refI)
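The three error metrics can be sketched as below. The r, beta, and alpha components follow the definitions listed above. Combining them as 1 - sqrt((r-1)^2 + (beta-1)^2 + (alpha-1)^2) is the commonly used form of KGE and is assumed here, since this README does not spell out the formula. The example vectors are made up.

```r
rmse <- function(sim, ref) sqrt(mean((sim - ref)^2))

# Nash-Sutcliffe efficiency: 1 is a perfect match to the reference.
nse <- function(sim, ref) {
  1 - sum((sim - ref)^2) / sum((ref - mean(ref))^2)
}

# Kling-Gupta efficiency and its three components.
kge <- function(sim, ref) {
  r     <- cor(sim, ref)          # Pearson correlation
  beta  <- mean(sim) / mean(ref)  # bias ratio
  alpha <- sd(sim) / sd(ref)      # variability ratio
  c(KGE = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (alpha - 1)^2),
    r = r, beta = beta, alpha = alpha)
}

# Hypothetical spectrum that is uniformly 20% darker than its reference.
refI <- c(0.20, 0.25, 0.30, 0.35)
obsI <- 0.8 * refI

rmse(obsI, refI)
nse(obsI, refI)
kge(obsI, refI)
```

In this example the correlation stays at 1 while beta and alpha both drop to 0.8, which shows how the KGE components separate a pure brightness change from a change in spectral shape.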
# ----------
# Specific information for R script:
HSICalLib_9a_cosine_correction_final.R
# Description of script:
Correct spectra using the theoretical “cosine correction”. Calculate error metrics (RMSE, NSE, and KGE) to quantify whether cosine corrected intensities (coscorI) are closer to reference intensities (refI) than observed intensities (obsI) or dI-corrected intensities (dIc).
# Input files:
HSICalLib_wavevec.RData
See Description of the data
HSICalLib_20230223_b1-b30_pga.RData
(output from R script 7a)
HSICalLib_20230223_s0-s60_cosIL_all.RData
(output from R script 6b)
HSICalLib_20230310_rutgerssamples_rand1.RData
(output from R script 8b)
HSICalLib_20230223_b1-b30_pga_melt.RData
(output from R script 7a)
HSICalLib_20230223_b1-b30_pga_refI_melt.RData
(output from R script 7b)
# Output files:
HSICalLib_20230613_pga_slm_cosIL.RData
Select the rows in _pga.RData (output from R script 7a) corresponding to the 681 soil samples used in this study (see _finalHSInums.RData output from R script 8b), then merge this data frame with _cosIL_all.RData (output from R script 6b) resulting in a wide data frame with all reflectance spectra for the soil samples used in this study AND the constants needed for the cosine correction (calculated in R script 6b based on measurements of the HSI setup used in this study).
Number of rows/obsI spectra = 48,149
Number of columns = 500
slope: See Description of the data
aspect: See Description of the data
batch: Soil samples were imaged in groups of 40 at a time
well: Indexed location of the soil sample in the sample well array
HSInumber: unique soil sample identifier
HSIPackedDensity: mass of soil sample per volume of sample well
sandTotal: % sand (only available for samples from the NEON archive)
siltTotal: % silt (only available for samples from the NEON archive)
clayTotal: % clay (only available for samples from the NEON archive)
OC: % soil organic carbon (by weight)
archive: source of the soil sample and soil properties data
adod: air dried soil mass / oven dried soil mass
volC: % soil organic carbon (by volume)
log10volC: log10(volC)
471 observed (uncorrected) reflectance intensities (obsI*[wavelength]*)
batchwellID: unique reflectance spectra identifier
z1: zenith angle (degrees) between light bank 1 and the HSI camera,
NOTE: zenith varies with slope, light bank 1 = N = 0 azimuth
z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera,
NOTE: zenith varies with slope, light bank 2 = S = 180 azimuth
meanz: average of z1 and z2
cosz1: cos(z1)
cosz2: cos(z2)
cosmeanz: cos(meanz)
meancosz: average of cos(z1) and cos(z2)
NOTE: This is the way we decided to combine z1 and z2.
cosIL1: cos( illumination angle (IL) ) light bank 1
= cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)
cosIL2: cos( illumination angle (IL) ) light bank 2
= cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)
meancosIL: average of cosIL1 and cosIL2
r1: cos(z1) / cosIL1
r2: cos(z2) / cosIL2
rmeans: cos(meanz) / meancosIL
rcosmeans: meancosz / meancosIL
NOTE: This is the ratio used in the final cosine correction.
HSICalLib_20230613_pga_slm_coscorI.RData
Same as _pga_slm_cosIL.RData (output from this script) except intensities reported are cosine corrected intensities (coscorI) rather than obsI.
Number of rows/cosine corrected (coscorI) spectra = 48,149
Number of columns/variables = 500
slope: See Description of the data
aspect: See Description of the data
batch: Soil samples were imaged in groups of 40 at a time
well: Indexed location of the soil sample in the sample well array
HSInumber: unique soil sample identifier
HSIPackedDensity: mass of soil sample per volume of sample well
sandTotal: % sand (only available for samples from the NEON archive)
siltTotal: % silt (only available for samples from the NEON archive)
clayTotal: % clay (only available for samples from the NEON archive)
OC: % soil organic carbon (by weight)
archive: source of the soil sample and soil properties data
adod: air dried soil mass / oven dried soil mass
volC: % soil organic carbon (by volume)
log10volC: log10(volC)
471 cosine corrected reflectance intensities (coscorI*[wavelength]*)
batchwellID: unique reflectance spectra identifier
z1: zenith angle (degrees) between light bank 1 and the HSI camera,
NOTE: zenith varies with slope, light bank 1 = N = 0 azimuth
z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera,
NOTE: zenith varies with slope, light bank 2 = S = 180 azimuth
meanz: average of z1 and z2
cosz1: cos(z1)
cosz2: cos(z2)
cosmeanz: cos(meanz)
meancosz: average of cos(z1) and cos(z2)
NOTE: This is the way we decided to combine z1 and z2.
cosIL1: cos( illumination angle (IL) ) light bank 1
= cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)
cosIL2: cos( illumination angle (IL) ) light bank 2
= cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)
meancosIL: average of cosIL1 and cosIL2
r1: cos(z1) / cosIL1
r2: cos(z2) / cosIL2
rmeans: cos(meanz) / meancosIL
rcosmeans: meancosz / meancosIL
NOTE: This is the ratio used in the final cosine correction.
HSICalLib_20230613_pga_slm_coscorI_melt.RData
“Long” version of _pga_slm_coscorI.RData (output from this script) where wavelength is a variable.
Number of rows/cosine corrected intensities (coscorI) = 22,678,179
Number of columns/variables = 5
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each reflectance intensity
coscorI: cosine corrected reflectance intensity
HSICalLib_20230613_pga_coscorI_refI_melt_slm.RData
Same as _pga_slm_coscorI_melt.RData (output from this script) except reference intensity (refI) and observed intensity (obsI) have been added as columns by merging _pga_slm_coscorI_melt.RData (output from this script) with _pga_melt.RData (output from R script 7a) AND _pga_refI_melt.RData (output from R script 7b).
Number of rows/cosine corrected intensities (coscorI) = 22,678,179
Number of columns/variables = 7
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each reflectance intensity
coscorI: cosine corrected reflectance intensity
obsI: observed reflectance intensities
refI: reference reflectance intensities
HSICalLib_20230613_spectralstats_coscor.RData
Summary stats for spectra at each configuration across all wavelengths.
Number of rows = 48,149
Number of spectra compared to their reference using this approach
Number of columns/variables = 15
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
RMSE_obs: Root mean squared error (obsI vs refI)
RMSE_coscor: Root mean squared error (coscor vs refI)
NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)
NSE_coscor: Nash-Sutcliffe efficiency (coscor vs refI)
KGE_obs: Kling-Gupta efficiency (obsI vs refI)
KGE_obs_r: Pearson correlation coefficient (obsI vs refI)
KGE_obs_beta: mean obsI / mean refI (obsI vs refI)
KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)
KGE_coscor: Kling-Gupta efficiency (coscor vs refI)
KGE_coscor_r: Pearson correlation coefficient (coscor vs refI)
KGE_coscor_beta: mean coscor / mean refI (coscor vs refI)
KGE_coscor_alpha: standard deviation(coscor) / standard deviation(refI)
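Applying the cosine correction to a wide table might look like this sketch. It assumes the correction is a simple multiplication of every observed intensity by the per-orientation ratio rcosmeans (coscorI = obsI * rcosmeans). The README identifies rcosmeans as the ratio used but does not state the formula, and the data values and column names here are hypothetical.

```r
# Hypothetical wide table: intensities plus the ratio for each orientation.
pga_slm_cosIL <- data.frame(slope = c(10, 30), aspect = c(0, 90),
                            obsI400 = c(0.20, 0.15),
                            obsI401 = c(0.22, 0.17),
                            rcosmeans = c(1.02, 1.15))

int_cols <- grep("^obsI", names(pga_slm_cosIL), value = TRUE)

# ASSUMPTION: coscorI = obsI * rcosmeans, applied row by row to every waveband.
coscor <- pga_slm_cosIL
coscor[int_cols] <- pga_slm_cosIL[int_cols] * pga_slm_cosIL$rcosmeans

# Rename the intensity columns to reflect the corrected values.
names(coscor)[match(int_cols, names(coscor))] <- sub("^obsI", "coscorI", int_cols)

coscor
```

Because the ratio depends only on orientation, the same factor scales all wavebands of a spectrum, so this correction changes brightness but not spectral shape.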
# ----------
# Specific information for R script:
HSICalLib_10a_C_correction_final.R
# Description of script:
Correct spectra using the semi-empirical “C correction”. Calculate error metrics (RMSE, NSE, and KGE) to quantify whether C corrected intensities (ccorI) are closer to reference intensities (refI) than observed intensities (obsI).
# Input files:
HSICalLib_wavevec.RData
HSICalLib_20230223_s0-s60_cosIL_all.RData
(output from R script 6b)
HSICalLib_20230613_pga_slm_cosIL.RData
(output from R script 9a)
HSICalLib_20230310_rutgerssamples_rand1.RData
(output from R script 8b)
HSICalLib_20230223_b1-b30_pga_melt.RData
(output from R script 7a)
HSICalLib_20230223_b1-b30_pga_refI_melt.RData
(output from R script 7b)
# Output files:
HSICalLib_20230613_C-coefficient_ccdf.RData
Data frame containing the semi-empirically determined C coefficients for every waveband. These are calculated using obsI from _pga_melt.RData (output from R script 7a) and _cosIL_all.RData (output from R script 6b).
Number of rows/wavelengths = 471
Number of columns/variables = 5
wavelength: wavelength corresponding to each reflectance intensity
slope: slope of the best fit line between obsI and cosIL
NOTE: This is a regression slope, not the “slope” (slope angle) used everywhere else
intercept: intercept of the best fit line between obsI and cosIL
ccoef: C coefficient = intercept / slope
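The calculation described above can be sketched as follows. This is a minimal Python illustration only (the dataset's own scripts are in R): it fits a least-squares line of obsI against cosIL for a single waveband, takes ccoef = intercept / slope, and then applies the commonly cited form of the C correction, ccorI = obsI * (cos(z) + c) / (cosIL + c). The function names, the synthetic numbers, and the choice of that correction formula are assumptions, not the contents of HSICalLib_10a_C_correction_final.R.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def c_coefficient(cos_il, obs_i):
    """C coefficient for one waveband: intercept / slope of obsI vs cosIL."""
    slope, intercept = fit_line(cos_il, obs_i)
    return intercept / slope


def c_correct(obs_i, cos_z, cos_il, ccoef):
    """Commonly cited C correction (assumed form, not taken from the R script)."""
    return obs_i * (cos_z + ccoef) / (cos_il + ccoef)


# Synthetic single-waveband example (hypothetical values, not from the dataset):
# obsI depends linearly on cosIL with slope 0.2 and intercept 0.1.
cos_il = [0.5, 0.6, 0.7, 0.8, 0.9]
obs_i = [0.2 * c + 0.1 for c in cos_il]

ccoef = c_coefficient(cos_il, obs_i)
cos_z = 0.9  # hypothetical cosine of the zenith angle
corrected = [c_correct(o, cos_z, c, ccoef) for o, c in zip(obs_i, cos_il)]
```

Because the synthetic obsI values are exactly linear in cosIL, every corrected value collapses to the same intensity, which is the behavior the correction is meant to approximate on real spectra.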
HSICalLib_20230613_pga_slm_cosIL_ccorI.RData
Data frame containing C corrected spectra along with selected soil properties, imaging orientation, theoretically calculated constants, and soil sample identifiers. Same as _pga_slm_cosIL.RData and _pga_slm_coscorI.RData (output from R script 9a) except intensities reported are C corrected intensities (ccorI) rather than obsI or coscorI.
Number of rows/C corrected (ccorI) spectra = 48,149
Number of columns/variables = 500
slope: See Description of the data
aspect: See Description of the data
batch: Soil samples were imaged in groups of 40 at a time
well: Indexed location of the soil sample in the sample well array
HSInumber: unique soil sample identifier
HSIPackedDensity: mass of soil sample per volume of sample well
sandTotal: % sand (only available for samples from the NEON archive)
siltTotal: % silt (only available for samples from the NEON archive)
clayTotal: % clay (only available for samples from the NEON archive)
OC: % soil organic carbon (by weight)
archive: source of the soil sample and soil properties data
adod: air dried soil mass / oven dried soil mass
volC: % soil organic carbon (by volume)
log10volC: log10(volC)
471 C corrected reflectance intensities (ccorI*[wavelength]*)
batchwellID: unique reflectance spectra identifier
z1: zenith angle (degrees) between light bank 1 (N) and the HSI camera
NOTE: zenith varies with slope, light bank 1 = N = 0 azimuth
z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera
NOTE: zenith varies with slope, light bank 2 = S = 180 azimuth
meanz: average of z1 and z2
cosz1: cos(z1)
cosz2: cos(z2)
cosmeanz: cos(meanz)
meancosz: average of cos(z1) and cos(z2)
NOTE: This is the way we decided to combine z1 and z2.
cosIL1: cos( illumination angle (IL) ) light bank 1
= cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)
cosIL2: cos( illumination angle (IL) ) light bank 2
= cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)
meancosIL: average of cosIL1 and cosIL2
r1: cos(z1) / cosIL1
r2: cos(z2) / cosIL2
rmeans: cos(meanz) / meancosIL
rcosmeans: meancosz / meancosIL
NOTE: This is the ratio used in the final cosine correction.
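The cosIL and ratio columns defined above can be reproduced with a few lines of trigonometry. The Python sketch below is an illustration only (the dataset's scripts are in R); the zenith, slope, and aspect values are hypothetical and the function name is an assumption. It follows the formulas listed for cosIL1, cosIL2, meancosIL, meancosz, and rcosmeans, with light bank 1 at azimuth 0 and light bank 2 at azimuth 180.

```python
import math


def cos_il(zenith_deg, slope_deg, azimuth_deg, aspect_deg):
    """cos(IL) = cos(z)*cos(slope) + sin(z)*sin(slope)*cos(azimuth - aspect)."""
    z = math.radians(zenith_deg)
    s = math.radians(slope_deg)
    d = math.radians(azimuth_deg - aspect_deg)
    return math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(d)


# Hypothetical geometry (degrees); real z1 and z2 vary with slope in the dataset.
z1, z2 = 30.0, 30.0
slope, aspect = 20.0, 90.0

cos_il1 = cos_il(z1, slope, 0.0, aspect)    # light bank 1 = N = 0 azimuth
cos_il2 = cos_il(z2, slope, 180.0, aspect)  # light bank 2 = S = 180 azimuth
meancos_il = (cos_il1 + cos_il2) / 2

meancosz = (math.cos(math.radians(z1)) + math.cos(math.radians(z2))) / 2

# Ratio documented as the one used in the final cosine correction.
rcosmeans = meancosz / meancos_il
```

For a flat sample (slope = 0) the formula reduces to cos(z), so the ratio is 1 and a cosine correction would leave the spectrum unchanged.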
HSICalLib_20230613_pga_slm_ccorI_melt.RData
“Long” version of _pga_slm_ccorI.RData (output from this script) where wavelength is a variable.
Number of rows/C corrected intensities (ccorI) = 22,678,179
Number of columns/variables = 5
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each reflectance intensity
ccorI: C corrected reflectance intensity
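The relationship between the “wide” and “long” layouts can be sketched as below. This is a plain-Python illustration, not the R code used to build the file; the helper name, the example column names (ccorI400, ccorI402), and the values are hypothetical. Each wide row (one spectrum) becomes one long row per waveband, which is why 48,149 spectra with 471 wavebands each give 22,678,179 long rows.

```python
def melt(wide_rows, id_cols, prefix="ccorI"):
    """Turn one row per spectrum into one row per (spectrum, wavelength)."""
    long_rows = []
    for row in wide_rows:
        ids = {k: row[k] for k in id_cols}
        for col, value in row.items():
            if col.startswith(prefix):
                # The wavelength is assumed to be encoded in the column name.
                wavelength = float(col[len(prefix):])
                long_rows.append({**ids, "wavelength": wavelength, prefix: value})
    return long_rows


# Two hypothetical spectra with two wavebands each.
wide = [
    {"slope": 10, "aspect": 0, "HSInumber": 1, "ccorI400": 0.11, "ccorI402": 0.12},
    {"slope": 20, "aspect": 90, "HSInumber": 1, "ccorI400": 0.10, "ccorI402": 0.13},
]
long_rows = melt(wide, id_cols=["slope", "aspect", "HSInumber"])
```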
HSICalLib_20230613_pga_ccorI_refI_melt_slm.RData
Same as _pga_slm_ccorI_melt.RData (output from this script) except that reference intensity (refI) and observed intensity (obsI) have been added as columns by merging _pga_slm_ccorI_melt.RData (output from this script) with _pga_melt.RData (output from R script 7a) and _pga_refI_melt.RData (output from R script 7b).
Number of rows/C corrected intensities (ccorI) = 22,678,179
Number of columns/variables = 7
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
wavelength: wavelength corresponding to each reflectance intensity
ccorI: C corrected reflectance intensity
obsI: observed reflectance intensities
refI: reference reflectance intensities
HSICalLib_20230613_spectralstats_ccor.RData
Summary stats for spectra at each configuration across all wavelengths.
Number of rows = 48,149
Number of spectra compared to their reference using this approach
Number of columns/variables = 15
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
RMSE_obs: Root mean squared error (obsI vs refI)
RMSE_ccor: Root mean squared error (ccor vs refI)
NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)
NSE_ccor: Nash-Sutcliffe efficiency (ccor vs refI)
KGE_obs: Kling-Gupta efficiency (obsI vs refI)
KGE_obs_r: Pearson correlation coefficient (obsI vs refI)
KGE_obs_beta: mean obsI / mean refI (obsI vs refI)
KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)
KGE_ccor: Kling-Gupta efficiency (ccor vs refI)
KGE_ccor_r: Pearson correlation coefficient (ccor vs refI)
KGE_ccor_beta: mean ccor / mean refI (ccor vs refI)
KGE_ccor_alpha: standard deviation(ccor) / standard deviation(refI)
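For readers who want to recompute these metrics, the sketch below shows one way to calculate RMSE, NSE, and KGE for a single spectrum against its reference. It is a Python illustration only (the dataset's scripts are in R), and the function names and test values are hypothetical. It uses the r, beta, and alpha components as defined above and assumes the standard formulation KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2); whether the R scripts use exactly this variant is not stated here.

```python
import math
from statistics import mean, pstdev


def rmse(sim, ref):
    """Root mean squared error between a spectrum and its reference."""
    return math.sqrt(mean((s - r) ** 2 for s, r in zip(sim, ref)))


def nse(sim, ref):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, 0 matches mean(ref)."""
    mr = mean(ref)
    num = sum((s - r) ** 2 for s, r in zip(sim, ref))
    den = sum((r - mr) ** 2 for r in ref)
    return 1 - num / den


def kge(sim, ref):
    """Kling-Gupta efficiency and its components (r, alpha, beta)."""
    ms, mr = mean(sim), mean(ref)
    ss, sr = pstdev(sim), pstdev(ref)  # population sd used for both series
    cov = mean((s - ms) * (r - mr) for s, r in zip(sim, ref))
    r = cov / (ss * sr)   # Pearson correlation coefficient
    alpha = ss / sr       # sd(sim) / sd(ref)
    beta = ms / mr        # mean(sim) / mean(ref)
    value = 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return value, r, alpha, beta


# Hypothetical four-band example: the "observed" spectrum is twice the reference.
ref_i = [0.10, 0.20, 0.30, 0.40]
obs_i = [2 * x for x in ref_i]

rmse_obs = rmse(obs_i, ref_i)
nse_obs = nse(obs_i, ref_i)
kge_obs, kge_obs_r, kge_obs_alpha, kge_obs_beta = kge(obs_i, ref_i)
```

In this example the shape of the spectrum is preserved (r = 1) while both the mean and the spread are doubled (alpha = beta = 2), so the KGE components separate a pure brightness offset from a change in spectral shape.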
# ----------
# Specific information for R script:
HSICalLib_12a_OC_predict_pls_final.R
# Description of script:
Train and evaluate a model to predict soil organic carbon (SOC) from reference spectra (refI), observed (non-zero slope, measured, uncorrected) spectra (obsI), and delta I corrected spectra (dIc) to see whether dIc provides a better prediction of SOC than obsI.
# Input files:
HSICalLib_wavevec.RData
See Description of the data
HSICalLib_b1-b30_20230223_p_gold.RData (for soil properties)
Output from R script 5b
HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData
Output from R script 8b
HSICalLib_20230310_rutgerssamples_rand1.RData
Output from R script 8b
# Output files:
HSICalLib_20230613_p_plots.RData
Selected soil properties data for the 681 soil samples used in this study
Number of rows/soil samples = 681
Number of columns/variables = 15
slope: See Description of the data
aspect: See Description of the data
batch: Soil samples were imaged in groups of 40 at a time
well: Indexed location of the soil sample in the sample well array
HSInumber: unique soil sample identifier
HSIPackedDensity: mass of soil sample per volume of sample well
sandTotal: % sand (only available for samples from the NEON archive)
siltTotal: % silt (only available for samples from the NEON archive)
clayTotal: % clay (only available for samples from the NEON archive)
OC: % soil organic carbon (by weight)
archive: source of the soil sample and soil properties data
adod: air dried soil mass / oven dried soil mass
volC: % soil organic carbon (by volume)
log10volC: log10(volC)
batchwellID: unique reflectance spectra identifier
HSICalLib_20230613_globaldI_OCpredict_dIc.RData
“Wide” version of _globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData (output from R script 8b) where dIc (delta I corrected) intensities are reported as spectra (471 variables) for each soil sample at each orientation.
Number of rows/dIc spectra = 48,149
Number of columns/variables = 474
471 wavebands = dIc*[wavelength]*
NOTE: These reflectance intensities are dIc (dI corrected)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
HSICalLib_20230613_globaldI_OCpredict_refI.RData
“Wide” version of _globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData (output from R script 8b) where refI (reference) intensities are reported as spectra (471 variables) for each soil sample at each orientation.
Number of rows/refI spectra = 48,149
Number of columns/variables = 474
471 wavebands = refI*[wavelength]*
NOTE: These reflectance intensities are refI (reference)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
HSICalLib_20230613_globaldI_OCpredict_obsI.RData
“Wide” version of _globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData (output from R script 8b) where obsI (observed, uncorrected) intensities are reported as spectra (471 variables) for each soil sample at each orientation.
Number of rows/obsI spectra = 48,149
Number of columns/variables = 474
471 wavebands = obsI*[wavelength]*
NOTE: These reflectance intensities are obsI (observed)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
HSICalLib_20230613_globaldI_OCpredict_refI_p.RData
Merge _globaldI_OCpredict_refI.RData (output from this script) with the soil properties data in _p_gold.RData resulting in a data frame with refI (reference) intensities reported as spectra (471 variables) for each soil sample at each orientation.
Number of rows/refI spectra = 48,149
Number of columns/variables = 475
471 wavebands (predictor variables for PLSR) = refI*[wavelength]*
NOTE: These reflectance intensities are refI (reference)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
log10volC: log base 10 of % SOC by volume (outcome variable for PLSR)
HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData
Merge _globaldI_OCpredict_obsI.RData (output from this script) with the soil properties data in _p_gold.RData resulting in a data frame with obsI (observed, uncorrected) intensities reported as spectra (471 variables) for each soil sample at each orientation.
Number of rows/obsI spectra = 48,149
Number of columns/variables = 475
471 wavebands = obsI*[wavelength]*
NOTE: These reflectance intensities are obsI (observed)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
log10volC: log base 10 of % SOC by volume (outcome variable for PLSR)
HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData
Merge _globaldI_OCpredict_dIc.RData (output from this script) with the soil properties data in _p_gold.RData resulting in a data frame with dIc (delta I corrected) intensities reported as spectra (471 variables) for each soil sample at each orientation.
Number of rows/dIc spectra = 48,149
Number of columns/variables = 475
471 wavebands = dIc*[wavelength]*
NOTE: These reflectance intensities are dIc (dI corrected)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
log10volC: log base 10 of % SOC by volume (outcome variable for PLSR)
HSICalLib_20230613_globaldI_plsmodel_log10volC_refI_train.RData
Partial least squares regression model trained with _refI_p.RData (output from this script) to predict OC from 471 reflectance intensities.
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
Evaluate the PLSR model on its refI training data, then calculate RMSE, NSE, and KGE for observed (laboratory measured) SOC vs SOC predicted by the PLSR model trained and evaluated on refI.
Number of rows/SOC predictions made from refI = 681
Number of columns/variables = 12
predicted: SOC predicted by this PLSR model
observed: laboratory measured SOC
slope: See Description of the data
aspect: See Description of the data
NOTE: slope and aspect are the same for all samples (rows) in this data frame because refI is the same regardless of orientation
HSInumber: unique soil sample identifier
RMSE: Root mean squared error
NSE: Nash-Sutcliffe efficiency (observed vs refI predicted SOC)
R2: Coefficient of determination (observed vs refI predicted SOC)
KGE: Kling-Gupta efficiency (observed vs refI predicted SOC)
KGE_r: Pearson correlation coefficient (observed vs refI predicted SOC)
KGE_beta: mean refI predicted / mean observed SOC
KGE_alpha: sd(refI predicted) / sd(observed) SOC
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData
Evaluate _globaldI_plsmodel_log10volC_refI_train.RData (output from this script) on _obsI_p.RData (output from this script) to predict OC from 471 obsI intensities. Then quantify the PLSR model performance using RMSE, NSE, and KGE (compare laboratory measured SOC to PLSR model predicted SOC).
Number of rows/SOC predictions made from obsI = 48,149
Number of columns/variables = 12
predicted: SOC predicted by this PLSR model
observed: laboratory measured SOC
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
RMSE: Root mean squared error
NSE: Nash-Sutcliffe efficiency (observed SOC vs SOC predicted from obsI)
R2: Coefficient of determination (observed SOC vs SOC predicted from obsI)
KGE: Kling-Gupta efficiency (observed SOC vs SOC predicted from obsI)
KGE_r: Pearson correlation coefficient (observed SOC vs SOC predicted from obsI)
KGE_beta: mean SOC predicted from obsI / mean observed SOC
KGE_alpha: sd(SOC predicted from obsI) / sd(observed SOC)
HSICalLib_20230613_globaldI_dIc_train.RData
This data frame contains the reflectance spectra which were randomly selected for each soil sample to train the NEXT PLSR model along with sample identifiers and log10volC (outcome variable).
Number of rows/dIc spectra = 681
Number of columns/variables = 475
471 wavebands = dIc*[wavelength]*
NOTE: These reflectance intensities are dIc (dI corrected)
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
log10volC: log base 10 of % SOC by volume (outcome variable for PLSR)
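The selection described above, one randomly chosen orientation per soil sample, can be sketched as follows. This is a Python illustration under stated assumptions, not the R code in HSICalLib_12a_OC_predict_pls_final.R: the helper name, the fixed seed, and the example rows are hypothetical, and the actual random draw used for the published file cannot be reproduced from this sketch.

```python
import random


def pick_one_per_sample(rows, key="HSInumber", seed=0):
    """Return one randomly chosen row (orientation) for each soil sample."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    by_sample = {}
    for row in rows:
        by_sample.setdefault(row[key], []).append(row)
    return [rng.choice(group) for group in by_sample.values()]


# Hypothetical rows: three samples, each imaged at three orientations.
rows = [
    {"HSInumber": n, "slope": s, "aspect": a}
    for n in (1, 2, 3)
    for s, a in ((0, 0), (20, 90), (40, 180))
]
train = pick_one_per_sample(rows)
```

Applied to the full set of 48,149 dIc spectra, a selection of this kind yields one row per soil sample, which matches the 681 rows reported for this file.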
HSICalLib_20230613_globaldI_plsmodel_log10volC_dIc_train.RData
PLSR model trained to predict SOC using one delta I corrected (dIc) reflectance spectrum per soil sample from a randomly chosen orientation (see _globaldI_dIc_train.RData, output from this script).
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData
Evaluate the PLSR model trained on dIc spectra (one randomly chosen orientation per sample) on its own training data, then quantify performance using RMSE, NSE, and KGE.
Number of rows/SOC predictions made from dIc = 681
Number of columns/variables = 12
predicted: SOC predicted by this PLSR model
observed: laboratory measured SOC
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
RMSE: Root mean squared error (observed vs dIc predicted SOC)
NSE: Nash-Sutcliffe efficiency (observed vs dIc predicted SOC)
R2: Coefficient of determination (observed vs dIc predicted SOC)
KGE: Kling-Gupta efficiency (observed vs dIc predicted SOC)
KGE_r: Pearson correlation coefficient (observed vs dIc predicted SOC)
KGE_beta: mean dIc predicted / mean observed SOC
KGE_alpha: sd(dIc predicted) / sd(observed) SOC
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData
Evaluate _globaldI_plsmodel_log10volC_dIc_train.RData (output from this script) on _dIc_p.RData (output from this script) to predict OC from 471 dIc intensities (using dIc spectra from all soil samples at all orientations). Then quantify the PLSR model performance using RMSE, NSE, and KGE (compare laboratory measured SOC to PLSR model predicted SOC).
Number of rows/SOC predictions made from dIc = 48,149
Number of columns/variables = 12
predicted: SOC predicted by this PLSR model
observed: laboratory measured SOC
slope: See Description of the data
aspect: See Description of the data
HSInumber: unique soil sample identifier
RMSE: Root mean squared error (observed vs dIc predicted SOC)
NSE: Nash-Sutcliffe efficiency (observed vs dIc predicted SOC)
R2: Coefficient of determination (observed vs dIc predicted SOC)
KGE: Kling-Gupta efficiency (observed vs dIc predicted SOC)
KGE_r: Pearson correlation coefficient (observed vs dIc predicted SOC)
KGE_beta: mean dIc predicted / mean observed SOC
KGE_alpha: sd(dIc predicted) / sd(observed) SOC
HSICalLib_20230613_globaldI_PLSR_log10volC_summarystats_OC.RData
Summary stats for spectra at each configuration across all wavelengths
# ----------
# Specific information for R script:
HSICalLib_0_FinalPlots.R
# Description of script:
Create all the final plots for the paper.
# Input files:
HSICalLib_wavevec.RData
See Description of the data
HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData
Output from R script 8b
HSICalLib_20230613_globaldI_spectralstats_dIc_1.RData
Output from R script 8b
HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData
Output from R script 8b
HSICalLib_20230613_spectralstats_coscor.RData
Output from R script 9a
HSICalLib_20230613_spectralstats_ccor.RData
Output from R script 10a
HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData
Output from R script 8b
HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData
Output from R script 8b
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData
Output from R script 12a
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData
Output from R script 12a
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData
Output from R script 12a
HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData
Output from R script 12a
HSICalLib_20230613_globaldI_PLSR_log10volC_summarystats_OC.RData
Output from R script 12a
# Output files:
HSICalLib_20231014_spectralstats_boxplots_final.pdf
Figure 6
RMSE and NSE vs slope and aspect (box plots)
NOTE: Different than Figure 10 because these are for dIc predictions
HSICalLib_20231014_KGE_slope_boxplots_final.pdf
KGE, alpha, and beta vs slope (box plots)
HSICalLib_20231014_KGE_aspect_boxplots_final.pdf
KGE, alpha, and beta vs aspect (box plots)
HSICalLib_20231014_obsI_dIc_refI_spectra_final.pdf
Figure 8
RI vs wavelength (refI, obsI, and dIc), dIc colored by slope and aspect
HSICalLib_20231014_obsI_spectra_final.pdf
Figure 5
RI vs wavelength (refI and obsI), obsI colored by slope and aspect
HSICalLib_20231014_dIc_RMSE_spectra_final.pdf
Figure 7
RMSE vs wavelength (obsI and dIc), dIc colored by slope and aspect
HSICalLib_20230622_OCvalidationplots_final.pdf
Figure 9
Observed vs predicted SOC (colored by slope)
HSICalLib_20231014_globaldI_summarystats_OC_boxplots_final.pdf
Figure 10
RMSE and NSE vs slope and aspect (box plots)
NOTE: Different than Figure 6 because these are for SOC predictions
Methods
Hyperspectral imaging (HSI) was performed with a high-sensitivity sCMOS VNIR hyperspectral camera (MSV 500, Middleton Spectral Vision, Middleton, WI). A custom-designed, 3-D printed sample array was used to present homogenized soil samples packed into sample wells to a laboratory-based HSI reflectance spectrometer at 91 configurations of slope and aspect. Pixels representing each soil sample's reflectance spectra were isolated from hyperspectral images and averaged to obtain a single reflectance spectrum for each slope and aspect configuration per soil sample.
Usage notes
Raw data were collected with FastFrame data acquisition software (Middleton Spectral Vision, Middleton, WI). Data processing and analysis were performed in R using the R Scripts found on GitHub at github.com/aduro005/HSITopographicCorrectionRScripts.