# Laboratory-based hyperspectral visible near-infrared reflectance spectral dataset of soil samples across a range of surface orientations

## Cite this dataset

Duro, Alyssa et al. (2024). Laboratory-based hyperspectral visible near-infrared reflectance spectral dataset of soil samples across a range of surface orientations [Dataset]. Dryad. https://doi.org/10.6086/D15091

## Abstract

A custom-designed, 3-D printed sample array was used to present 681 homogenized soil samples packed into sample wells to a laboratory-based visible near-infrared (VNIR) hyperspectral imaging (HSI) reflectance spectrometer in a total of 91 different configurations of slope and aspect. Hyperspectral imaging was performed with a high-sensitivity sCMOS VNIR hyperspectral camera (MSV 500, Middleton Spectral Vision, Middleton, WI). Raw reflectance data were collected with FastFrame data acquisition software (Middleton Spectral Vision, Middleton, WI). After data were collected using FastFrame, data processing and analysis were performed in R using the R Scripts found on GitHub at github.com/aduro005/HSITopographicCorrectionRScripts. The design for this sample array can be found on GitHub at github.com/aduro005/HSITopographicCorrectionSampleWellArray.

## README: Laboratory-based hyperspectral visible near-infrared reflectance spectral dataset of soil samples across a range of surface orientations

This README file was generated on 03/02/2024 by Alyssa Duro

This README file describes the R scripts available on GitHub at https://github.com/aduro005/HSITopographicCorrectionRscripts and the .RData files available on the UCR Dryad Data Repository at https://doi.org/10.6086/D15091. These R scripts and .RData files are associated with the HSI topographic correction method described in the article titled *Topographic correction of visible near-infrared reflectance spectra for horizon-scale soil organic carbon mapping* available at https://doi.org/10.1002/saj2.20612.

# ----------

# Some notes on this dataset:

This project began with the intention of calibrating several empirical regression equations to predict soil chemical properties from visible near-infrared (VNIR) reflectance spectra of soil surfaces positioned at different angles relative to a VNIR hyperspectral imaging (HSI) reflectance spectrometer. This is why the prefix “HSICalLib” is used in file names and why this project is referred to in some places as the “HSI Calibration Library”. Later, the focus of the project narrowed, and the goal became developing a method for removing the influence of surface orientation from VNIR reflectance spectra. As a result, this project is also sometimes referred to as the “HSI Topographic Correction”.

The FastFrame software used to operate the HSI camera and scan stage outputs 2 data files after each scan. Together, these two files are sometimes referred to as the “raw data” for each scan. These two files have the same name, but one has the extension .hdr and the other is .raw. The file name and output location are input into the FastFrame software before a scan is performed. The .hdr file can be opened using Notepad. The .raw file is a 3-dimensional matrix with dimensions 471 (number of wavebands) x number of pixels in the lateral spatial dimension (also called “columns” and “samples”) x number of pixels in the direction of the scan stage movement (also called “rows” and “lines”).

The raw (unprocessed) data for this project is not included in this dataset, but the authors are happy to share it upon request. This raw dataset consists of 3,125 hyperspectral images (2 files per image) and is about 2 TB in size.

The least processed version of the soil VNIR reflectance spectra (observed, uncorrected) obtained at each slope and aspect configuration, along with selected soil properties, is HSICalLib_b1_b30_p.RData, which can be found on the UCR Dryad Data Repository. The soil reflectance spectra included here are averaged across all the pixels within each sample well. This file is output by Step 7 of HSI Data Processing, the R script HSICalLib_4_intmean_to_masterintmean_to_p.R (also on GitHub). Starting with this file, you can follow Steps 8-14 of HSI Data Processing (also on GitHub) to obtain all of the input files needed for the entire HSI Data Analysis workflow (Steps 1-4 and Final Plots), which was used for the HSI Topographic Correction. In addition, every file used during the HSI Topographic Correction that is output after HSICalLib_b1_b30_p.RData in the HSI Data Processing and HSI Data Analysis workflows is included in the UCR Dryad Data Repository dataset, so you can open any R script after Step 7 of HSI Data Processing and find its input files there.

# ----------

# File folder structure:

Data Analysis

> Data

> Output Files

> Plots

> R scripts

# ---------------------------------------------------------------------------

# Description of the data

# ---------------------------------------------------------------------------

Hyperspectral imaging of 1178 homogenized soil samples was performed with a high-sensitivity sCMOS VNIR hyperspectral camera (MSV 500, Middleton Spectral Vision, Middleton, WI) in the Department of Environmental Sciences at University of California, Riverside.

Each soil sample was packed into sample wells, positioned under the hyperspectral camera, and imaged at 98 orientations (i.e., 7 slope x 14 aspect angles) using a custom designed and 3-D printed sample array. The sample array was designed such that sample wells could be presented to the spectrometer at 7 slope angles (0°, 10°, 20°, 30°, 40°, 50°, 60°). The design for this sample array is available on GitHub at https://github.com/aduro005/HSITopographicCorrectionSampleWellArray.

The sample array was aligned to an arbitrarily defined 0° N aspect and rotated at 15° intervals from 0° N to 90° E and from 180° S to 270° W using a protractor affixed to the scan stage under the spectrometer. The 195° to 270° W aspect values were converted to 165° to 90° E aspects during data processing (see HSICalLib_6a_aspect_correction.R) so that aspect only took on values from 0° to 180° in the development of the topographic correction method.
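The aspect conversion described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code from HSICalLib_6a_aspect_correction.R. It assumes the conversion mirrors west-facing aspects as 360° minus the aspect, which reproduces the stated endpoints (195° to 165°, 270° to 90°). The function name `fold_aspect` is hypothetical.

```python
def fold_aspect(aspect_deg):
    """Map an aspect measured clockwise from N onto the 0-180 degree range.

    Assumption: aspects west of the N-S axis are mirrored as 360 - aspect,
    which matches the endpoints given in the text (195 -> 165, 270 -> 90).
    """
    if not 0 <= aspect_deg < 360:
        raise ValueError("aspect must be in [0, 360)")
    return 360 - aspect_deg if aspect_deg > 180 else aspect_deg


# West-facing aspects at 15 degree steps and their folded equivalents.
west = list(range(195, 271, 15))
folded = [fold_aspect(a) for a in west]
print(list(zip(west, folded)))
```

Aspects of 180° or less pass through unchanged, so every converted value falls between 0° and 180°.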

574 soil samples were provided by the NEON Initial Characterization Soils Archive at the University of Michigan Biological Station Sample Archive Facility in Ehlers (UMBS-SAFE) (https://mfield.umich.edu/soil_archive_request) and accompanying soil properties data were obtained from the NEON Data Portal (https://data.neonscience.org/).

450 soil samples were collected from Duke Farms in Hillsborough Township, New Jersey (https://www.dukefarms.org/) and accompanying soil properties data were provided by the Department of Environmental Sciences at Rutgers University.

57 soil samples were collected from locations in the Santa Ana Mountains, California that were affected by wildfire, and accompanying soil properties data were provided by the Department of Environmental Sciences at University of California, Riverside (http://www.thegraylab.org/).

97 soil samples were a laboratory standard soil from the Pedology Laboratory in the Department of Environmental Sciences at University of California, Riverside.

# ----------

# List of files in the Output Data folder:

HSICalLib_wavevec.RData

HSICalLib_b1_b30_p.RData

HSICalLib_b1-b30_20230223_p_gold.RData

HSICalLib_20230223_s0-s60_cosIL_all.RData

HSICalLib_20230223_b1-b30_pga.RData

HSICalLib_20230223_b1-b30_pga_melt.RData

HSICalLib_20230223_b1-b30_pga_refI_melt.RData

HSICalLib_20230223_b1-b30_pga_dI_melt.RData

HSICalLib_20230310_rutgerssamples_rand1.RData

HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData

HSICalLib_20230613_globaldI_spectralstats_dIc_1.RData

HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData

HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData

HSICalLib_20230613_pga_slm_cosIL.RData

HSICalLib_20230613_spectralstats_coscor.RData

HSICalLib_20230613_spectralstats_ccor.RData

HSICalLib_20230613_globaldI_OCpredict_refI_p.RData

HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData

HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_summarystats_OC.RData

# ----------

# List of abbreviations:

obsI = observed, uncorrected reflectance intensity

refI = reference reflectance intensity, the mean of all reflectance spectra observed for each soil sample across all aspects at zero slope. It represents a soil sample’s expected reflectance spectrum when the effect of surface orientation on reflectance is absent.

dI = delta intensity, obsI - refI

dIp = predicted dI, this value is predicted by a multiple linear regression model trained to predict dI from slope, aspect, wavelength, and their interaction terms

dIc = dI-corrected VNIR reflectance intensities, obsI - dIp

coscorI = cosine corrected reflectance intensities

ccorI = C corrected reflectance intensities

# ----------

# Specific information for data file:

HSICalLib_wavevec.RData

# Name and type of R object: wavevec (vector, numeric)

# Number of observations (rows): 471

# Description of observations:

Each observation is a wavelength (λ) in nm. Wavelengths correspond to reflectance intensity observations (measurements) made at each waveband for reflectance spectra collected using the high-sensitivity sCMOS VNIR hyperspectral camera (MSV 500, Middleton Spectral Vision, Middleton, WI) in the Department of Environmental Sciences at University of California, Riverside. This spectrometer measures 471 reflectance intensities (i.e., reflectance intensities are measured at 471 wavebands) between 400 and 1000 nm wavelengths. The 471 observations in this vector (“wavevec”) are the wavelengths corresponding to each waveband.

# Missing data values (NA): None

# Number of variables (columns): 1

# Description of variables:

wavelength (nm)

# Related data files:

Any .hdr file resulting from a scan with this spectrometer contains the information contained in HSICalLib_wavevec.RData. However, the .hdr files also contain other information, so this vector was made in R by Alyssa Duro so the wavelength values corresponding to each waveband were accessible on their own in a .RData file.

# R script that outputs this file:

HSICalLib_3_intensities_to_intmean_intsd.R

# ----------

# Specific information for data file:

HSICalLib_b1_b30_p.RData

# Name and type of R object: p (data frame)

# Number of observations (rows): 115,444 reflectance spectra

# Description of observations:

Mean reflectance spectra for each soil sample at each orientation along with selected soil properties data and sample identifiers. This is the most raw form of the data included in this dataset. 1178 soil samples * 98 configurations = 115,444

# Missing data values (NA): Some soil properties data are not available for all soil samples resulting in NA’s.

# Number of variables (columns): 486

# Description of variables:

obsI*[wavelength]* (numeric): observed (uncorrected) reflectance intensities measured at 471 wavebands. Together, these 471 values represent the average reflectance spectrum for a single soil sample at a single orientation.

slope (integer, degrees): angle between the scan stage and the soil surface

aspect (integer, degrees): angle clockwise from N

batch (integer, 1-30): soil samples were imaged in groups of 40 at a time

well (integer, 1-40): indexed location of the soil sample in the sample well array

HSInumber (integer, 1-1141): unique soil sample identifier

HSIPackedDensity (numeric, g/cm3): mass soil sample per volume sample well

sandTotal (numeric, %): sand (only available for samples from the NEON archive)

siltTotal (numeric, %): silt (only available for samples from the NEON archive)

clayTotal (numeric, %): clay (only available for samples from the NEON archive)

OC (numeric, %): soil organic carbon (by weight)

archive (character): source of the soil sample and soil properties data

adod (numeric, unitless): air dried soil mass / oven dried soil mass

volC (numeric, %): soil organic carbon (by volume)

log10volC (numeric): log10(volC)

batchwellID (character): unique reflectance spectra identifier

# R script that outputs this file:

HSICalLib_4_intmean_to_masterintmean_to_p.R

# ----------

# Specific information for data file:

HSICalLib_b1-b30_20230223_p_gold.RData

# Name and type of R object: p_gold (data frame)

# Number of observations (rows): 107,486 reflectance spectra

# Description of observations:

Same as HSICalLib_b1_b30_p.RData (output from R script 4) except the reflectance spectra (rows) with unusually large or small obsI OR dI values have been identified and removed as imaging errors.

# Missing data values (NA): Missing values occur when soil properties data are not available for some soil samples. There is spectral data for all soil samples, but some soil properties were not measured for all soil samples.

# Number of variables (columns): 486

# Description of variables:

Same as HSICalLib_b1_b30_p.RData (output from R script 4)

# Related data files:

HSICalLib_b1_b30_p.RData

# R script that outputs this file:

HSICalLib_5b_p_dI_cleaning.R

# ----------

# Specific information for data file:

HSICalLib_20230223_s0-s60_cosIL_all.RData

# Name and type of R object: cosIL (numeric, data frame)

# Number of observations (rows): 91 orientations (7 slopes * 13 aspects)

# Description of observations:

Each row contains the constants needed to perform the cosine correction and C correction for 1 of 91 possible combinations of slope and aspect.

# Missing data values (NA): None

# Number of variables (columns): 16

# Description of variables:

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

z1: zenith angle (degrees) between light bank 1 and the HSI camera, varies with slope, light bank 1 = N = 0 azimuth

z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera, varies with slope, light bank 2 = S = 180 azimuth

meanz: average of z1 and z2 (this is the one used for the paper)

cosz1: cosine of z1

cosz2: cosine of z2

cosmeanz: cosine of (meanz)

meancosz: average of cos(z1) and cos(z2)

cosIL1: cos( illumination angle light bank 1 (IL1) ) = cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)

cosIL2: cos( illumination angle light bank 2 (IL2) ) = cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)

meancosIL: average of cosIL1 and cosIL2

r1: cos(z1) / cosIL1

r2: cos(z2) / cosIL2

rmeans: ( cos(meanz) ) / (meancosIL)

rcosmeans: (meancosz) / (meancosIL) (this is the one used for the paper)
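As a worked illustration of the column definitions above, the following Python sketch computes cosIL1, cosIL2, and the derived ratios for one orientation. It is not the authors' code from HSICalLib_6b_cosIL_calculation.R. The zenith angles used (30° for both light banks) are hypothetical placeholders, not values from the dataset, and the function names are illustrative only. Light bank azimuths of 0° and 180° follow the z1 and z2 descriptions.

```python
import math


def cos_illumination(zenith_deg, slope_deg, azimuth_deg, aspect_deg):
    """cos(IL) = cos(z)cos(slope) + sin(z)sin(slope)cos(azimuth - aspect); degrees in."""
    z, s = math.radians(zenith_deg), math.radians(slope_deg)
    d = math.radians(azimuth_deg - aspect_deg)
    return math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(d)


def orientation_constants(z1, z2, slope, aspect):
    """Return a subset of the columns described above for one orientation."""
    cosz1, cosz2 = math.cos(math.radians(z1)), math.cos(math.radians(z2))
    cosIL1 = cos_illumination(z1, slope, 0.0, aspect)    # light bank 1, azimuth 0 (N)
    cosIL2 = cos_illumination(z2, slope, 180.0, aspect)  # light bank 2, azimuth 180 (S)
    meancosz = (cosz1 + cosz2) / 2
    meancosIL = (cosIL1 + cosIL2) / 2
    return {
        "cosIL1": cosIL1, "cosIL2": cosIL2, "meancosIL": meancosIL,
        "meancosz": meancosz, "r1": cosz1 / cosIL1, "r2": cosz2 / cosIL2,
        "rcosmeans": meancosz / meancosIL,
    }


# Hypothetical zenith angles (30 degrees each), not the values in the dataset.
flat = orientation_constants(z1=30.0, z2=30.0, slope=0.0, aspect=45.0)
tilted = orientation_constants(z1=30.0, z2=30.0, slope=20.0, aspect=0.0)
print(flat["rcosmeans"], tilted["rcosmeans"])
```

At zero slope the illumination angle equals the zenith angle, so every ratio is 1. In the commonly used cosine correction, an observed reflectance is multiplied by a ratio of this form. How the scripts apply these ratios should be checked in HSICalLib_9a_cosine_correction_final.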

# Related data files:

HSICalLib_20230613_pga_slm_cosIL.RData

# R script that outputs this file:

HSICalLib_6b_cosIL_calculation.R

# ----------

# Specific information for data file:

HSICalLib_20230223_b1-b30_pga.RData

# Name and type of R object: pga (data frame)

# Number of observations (rows): 99,537 reflectance spectra

# Description of observations:

Same as HSICalLib_b1_b30_p.RData (output from R script 4) except the number of spectra was reduced during the “aspect correction” (HSICalLib_6a_aspect_correction.R)

# Missing data values (NA): None

# Number of variables (columns): 486

# Description of variables:

Same as HSICalLib_b1_b30_p.RData

# Related data files:

HSICalLib_b1_b30_p.RData

# R script that outputs this file:

HSICalLib_7a_observedI

# ----------

# Specific information for data file:

HSICalLib_20230223_b1-b30_pga_melt.RData

# Name and type of R object: pga_melt (data frame)

# Number of observations (rows): 46,881,927 reflectance intensities

# Description of observations:

“Long” version of HSICalLib_20230223_b1-b30_pga.RData where wavelength is a variable. 99,537 reflectance spectra (from HSICalLib_20230223_b1-b30_pga.RData) * 471 wavebands = 46,881,927 reflectance intensities

# Missing data values (NA): None

# Number of variables (columns): 5

# Description of variables:

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

wavelength: see HSICalLib_wavevec.RData

obsI: Same values reported in HSICalLib_b1_b30_p.RData except now wavelength is a variable. These values have not been “corrected”.
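The wide-to-long reshaping described above can be illustrated with a small Python sketch. It is not the authors' R code, and the toy column names (`obsI400`, `obsI700`, `obsI1000`) are assumptions made for illustration. The actual obsI column names should be checked in the data frame itself.

```python
def melt_spectra(wide_rows, wavelengths, id_cols=("slope", "aspect", "HSInumber")):
    """Turn one-row-per-spectrum records into one-row-per-intensity records."""
    long_rows = []
    for row in wide_rows:
        ids = {c: row[c] for c in id_cols}
        for wl in wavelengths:
            # Assumed column naming: "obsI" followed by the wavelength.
            long_rows.append({**ids, "wavelength": wl, "obsI": row[f"obsI{wl}"]})
    return long_rows


# Two toy spectra with three wavebands each (values are made up).
wavelengths = [400, 700, 1000]
wide = [
    {"slope": 10, "aspect": 0, "HSInumber": 1,
     "obsI400": 0.10, "obsI700": 0.25, "obsI1000": 0.30},
    {"slope": 20, "aspect": 15, "HSInumber": 1,
     "obsI400": 0.08, "obsI700": 0.21, "obsI1000": 0.27},
]
long_rows = melt_spectra(wide, wavelengths)
print(len(long_rows))  # 2 spectra * 3 wavebands = 6 rows
```

The same multiplication gives the row count reported for this file: 99,537 spectra * 471 wavebands = 46,881,927 rows, each with the 5 variables listed above.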

# Related data files:

HSICalLib_20230223_b1-b30_pga.RData

# R script that outputs this file:

HSICalLib_7a_observedI

# ----------

# Specific information for data file:

HSICalLib_20230223_b1-b30_pga_refI_melt.RData

# Name and type of R object: pga_refI_melt (data frame)

# Number of observations (rows): 46,881,927 reflectance intensities

# Description of observations:

Same as HSICalLib_20230223_b1-b30_pga_melt.RData except reference reflectance intensities (refI) are reported instead of obsI.

# Missing data values (NA): None

# Number of variables (columns): 5

# Description of variables:

Same as HSICalLib_20230223_b1-b30_pga_melt.RData except refI is reported instead of obsI.

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

wavelength: see HSICalLib_wavevec.RData

refI: These values represent the average reflectance spectrum measured for each soil sample across all aspect positions at zero slope. There is only 1 reference spectrum per soil sample.

# Related data files:

HSICalLib_20230223_b1-b30_pga_melt.RData

# R script that outputs this file:

HSICalLib_7b_referenceI

# ----------

# Specific information for data file:

HSICalLib_20230223_b1-b30_pga_dI_melt.RData

# Name and type of R object: pga_dI_melt (data frame)

# Number of observations (rows): 46,881,927 reflectance intensities

# Description of observations:

Same as HSICalLib_20230223_b1-b30_pga_melt.RData except dI values are reported instead of obsI

# Missing data values (NA): None

# Number of variables (columns): 5

# Description of variables:

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

wavelength: see HSICalLib_wavevec.RData

dI: actual (measured) delta (“change in”) reflectance intensity = obsI - refI

# Related data files:

HSICalLib_20230223_b1-b30_pga_melt.RData

HSICalLib_20230223_b1-b30_pga_refI_melt.RData

# R script that outputs this file:

HSICalLib_7c_dI_calculation

# ----------

# Specific information for data file:

HSICalLib_20230310_rutgerssamples_rand1.RData

# Name and type of R object: rand1 (numeric, vector)

# Number of observations (rows): 50

# Description of observations:

Only a randomly chosen subset of 50 soil samples (out of the 450 samples collected from Duke Farms and imaged using HSI) was included in the topographic correction study, because these soil samples all have very similar properties and would otherwise make up a large portion of the training data. This vector contains the HSI numbers for these 50 randomly chosen soil samples (all from the Rutgers archive).

# Missing data values (NA): None

# Number of variables (columns): 1

# Description of variables:

HSInumber: see HSICalLib_b1_b30_p.RData

# Related data files:

HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData

# R script that outputs this file:

HSICalLib_8b_dI_predict_global_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData

# Name and type of R object: pga_dI_refI_melt_slm_dIp_dIc_2 (data frame)

# Number of observations (rows): 22,678,179

# Description of observations:

Each row is a reflectance intensity for a single soil sample at a single orientation at a single wavelength. Same as HSICalLib_20230223_b1-b30_pga_melt.RData except the number of observations was reduced by selecting ONLY the HSInumbers (spectra) for the 681 soil samples included in this study. This data frame is output AFTER training and evaluating the dI+ correction wherein a multiple linear regression model was trained to predict dI using slope, aspect, wavelength, and their interactions as predictor variables. This model was evaluated to get predicted dI (dIp), then dIp was used to adjust (aka “correct”) obsI resulting in dI-corrected intensities (dIc). If the model was a perfect predictor, then dIp would equal dI AND dIc would equal refI.

# Missing data values (NA): None

# Number of variables (columns): 9

# Description of variables:

Same as HSICalLib_20230223_b1-b30_pga_melt.RData except dI, refI, dIp, and dIc columns have been added.

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

wavelength: see HSICalLib_wavevec.RData

obsI: see HSICalLib_20230223_b1-b30_pga_melt.RData

dI: see HSICalLib_20230223_b1-b30_pga_dI_melt.RData

refI: see HSICalLib_20230223_b1-b30_pga_refI_melt.RData

dIp: predicted dI resulting from evaluation of the dI+ multiple linear regression model, if the model was a perfect predictor, dIp would equal dI

dIc: dI-corrected reflectance intensities = obsI - dIp, if the dI correction was successful, dIc should equal refI
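The relationship between obsI, refI, dI, dIp, and dIc can be checked with simple arithmetic. The Python sketch below uses made-up numbers for a single sample, orientation, and wavelength. It does not fit the regression model, and the function names are hypothetical.

```python
def delta_intensity(obsI, refI):
    """dI = obsI - refI, the measured effect of surface orientation."""
    return obsI - refI


def dI_corrected(obsI, dIp):
    """dIc = obsI - dIp, the observed intensity adjusted by the predicted dI."""
    return obsI - dIp


# Made-up values for one sample, orientation, and wavelength.
obsI, refI = 0.30, 0.25
dI = delta_intensity(obsI, refI)

dIp_perfect = dI        # a perfect model would predict dI exactly
dIp_imperfect = 0.03    # a hypothetical imperfect prediction

print(dI_corrected(obsI, dIp_perfect))    # equals refI up to floating-point error
print(dI_corrected(obsI, dIp_imperfect))  # differs from refI by dI - dIp
```

The residual dIc - refI always equals dI - dIp, which is why the objective functions compare dIc against refI.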

# Related data files:

HSICalLib_20230223_b1-b30_pga_melt.RData

# R script that outputs this file:

HSICalLib_8b_dI_predict_global_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

# Name and type of R object: spectralstats_dIc_2 (data frame)

# Number of observations (rows): 48,149

# Description of observations:

Each observation holds the objective functions for one spectrum: RMSE, NSE, and KGE are calculated across all wavelengths for every soil sample at every configuration, so the number of rows is the number of spectra compared. These dIc values were obtained by evaluating the dI+ multiple linear regression model (i.e., the one that includes slope, aspect, wavelength, and all their interactions as predictor variables).

# Missing data values (NA): None

# Number of variables (columns): 15

# Description of variables:

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

RMSE_obs: Root mean squared error (obsI vs refI) tells how far obsI is from refI, RMSE=0 suggests surface orientation has no effect

RMSE_dIc: Root mean squared error (dIc vs refI) tells how far dIc is from refI, RMSE=0 suggests dI correction was successful

NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI) tells if obsI is closer to mean refI or refI, NSE=1 suggests surface orientation has no effect

NSE_dIc: Nash-Sutcliffe efficiency (dIc vs refI) tells if dIc is closer to mean refI or refI, NSE=1 suggests dI correction was successful

KGE_obs: Kling-Gupta efficiency (obsI vs refI), interpreted like NSE, KGE=1 suggests surface orientation has no effect

KGE_obs_r: Pearson correlation coefficient (obsI vs refI), component of KGE

KGE_obs_beta: mean obsI / mean refI (obsI vs refI), component of KGE, ratio of the means of obsI and refI, beta=1 suggests surface orientation has no effect

KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI), component of KGE, ratio of the standard deviations of obsI and refI, alpha=1 suggests surface orientation has no effect

KGE_dIc: Kling-Gupta efficiency (dIc vs refI), interpreted like NSE, KGE=1 suggests dI correction was successful

KGE_dIc_r: Pearson correlation coefficient (dIc vs refI), component of KGE

KGE_dIc_beta: mean dIc / mean refI (dIc vs refI), component of KGE, ratio of the means of dIc and refI, beta=1 suggests dI correction was successful

KGE_dIc_alpha: standard deviation(dIc) / standard deviation(refI), component of KGE, ratio of the standard deviations of dIc and refI, alpha=1 suggests dI correction was successful
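The objective functions listed above can be illustrated with their standard textbook definitions. The Python sketch below compares one spectrum against its reference spectrum across wavelengths. It is an assumption that the R scripts use these same forms, in particular KGE = 1 - sqrt((r-1)² + (alpha-1)² + (beta-1)²). The spectra are made up and the function names are hypothetical.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _sd(x):
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def spectral_stats(sim, ref):
    """Compare a spectrum (e.g. obsI or dIc) against its reference spectrum (refI)."""
    n = len(ref)
    m_sim, m_ref = _mean(sim), _mean(ref)
    sq_err = sum((s - r) ** 2 for s, r in zip(sim, ref))
    rmse = math.sqrt(sq_err / n)
    nse = 1 - sq_err / sum((r - m_ref) ** 2 for r in ref)
    cov = sum((s - m_sim) * (r - m_ref) for s, r in zip(sim, ref)) / n
    r = cov / (_sd(sim) * _sd(ref))   # Pearson correlation
    beta = m_sim / m_ref              # ratio of means
    alpha = _sd(sim) / _sd(ref)       # ratio of standard deviations
    kge = 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return {"RMSE": rmse, "NSE": nse, "KGE": kge, "r": r, "beta": beta, "alpha": alpha}


refI = [0.10, 0.20, 0.30, 0.40]      # made-up reference spectrum
obsI = [v + 0.05 for v in refI]      # same shape, uniformly brighter
same = spectral_stats(refI, refI)    # identical spectra: RMSE 0, NSE 1, KGE about 1
offset = spectral_stats(obsI, refI)  # offset only: r and alpha stay near 1, beta > 1
print(same)
print(offset)
```

In the offset case only beta departs from 1, which shows how the three KGE components separate a brightness shift from a change in spectral shape.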

# Related data files:

HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData

HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData

HSICalLib_20230613_spectralstats_coscor.RData

HSICalLib_20230613_spectralstats_ccor.RData

# R script that outputs this file:

HSICalLib_8b_dI_predict_global_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_spectralstats_dIc_1.RData

# Name and type of R object: spectralstats_dIc_1 (numeric, data frame)

# Number of observations (rows): 48,149

# Description of observations:

Each observation holds the objective functions for one spectrum: RMSE, NSE, and KGE are calculated across all wavelengths for every soil sample at every configuration, so the number of rows is the number of spectra compared. These dIc values were obtained by evaluating the dI multiple linear regression model (i.e., the one that includes ONLY slope, aspect, and wavelength as predictor variables).

# Missing data values (NA): None

# Number of variables (columns): 15

# Description of variables:

Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

# Related data files:

HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

# R script that outputs this file:

HSICalLib_8b_dI_predict_global_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData

# Name and type of R object: spectralstats_dIc_w_s_2 (data frame)

# Number of observations (rows): 2,826

# Description of observations:

Number of sample groups compared when reflectance intensities are grouped this way before calculating objective functions. RMSE, NSE, and KGE are calculated across all aspects and soil samples at each wavelength, slope combination (471 wavelengths * 6 slopes = 2,826 results for each objective function). Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData BUT reflectance intensities were grouped in a different way before calculating objective functions.

# Missing data values (NA): None

# Number of variables (columns): 14

# Description of variables:

slope: see HSICalLib_b1_b30_p.RData

wavelength: see HSICalLib_wavevec.RData

RMSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

RMSE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

NSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

NSE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_dIc_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_dIc_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_dIc_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

# Related data files:

HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData

# R script that outputs this file:

HSICalLib_8b_dI_predict_global_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData

# Name and type of R object: spectralstats_dIc_w_a_2 (data frame)

# Number of observations (rows): 6,123

# Description of observations:

Number of sample groups compared when reflectance intensities are grouped this way before calculating objective functions. RMSE, NSE, and KGE are calculated across all slopes and soil samples at each wavelength, aspect combination (471 wavelengths * 13 aspects = 6,123 results for each objective function). Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData BUT reflectance intensities were grouped in a different way before calculating objective functions.

# Missing data values (NA): None

# Number of variables (columns): 14

# Description of variables:

aspect: see HSICalLib_b1_b30_p.RData

wavelength: see HSICalLib_wavevec.RData

RMSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

RMSE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

NSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

NSE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_dIc: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_dIc_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_dIc_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_dIc_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

# Related data files:

HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData

# R script that outputs this file:

HSICalLib_8b_dI_predict_global_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_pga_slm_cosIL.RData

# Name and type of R object: pga_slm_cosIL (data frame)

# Number of observations (rows): obsI spectra = 48,149

# Description of observations:

Same as HSICalLib_20230223_b1-b30_pga.RData except the number of spectra was reduced by selecting spectra (using HSInumbers) from ONLY the 681 soil samples used in this study AND spectra collected at non-zero slopes.

# Missing data values (NA): None

# Number of variables (columns): 500

# Description of variables:

See HSICalLib_b1_b30_p.RData (486 columns) and HSICalLib_20230223_s0-s60_cosIL_all.RData (16 columns, slope and aspect are redundant). HSICalLib_20230223_b1-b30_pga.RData was subset by soil sample and slope, then merged with HSICalLib_20230223_s0-s60_cosIL_all.RData resulting in a “wide” data frame with all the same variables as HSICalLib_b1_b30_p.RData AND the constants needed for the cosine correction (from HSICalLib_20230223_s0-s60_cosIL_all.RData).

# Related data files:

HSICalLib_20230223_b1-b30_pga.RData

HSICalLib_20230223_s0-s60_cosIL_all.RData

# R script that outputs this file:

HSICalLib_9a_cosine_correction_final

# ----------

# Specific information for data file:

HSICalLib_20230613_spectralstats_coscor.RData

# Name and type of R object: spectralstats_coscor (numeric, data frame)

# Number of observations (rows): 48,149

# Description of observations:

Each observation holds the objective functions for one spectrum: RMSE, NSE, and KGE are calculated across all wavelengths for every soil sample at every configuration, so the number of rows is the number of spectra compared. Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData except corrected spectra result from cosine correction instead of dI correction.

# Missing data values (NA): None

# Number of variables (columns): 15

# Description of variables:

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

RMSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

RMSE_coscor: Root mean squared error (coscorI vs refI) tells how far coscorI is from refI, RMSE=0 suggests cosine correction was successful

NSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

NSE_coscor: Nash-Sutcliffe efficiency (coscorI vs refI) tells if coscorI is closer to mean refI or refI, NSE=1 suggests cosine correction was successful

KGE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_coscor: Kling-Gupta efficiency (coscorI vs refI), interpreted like NSE, KGE=1 suggests cosine correction was successful

KGE_coscor_r: Pearson correlation coefficient (coscorI vs refI), component of KGE

KGE_coscor_beta: mean coscorI / mean refI (coscorI vs refI), component of KGE, ratio of the means of coscorI and refI, beta=1 suggests cosine correction was successful

KGE_coscor_alpha: standard deviation(coscorI) / standard deviation(refI), component of KGE, ratio of the standard deviations of coscorI and refI, alpha=1 suggests cosine correction was successful
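The objective functions described above can be sketched in base R as follows. This is a minimal illustration that assumes the common definitions of RMSE and NSE and the 2009 form of KGE, with beta and alpha defined as in the variable descriptions. The function names (`rmse`, `nse`, `kge`) and the toy spectra are hypothetical, and the R scripts on GitHub may compute these metrics differently (for example with a package).

```r
# Hypothetical helper functions; `sim` is the corrected spectrum (e.g. coscorI)
# and `ref` is the reference spectrum (refI), both numeric vectors.
rmse <- function(sim, ref) sqrt(mean((sim - ref)^2))

nse <- function(sim, ref) {
  1 - sum((sim - ref)^2) / sum((ref - mean(ref))^2)
}

kge <- function(sim, ref) {
  r     <- cor(sim, ref)            # Pearson correlation coefficient
  beta  <- mean(sim) / mean(ref)    # ratio of the means
  alpha <- sd(sim) / sd(ref)        # ratio of the standard deviations
  c(KGE = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (alpha - 1)^2),
    r = r, beta = beta, alpha = alpha)
}

# Toy example: a "corrected" spectrum that is uniformly 10% too dark
refI    <- c(0.10, 0.15, 0.22, 0.30, 0.35)
coscorI <- 0.9 * refI

rmse(coscorI, refI)
nse(coscorI, refI)
kge(coscorI, refI)
```

In this toy case the shape of the spectrum is preserved (r = 1) while beta and alpha both equal 0.9, which shows how the KGE components separate a brightness offset from a change in spectral shape.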

# Related data files:

HSICalLib_20230613_spectralstats_ccor.RData

# R script that outputs this file:

HSICalLib_9a_cosine_correction_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_spectralstats_ccor.RData

# Name and type of R object: spectralstats_ccor (numeric, data frame)

# Number of observations (rows): 48,149

# Description of observations:

The number of rows equals the number of spectra compared when reflectance intensities are grouped by soil sample and configuration before calculating objective functions: RMSE, NSE, and KGE are calculated for every soil sample at every configuration across all wavelengths. Same as HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData except corrected spectra result from C correction instead of dI correction.
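For orientation, the textbook forms of the cosine correction and the C correction can be sketched in base R as below. This is not taken from the R scripts on GitHub: the function names, the argument `cos_z` (cosine of the illumination zenith angle for a flat surface), and the toy data are assumptions for illustration only, and the scripts may parameterize either correction differently.

```r
# Textbook cosine correction: scale by cos(zenith) / cos(incidence angle).
cosine_correct <- function(obsI, cosIL, cos_z) obsI * cos_z / cosIL

# Textbook C correction: c = intercept / slope of the regression obsI ~ cosIL,
# which would be fitted separately for each waveband.
c_correct <- function(obsI, cosIL, cos_z) {
  fit <- lm(obsI ~ cosIL)
  c_param <- unname(coef(fit)[1] / coef(fit)[2])
  obsI * (cos_z + c_param) / (cosIL + c_param)
}

# Toy data for one waveband: intensity rises linearly with cosIL.
cosIL <- c(0.50, 0.60, 0.70, 0.80, 0.90, 1.00)
obsI  <- 0.05 + 0.25 * cosIL
cos_z <- 1  # assumed: light source directly overhead for the flat reference

cosine_correct(obsI, cosIL, cos_z)
c_correct(obsI, cosIL, cos_z)
```

With these toy values the C correction returns the flat-surface intensity (0.30) at every orientation, while the cosine correction overcorrects at low cosIL because it ignores the intercept term.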

# Missing data values (NA): None

# Number of variables (columns): 15

# Description of variables:

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

RMSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

RMSE_ccor: Root mean squared error (ccorI vs refI) tells how far ccorI is from refI, RMSE=0 suggests C correction was successful

NSE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

NSE_ccor: Nash-Sutcliffe efficiency (ccorI vs refI) tells if ccorI is closer to mean refI or refI, NSE=1 suggests C correction was successful

KGE_obs: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_r: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_beta: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_obs_alpha: see HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

KGE_ccor: Kling-Gupta efficiency (ccorI vs refI), interpreted like NSE, KGE=1 suggests C correction was successful

KGE_ccor_r: Pearson correlation coefficient (ccorI vs refI), component of KGE

KGE_ccor_beta: mean ccorI / mean refI (ccorI vs refI), component of KGE, ratio of the means of ccorI and refI, beta=1 suggests C correction was successful

KGE_ccor_alpha: standard deviation(ccorI) / standard deviation(refI), component of KGE, ratio of the standard deviations of ccorI and refI, alpha=1 suggests C correction was successful

# Related data files:

HSICalLib_20230613_spectralstats_coscor.RData

# R script that outputs this file:

HSICalLib_10a_C_correction_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData

# Name and type of R object: obsI_p (“wide” data frame)

# Number of observations (rows): 48,149

# Description of observations:

Same observations as HSICalLib_20230613_pga_slm_cosIL.RData except different columns are included along with the 471 observed (uncorrected) reflectance intensities (obsI) for each soil sample at each orientation.

# Missing data values (NA): None

# Number of variables (columns): 475

# Description of variables:

obsI*[wavelength]*: see HSICalLib_b1_b30_p.RData

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

log10volC: see HSICalLib_b1_b30_p.RData
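A "wide" data frame like this one can be loaded and split into a spectra matrix and sample metadata roughly as follows. The sketch builds a tiny stand-in file so that it runs on its own. The column names (`obsI400`, `obsI401`), the `HSInumber` values, and their types are assumptions, so check `names()` and `str()` on the real object before relying on them.

```r
# Build a tiny stand-in for obsI_p (the real object has 48,149 rows and
# 471 obsI columns); column names and values here are illustrative assumptions.
obsI_p <- data.frame(
  obsI400 = c(0.11, 0.12), obsI401 = c(0.13, 0.14),
  slope = c(10, 20), aspect = c(0, 30),
  HSInumber = c("HSI001", "HSI002"), log10volC = c(-1.2, -0.8)
)
tmp <- tempfile(fileext = ".RData")
save(obsI_p, file = tmp)
rm(obsI_p)

# load() restores objects under their saved names; loading into a new
# environment avoids overwriting objects already in the workspace.
env <- new.env()
loaded <- load(tmp, envir = env)   # character vector of restored object names
dat <- get(loaded[1], envir = env)

# Split the wide data frame into a spectra matrix and sample metadata.
is_spec <- grepl("^obsI", names(dat))
spectra <- as.matrix(dat[, is_spec])
meta    <- dat[, !is_spec]
dim(spectra)
```

The same pattern should apply to the refI_p and dIc_p objects by changing the prefix passed to `grepl()`.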

# Related data files:

HSICalLib_20230613_globaldI_OCpredict_refI_p.RData

HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData

# R script that outputs this file:

HSICalLib_12a_OC_predict_pls_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_OCpredict_refI_p.RData

# Name and type of R object: refI_p (“wide” data frame)

# Number of observations (rows): 48,149

# Description of observations:

Same observations as HSICalLib_20230613_pga_slm_cosIL.RData except different columns are included along with the 471 reference reflectance intensities (refI) for each soil sample at each orientation.

# Missing data values (NA): None

# Number of variables (columns): 475

# Description of variables:

refI*[wavelength]*: same as HSICalLib_b1_b30_p.RData except these are reference reflectance intensities rather than obsI

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

log10volC: see HSICalLib_b1_b30_p.RData

# Related data files:

HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData

HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData

# R script that outputs this file:

HSICalLib_12a_OC_predict_pls_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData

# Name and type of R object: dIc_p (“wide” data frame)

# Number of observations (rows): 48,149

# Description of observations:

Same observations as HSICalLib_20230613_pga_slm_cosIL.RData except different columns are included along with the 471 dI corrected reflectance intensities (dIc) for each soil sample at each orientation.

# Missing data values (NA): None

# Number of variables (columns): 475

# Description of variables:

dIc*[wavelength]*: same as HSICalLib_b1_b30_p.RData except these reflectance intensities have been dI corrected using the dI+ multiple linear regression model to predict dI

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

log10volC: see HSICalLib_b1_b30_p.RData

# Related data files:

HSICalLib_20230613_globaldI_OCpredict_refI_p.RData

HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData

# R script that outputs this file:

HSICalLib_12a_OC_predict_pls_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

# Name and type of R object: sstatdf_tr_ref (numeric, data frame)

# Number of observations (rows): 681

# Description of observations:

Each row contains error metrics for log10volC predictions made from the reference spectrum for each soil sample. This partial least squares regression model was trained on these same 681 reference spectra (1 for each soil sample).

# Missing data values (NA): None

# Number of variables (columns): 12

# Description of variables:

predicted: log10volC predicted by the “reference” partial least squares regression model using 471 reference reflectance intensities (refI) as predictor variables

observed: log10volC observed based on laboratory measurements of SOC

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

NOTE: slope and aspect are the same for all samples (rows) in this data frame because refI is the same regardless of orientation

HSInumber: see HSICalLib_b1_b30_p.RData

RMSE: Root mean squared error (observed log10volC vs predicted log10volC) tells how far predicted is from observed, smaller RMSE means better prediction, ideal RMSE=0

NSE: Nash-Sutcliffe efficiency (observed log10volC vs predicted log10volC) tells if predicted is closer to mean observed or observed, NSE=1 means the model is a perfect predictor, NSE<0 means predicted is closer to mean observed than observed

R2: Coefficient of determination (observed log10volC vs predicted log10volC)

KGE: Kling-Gupta efficiency (observed log10volC vs predicted log10volC), interpreted like NSE, KGE=1 means the model is a perfect predictor

KGE_r: Pearson correlation coefficient (observed log10volC vs predicted log10volC)

KGE_beta: mean predicted / mean observed (observed log10volC vs predicted log10volC), component of KGE, ideal beta=1

KGE_alpha: standard deviation(predicted) / standard deviation(observed), component of KGE, ideal alpha=1

# Related data files:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData

# R script that outputs this file:

HSICalLib_12a_OC_predict_pls_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData

# Name and type of R object: sstatdf_obs (data frame)

# Number of observations (rows): 48,149

# Description of observations:

Rows contain error metrics for log10volC predictions made from all of the observed (uncorrected) spectra collected at non-zero slopes for each soil sample (1 prediction per spectrum means multiple log10volC predictions for each soil sample). This partial least squares regression model was trained on 681 reference spectra (1 for each soil sample).

# Missing data values (NA): None

# Number of variables (columns): 12

# Description of variables:

predicted: log10volC predicted by the “reference” partial least squares regression model using 471 observed (uncorrected) reflectance intensities (obsI) as predictor variables

observed: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

RMSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

NSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

R2: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_r: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_beta: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_alpha: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

# Related data files:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData

# R script that outputs this file:

HSICalLib_12a_OC_predict_pls_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData

# Name and type of R object: sstatdf_tr_dIc (data frame)

# Number of observations (rows): 681

# Description of observations:

Rows contain error metrics for log10volC predictions made from the training dIc spectra (1 prediction and 1 spectrum per sample from a randomly chosen orientation). This partial least squares regression model was trained on the same 681 dIc spectra.
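Selecting 1 spectrum per soil sample from a randomly chosen orientation can be sketched in base R as below. The toy data frame, the `orientation` column, and the seed are hypothetical. The actual training set was drawn by HSICalLib_12a_OC_predict_pls_final.R and will not be reproduced by this snippet.

```r
set.seed(1)  # arbitrary seed so the toy example is repeatable

# Toy stand-in: 3 soil samples, each with 4 corrected spectra (orientations).
dIc_p <- expand.grid(HSInumber = c("A", "B", "C"), orientation = 1:4,
                     stringsAsFactors = FALSE)

# For each sample, pick the row index of one randomly chosen orientation.
rows_by_sample <- split(seq_len(nrow(dIc_p)), dIc_p$HSInumber)
train_rows <- vapply(rows_by_sample,
                     function(i) i[sample.int(length(i), 1)],
                     integer(1))
train <- dIc_p[train_rows, ]
train
```

Indexing with `sample.int(length(i), 1)` rather than `sample(i, 1)` avoids the base R behavior where `sample()` on a single number `n` draws from `1:n`, which would matter for any sample with only one available spectrum.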

# Missing data values (NA): None

# Number of variables (columns): 12

# Description of variables:

Same as HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except these error metrics come from the “corrected” PLSR model, which was trained and evaluated on dI corrected training spectra.

predicted: log10volC predicted by evaluating the “corrected” partial least squares regression model using 471 dI corrected reflectance intensities (dIc) as predictor variables. Only spectra from the “corrected” PLSR model training set were evaluated.

observed: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

RMSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

NSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

R2: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_r: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_beta: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_alpha: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

# Related data files:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData

# R script that outputs this file:

HSICalLib_12a_OC_predict_pls_final.R

# ----------

# Specific information for data file:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData

# Name and type of R object: sstatdf_dIc (numeric, data frame)

# Number of observations (rows): 48,149

# Description of observations:

Rows contain error metrics for log10volC predictions made from dI corrected reflectance spectra from all orientations for all soil samples using the “corrected” partial least squares regression model that was trained on 681 dI corrected spectra (1 per soil sample from a randomly chosen orientation). Multiple predictions are made for each soil sample since more than 1 spectrum per sample is evaluated.

# Missing data values (NA): None

# Number of variables (columns): 12

# Description of variables:

Same as HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData except these error metrics come from evaluating the “corrected” PLSR model on dI corrected spectra.

predicted: log10volC predicted by evaluating the “corrected” partial least squares regression model using 471 dI corrected reflectance intensities (dIc) from all soil samples at all orientations as predictor variables. More than 1 spectrum per soil sample is evaluated so multiple predictions are made for each soil sample.

observed: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

HSInumber: see HSICalLib_b1_b30_p.RData

RMSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

NSE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

R2: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_r: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_beta: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

KGE_alpha: see HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

# Related data files:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData

# R script that outputs this file:

HSICalLib_12a_OC_predict_pls_final.R

# ----------

# Specific information for data file:

HSICalLib_202300613_globaldI_PLSR_log10volC_summarystats_OC.RData

# Name and type of R object: summarystats_OC (data frame, numeric)

# Number of observations (rows): 78

# Description of observations:

Error metrics for all 78 (6 slopes * 13 aspects) non-zero orientations across all predictions made by 1) evaluating the “reference” PLSR model on all observed (obsI) spectra (same results used in HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData), and 2) evaluating the “corrected” PLSR model on all dI corrected spectra (same results used in HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData).

# Missing data values (NA): None

# Number of variables (columns): 16

# Description of variables:

Same as HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except error metrics for this data frame were calculated across all predictions made at each orientation from either 1) all obsI vs training refI spectra using the reference PLSR model, or 2) all dIc vs training dIc spectra using the corrected PLSR model.

slope: see HSICalLib_b1_b30_p.RData

aspect: see HSICalLib_b1_b30_p.RData

RMSE_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra

RMSE_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra

NSE_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra

NSE_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra

R2_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra

R2_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra

KGE_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra

KGE_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra

KGE_r_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra

KGE_r_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra

KGE_beta_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra

KGE_beta_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra

KGE_alpha_obs: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData except calculated across all soil samples at each orientation, predicted = “reference” PLSR model evaluated using all obsI spectra, observed = “reference” PLSR model evaluated using training refI spectra

KGE_alpha_dIc: See HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData except calculated across all soil samples at each orientation, predicted = “corrected” PLSR model evaluated using all dIc spectra, observed = “corrected” PLSR model evaluated using training dIc spectra
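Summarizing error metrics per orientation, as this data frame does, amounts to grouping predictions by slope and aspect and computing each metric within a group. A minimal base R sketch with made-up numbers follows. The `pred` data frame and its generic `predicted` and `observed` columns are hypothetical stand-ins for the two sets of model predictions described above.

```r
# Toy stand-in for per-spectrum predictions at two orientations.
pred <- data.frame(
  slope     = c(10, 10, 10, 20, 20, 20),
  aspect    = c(0, 0, 0, 30, 30, 30),
  predicted = c(-1.0, -0.5, 0.1, -1.3, -0.2, 0.4),
  observed  = c(-1.1, -0.4, 0.0, -1.1, -0.4, 0.0)
)

# One group per slope/aspect combination that actually occurs in the data.
groups <- split(pred, list(pred$slope, pred$aspect), drop = TRUE)

summarystats <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    slope  = g$slope[1],
    aspect = g$aspect[1],
    RMSE   = sqrt(mean((g$predicted - g$observed)^2)),
    R2     = cor(g$predicted, g$observed)^2
  )
}))
rownames(summarystats) <- NULL
summarystats
```

Applied to 6 slopes and 13 aspects, the same grouping would yield 78 rows, 1 per non-zero orientation.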

# Related data files:

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData

# R script that outputs this file:

HSICalLib_12a_OC_predict_pls_final.R

# ---------------------------------------------------------------------------

# Sharing/Access information

# ---------------------------------------------------------------------------

# ----------

Links to other publicly accessible locations of the data:

github.com/aduro005/HSITopographicCorrectionSampleWellArray (custom designed and 3-D printed sample array)

github.com/aduro005/HSITopographicCorrectionRscripts (R scripts used to manipulate the data found on the UCR Dryad Data Repository)

# ----------

Data were derived from the following sources:

data.neonscience.org/home (574 soil samples obtained from the NEON Initial Characterization Soils Archive at the University of Michigan Biological Station Sample Archive Facility in Ehlers (UMBS-SAFE), with accompanying soil properties data obtained from the NEON Data Archive)

# ---------------------------------------------------------------------------

# Description of the R scripts

# ---------------------------------------------------------------------------

The titles of the R scripts indicate the order in which they are meant to be used. The only exception to this convention is the HSICalLib_0_FinalPlots.R script which could be used at different points during the workflow, but is intended to be used last.

# ----------

List of files in the R scripts folder:

HSI Data Processing:

HSICalLib_1a_hdr_raw_to_hsi_rgbmat.R

HSICalLib_1b_rgbmat_to_soilindices.R

HSICalLib_1c_rgbmat_soilindices_to_Tsoilindices.R

HSICalLib_1d_rgbmat_Tsoilindices_to_adjustedTsoilindices.R

HSICalLib_2_soilindices_hsi_to_intensities.R

HSICalLib_3_intensities_to_intmean_intsd.R

HSICalLib_4_intmean_to_masterintmean_to_p.R

HSICalLib_5a_p_obsI_cleaning.R

HSICalLib_5b_p_dI_cleaning.R

HSICalLib_6a_aspect_correction.R

HSICalLib_6b_cosIL_calculation.R

HSICalLib_7a_observedI.R

HSICalLib_7b_referenceI.R

HSICalLib_7c_dI_calculation.R

HSI Data Analysis:

HSICalLib_8b_dI_predict_global_final.R

HSICalLib_9a_cosine_correction_final.R

HSICalLib_10a_C_correction_final.R

HSICalLib_12a_OC_predict_pls_final.R

HSICalLib_0_FinalPlots.R

# ----------

# Specific information for R script:

HSICalLib_1a_hdr_raw_to_hsi_rgbmat.R

# Description of script:

Read in raw data, dark calibration, and white calibration files from the Data folder (3 .hdr and 3 .raw files), perform white and dark correction, then output .RData and .tiff files to the Output Files folder for each scan. Output files contain the hyperspectral image of the scan (reflectance intensity for 471 wavebands between 400 - 1000 nm in 2 spatial dimensions), an RGB image of the scan (reflectance intensity for red, green, and blue wavebands for each pixel in the image), and a .tiff of the RGB image (can be opened in a photo viewer).
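The white and dark correction step is commonly written as (raw - dark) / (white - dark). The sketch below applies that standard form to toy arrays. The array names, dimensions, and values are assumptions, and the actual script may handle the calibration scans differently (for example by averaging calibration frames before use).

```r
# Toy 2 x 2 pixel scan with 3 wavebands (the real scans have 471 wavebands).
raw   <- array(c(600, 800, 1000, 1200,
                 500, 700,  900, 1100,
                 400, 600,  800, 1000),
               dim = c(2, 2, 3))
dark  <- array(200,  dim = c(2, 2, 3))   # dark calibration counts
white <- array(2200, dim = c(2, 2, 3))   # white calibration counts

# Scale raw counts so that dark maps to 0 and white maps to 1.
hsi <- (raw - dark) / (white - dark)
range(hsi)
```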

# Input files:

2 raw data files (.hdr and .raw) per HSI scan (not included in this dataset)

hsi_HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*.hdr

hsi_HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*.raw

2 raw data files (.hdr and .raw) for white calibration (not included in this dataset)

hsi_HSICalLib_b*[batch]*_*[date]*_white.hdr

hsi_HSICalLib_b*[batch]*_*[date]*_white.raw

2 raw data files (.hdr and .raw) for dark calibration (not included in this dataset)

hsi_HSICalLib_b*[batch]*_*[date]*_dark.hdr

hsi_HSICalLib_b*[batch]*_*[date]*_dark.raw

# Output files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_hsi.RData (not included in this dataset)

Array with the same dimensions as the .raw file, but the raw reflectance intensities have been scaled between 0 (dark, minimum reflectance) and 1 (white, maximum reflectance) using white and dark calibration scan data.

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_rgbmat.RData (not included in this dataset)

Array with the same spatial dimensions as the .raw file but the spectral dimension only contains data for 3 wavebands corresponding to the red, green, and blue color wavelengths. This array is used to generate the RGB.tiff file.

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_mmResolution_RGB.tiff (not included in this dataset)

RGB image of the scan

# ----------

# Specific information for R script:

HSICalLib_1b_rgbmat_to_soilindices.R

# Description of script:

Make a “template” with the row and column coordinates for every pixel occurring within the 40 sample wells at all 98 slope, aspect configurations. This script can be used to manually identify the location (row, column coordinates) of pixels occurring within sample wells from images. The result of this is a “template” which can be used to automatically identify the location (row, column coordinates) of pixels occurring within sample wells from any image given the slope and aspect of the sample well array in the image AND isolate the reflectance spectra from those pixels.

# Input files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_mmResolution_RGB_rgbmat.RData (not included in this dataset)

Output from R script 1a.

# Output files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_soilindices.RData (not included in this dataset)

List containing 40 elements (1 for each sample well/soil sample). Each element of the list contains a data frame with 2 columns “soilrows” and “soilcols”. These are the row, column coordinates (i.e., the 2 spatial dimensions of “rgbmat” or “hsi” arrays) of pixels occurring within each of the 40 sample wells (i.e., the 40 elements in this list) at all 98 orientations. There is a separate file for each configuration, and the configuration (i.e., slope and aspect) is indicated in the file name. These coordinates are later used to isolate spectra from ONLY the areas of the images that correspond to sample wells AND to match reflectance spectra from sample wells to the correct soil sample and its properties.

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_mmResolution_RGB_soilindices.tiff (not included in this dataset)

RGB image of the scan with pixels corresponding to sample wells turned some color. These images were used to visually check that the row, column coordinates (indicated by HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_soilindices.RData) were correctly aligned with actual locations of sample wells in images.

# ----------

# Specific information for R script:

HSICalLib_1c_rgbmat_soilindices_to_Tsoilindices.R

# Description of script:

Use “soilindices” list (created in R script 1b) along with “rgbmat” arrays to identify pixels occurring in sample wells and turn those pixels a certain color using the row, column indices (pixels) indicated by soilindices.RData or Tsoilindices.RData (i.e., the “soilindices” list). Then output the updated “soilindices” list as “_Tsoilindices.RData” and an image called “_Tsoilindices.tiff” where sample well pixels are turned some color (i.e., certain values are manually assigned to the red, green, and blue wavebands for pixels occurring in sample wells).

# Input files:

HSICalLib_b*[1]*_*date[1]*_s*[slope]*_a*[aspect]*_soilindices.RData (not included in this dataset) OR

HSICalLib_b*[4]*_*date[4]*_s*[slope]*_a*[aspect]*_Tsoilindices.RData (not included in this dataset)

Output from R script 1b (_soilindices.RData) or R script 1c (_Tsoilindices.RData)

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_rgbmat.RData (not included in this dataset)

Output from R script 1a

# Output files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_mmResolution_RGB_Tsoilindices.tiff (not included in this dataset)

Same as _soilindices.tiff (output from 1b) except the row and column indices occurring in sample wells have been automatically selected based on the “master” templates that were manually created for each orientation using batch 1 and 4 (R scripts not included in this dataset).

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_Tsoilindices.RData (not included in this dataset)

Same as _soilindices.RData (output from 1b) except the row and column indices corresponding to sample wells have been automatically selected based on the “master” templates that were manually created for each orientation using batch 1 and 4 (R scripts not included in this dataset).

# ----------

# Specific information for R script:

HSICalLib_1d_rgbmat_Tsoilindices_to_adjustedTsoilindices.R

# Description of script:

Use “_Tsoilindices.RData” (“soilindices” list) (created in R script 1c) along with “rgbmat” arrays to MANUALLY adjust the location of pixels occurring in sample wells and turn those pixels some color using the row, column indices (pixels) indicated by “_Tsoilindices.RData” AND visual inspection by a user in R. Then output the updated “soilindices” list as “_Tsoilindices.RData” and “_Tsoilindices.tiff”.

# Input files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_rgbmat.RData (not included in this dataset)

Output from R script 1a

HSICalLib_b*[4]*_*date[4]*_s*[slope]*_a*[aspect]*_Tsoilindices.RData (not included in this dataset)

Output from R script 1c

# Output files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_Tsoilindices.RData (not included in this dataset)

Same as the “soilindices” list (output from R script 1b and 1c) except the locations of pixels occurring within sample wells have been adjusted based on visual inspection by a user.

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_mmResolution_RGB_Tsoilindices.tiff (not included in this dataset)

Same as the RGB image output from R script 1b and 1c except the locations of pixels occurring within sample wells have been adjusted based on visual inspection by a user.

# ----------

# Specific information for R script:

HSICalLib_2_soilindices_hsi_to_intensities.R

# Description of script:

Use “_Tsoilindices.RData” and “_hsi.RData” to isolate reflectance spectra (i.e., reflectance intensities measured at 471 wavebands) from pixels (i.e., row, column coordinates indicated by “_Tsoilindices.RData”) corresponding to sample wells (soil samples) in “_hsi.RData” (output from R script 1a).

# Input files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_Tsoilindices.RData (not included in this dataset)

Output from R script 1d

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_hsi.RData (not included in this dataset)

Output from R script 1a

# Output files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_hsi_intensities.RData (not included in this dataset)

List containing 40 elements (1 for each sample well/soil sample). Each element of the list contains a data frame where each row is a reflectance spectrum from 1 pixel occurring within a sample well. For example, the first element of the list contains reflectance spectra from all the pixels occurring within the first sample well.

Number of rows = number of pixels occurring within this sample well

Number of columns = 471 reflectance intensities

NOTE: the wavelengths corresponding to these 471 reflectance intensities can be found in wavevec.RData.
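Isolating per-pixel spectra with a list of row, column coordinates can be sketched as below. The toy cube and the 2-well list are made up, and only the column names `soilrows` and `soilcols` come from the description of the “soilindices” list. How the real script indexes the “hsi” array is an assumption.

```r
# Toy reflectance cube: 4 rows x 5 columns x 3 wavebands.
hsi <- array(seq_len(4 * 5 * 3) / 100, dim = c(4, 5, 3))

# Toy "soilindices" list with 2 wells (the real list has 40 elements).
soilindices <- list(
  data.frame(soilrows = c(1, 2), soilcols = c(1, 1)),
  data.frame(soilrows = c(3, 4, 4), soilcols = c(4, 4, 5))
)

# For each well, pull the full spectrum of every listed pixel.
intensities <- lapply(soilindices, function(idx) {
  spectra <- t(mapply(function(row_i, col_i) hsi[row_i, col_i, ],
                      idx$soilrows, idx$soilcols))
  as.data.frame(spectra)
})

sapply(intensities, dim)
```

Each element of `intensities` has 1 row per pixel and 1 column per waveband, matching the layout described for _hsi_intensities.RData.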

# ----------

# Specific information for R script:

HSICalLib_3_intensities_to_intmean_intsd.R

# Description of script:

Use “_intensities.RData” (output from R script 2) to get the average reflectance spectrum of each soil sample. In other words, get the mean and sd of reflectance intensities measured at each waveband across all pixels occurring within each sample well/soil sample.

# Input files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_hsi_intensities.RData (not included in this dataset)

Output from R script 2

# Output files:

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_intmean.RData (not included in this dataset)

Data frame where each row contains the average reflectance spectrum for a soil sample/sample well for this batch at this configuration (also see _intensities.RData file name for batch, slope, and aspect info). Each file corresponds to a single scan (total number of scans = 30 batches x 98 configurations).

Number of rows/samples = 40

Number of columns/variables = 475

471 reflectance intensities: See Description of the data

slope: See Description of the data

aspect: See Description of the data

batch: See Description of the data

well: See Description of the data

NOTE: Batch and Well were used together as a key to merge soil sample properties data with reflectance data in R script 4. Each soil sample has 98 reflectance spectra (1 obtained at each slope, aspect configuration) but only 1 set of properties data. The chemical and physical properties of a soil sample don’t change as the sample orientation changes, but reflectance does (as shown in this study).

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_intsd.RData (not included in this dataset)

Same as _intmean.RData, but standard deviation of reflectance at each waveband is reported rather than the mean.
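Averaging pixel spectra within each sample well can be sketched as follows. The toy list, the waveband column names (`b1` to `b3`), and the `well` key are hypothetical. The real files have 471 waveband columns plus slope, aspect, batch, and well.

```r
# Toy "intensities" list: 2 wells, rows = pixels, columns = 3 wavebands.
intensities <- list(
  data.frame(b1 = c(0.10, 0.20), b2 = c(0.30, 0.50), b3 = c(0.40, 0.40)),
  data.frame(b1 = c(0.20, 0.20, 0.50), b2 = c(0.10, 0.30, 0.50),
             b3 = c(0.60, 0.60, 0.60))
)

# One row per well: mean (or sd) across pixels at each waveband.
intmean <- as.data.frame(do.call(rbind, lapply(intensities, colMeans)))
intsd   <- as.data.frame(do.call(rbind, lapply(intensities,
                                               function(w) apply(w, 2, sd))))

# Hypothetical key column identifying the well for each row.
intmean$well <- seq_along(intensities)
intmean
```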

# ----------

# Specific information for R script:

HSICalLib_4_intmean_to_masterintmean_to_p.R

# Description of script:

Bring in soil properties data from 4 separate sources, then merge these data frames, and output a single data frame called HSICalLib_b1_b30_prep_rutgers_neon_fire.RData which contains soil properties data for all soil samples in the study.

Bring in _intmean.RData (output from R script 3) for each scan (98 orientations/scans per batch), then output a single data frame containing the mean reflectance spectra for each soil sample at 98 configurations for ONLY this batch. This file is similar to _intmean.RData except the separate _intmean.RData files for each scan are combined into a single _intmean.RData file for each batch.

Bring in _intmean.RData for each batch (output from this script) and merge into a single data frame called “_masterintmean.RData” containing reflectance spectra for all soil samples at all orientations.

Merge _masterintmean.RData (output from this script) with _prep_rutgers_neon_fire.RData (output from this script) resulting in a data frame called “p” with reflectance spectra from all soil samples at all orientations along with selected soil properties data.
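The final merge step can be sketched in base R as shown below. The data frames, the `HSInumber` format, and the column `w400` are hypothetical stand-ins, and the use of `merge()` with its default (inner) join is an assumption about how R script 4 combines the data.

```r
# Hypothetical reflectance spectra: 2 samples x 2 configurations, 1 waveband shown.
masterintmean <- data.frame(
  batch  = c(1, 1, 1, 1),
  well   = c(1, 1, 2, 2),
  slope  = c(0, 10, 0, 10),
  aspect = c(0, 0, 0, 0),
  w400   = c(0.12, 0.10, 0.31, 0.27)
)

# Hypothetical soil properties: one row per soil sample.
props <- data.frame(
  batch     = c(1, 1),
  well      = c(1, 2),
  HSInumber = c("HSI0001", "HSI0002"),
  OC        = c(1.5, NA)
)

# Join on the (batch, well) key; each sample's single set of properties
# is repeated for every configuration at which that sample was imaged.
p <- merge(masterintmean, props, by = c("batch", "well"))
print(p)
```

Because properties are stored once per sample while spectra are stored once per sample and configuration, the merged data frame has as many rows as the spectra table for the matched samples.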

# Input files:

HSICalLib_20230418_SamplePrepData_R.csv (not included in this dataset)

Soil sample properties data provided by the Pedology Lab at UC Riverside

Created in Google Sheets by Alyssa Duro

HSICalLib_20230418_Fire_R.csv (not included in this dataset)

Soil sample properties data provided by the Gray Lab at UC Riverside

Created in Google Sheets by Alyssa Duro

HSICalLib_20230418_NEON_R.csv (not included in this dataset)

Soil sample properties data provided by NEON (NRCS performed lab analysis)

Created in Google Sheets by Alyssa Duro

HSICalLib_20230418_Rutgers_R.csv (not included in this dataset)

Soil sample properties data provided by Rutgers (samples are from Duke Farms)

Created in Google Sheets by Alyssa Duro

HSICalLib_b*[batch]*_*[date]*_s*[slope]*_a*[aspect]*_intmean.RData (not included in this dataset)

Output from R script 3

# Output files:

HSICalLib_b1_b30_prep_rutgers_neon_fire.RData (not included in this dataset)

Merged the four soil properties input files (_SamplePrepData_R.csv, _Fire_R.csv, _NEON_R.csv, and _Rutgers_R.csv) into a single data frame with 1180 rows (soil samples) and 58 columns/variables (measured soil properties and sample identifiers). Not every soil property was measured for every soil sample, resulting in 32,727 NA’s.

Number of rows/soil samples = 1180

Number of columns/variables = 58

HSICalLib_b*[batch]*_*[date]*_intmean.RData (not included in this dataset)

Same as _intmean.RData (output from R script 3) except now each soil sample (i.e., each batch, well combination) is associated with 98 different reflectance spectra, each with a different combination of slope and aspect. Each file corresponds to a single batch (total number of batches = 30).

Number of rows/soil samples/reflectance spectra = 3920

40 sample wells (soil samples) * 98 configurations

Number of columns/variables = 475

Same variables as _intmean.RData (output from R script 3)

HSICalLib_b1_b30_masterintmean.RData (not included in this dataset)

Same as the per-batch _intmean.RData (output from this script) except all 30 batches are combined into a single data frame. Each soil sample is associated with 98 reflectance spectra collected at different slope, aspect combinations.

Number of rows/soil samples/reflectance spectra = 117,600

40 soil samples * 30 batches * 98 configurations

Number of columns/variables = 475

Same as _intmean.RData (outputs from R script 3 and 4)

HSICalLib_b1_b30_p.RData

A data frame with mean reflectance spectra for each soil sample at each orientation along with selected soil properties data and sample identifiers. NA’s occur where soil properties data are not available for a soil sample. This is the least processed form of the data included in this dataset.

Number of rows/reflectance spectra = 115,444

1178 soil samples * 98 configurations

Number of columns/variables = 486

471 observed (uncorrected) reflectance intensities

slope: See Description of the data

aspect: See Description of the data

batch: Soil samples were imaged in groups of 40 at a time

well: Indexed location of the soil sample in the sample well array

HSInumber: unique soil sample identifier

HSIPackedDensity: mass of soil sample per volume of sample well

sandTotal: % sand (only available for samples from the NEON archive)

siltTotal: % silt (only available for samples from the NEON archive)

clayTotal: % clay (only available for samples from the NEON archive)

OC: % soil organic carbon (by weight)

archive: source of the soil sample and soil properties data

adod: air dried soil mass / oven dried soil mass

volC: % soil organic carbon (by volume)

log10volC: log10(volC)

batchwellID: unique reflectance spectra identifier

# ----------

# Specific information for R script:

HSICalLib_5a_p_obsI_cleaning.R

# Description of script:

Remove reflectance spectra that are not truly representative of soil samples based on visual identification of imaging errors, then output the result as _p_clean.RData (not included in this dataset) or as a “long” version (where wavelength is a variable) called _p_clean_melt.RData (not included in this dataset).

Then remove reflectance spectra that are not truly representative of soil samples by removing spectra containing unusually large or small observed intensities (obsI) and output this data frame as _p_clean_obsI.RData.
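The wide-to-long conversion behind the “_melt” files can be sketched in base R as follows. The toy data frame, its column names, and the wavelength values are hypothetical, and the R script may use a dedicated reshaping function instead of the manual approach shown here.

```r
# Hypothetical wide data: one row per spectrum, one column per waveband.
wide <- data.frame(
  HSInumber = c("HSI0001", "HSI0002"),
  slope     = c(10, 20),
  aspect    = c(0, 90),
  w400      = c(0.10, 0.27),
  w401      = c(0.20, 0.37)
)
wavecols    <- c("w400", "w401")
wavelengths <- c(400, 401)  # hypothetical wavelengths for the two columns

# Long data: one row per (spectrum, waveband) pair, with wavelength as a variable.
long <- data.frame(
  HSInumber  = rep(wide$HSInumber, times = length(wavecols)),
  slope      = rep(wide$slope, times = length(wavecols)),
  aspect     = rep(wide$aspect, times = length(wavecols)),
  wavelength = rep(wavelengths, each = nrow(wide)),
  obsI       = unlist(wide[wavecols], use.names = FALSE)
)

# The long table has (number of spectra) * (number of wavebands) rows.
stopifnot(nrow(long) == nrow(wide) * length(wavecols))
print(long)
```

The same arithmetic gives the row counts reported for the long files in this README, for example 114,764 spectra * 471 wavebands = 54,053,844 rows.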

# Input files:

HSICalLib_wavevec.RData

HSICalLib_b1_b30_p.RData

(output from R script 4)

# Output files:

HSICalLib_b1-b30_20230223_p_clean.RData (not included in this dataset)

Same as _p.RData (output from R script 4) except some known (visually identified) imaging mistakes (rows/spectra) have been removed. Details are provided as comments in the R script.

Number of rows/spectra = 114,764

Number of columns/variables = 486

Same as _p.RData (output from R script 4)

HSICalLib_20230223_b1-b30_p_clean_melt.RData (not included in this dataset)

“Long” version of _p_clean.RData where wavelength is a variable

Number of rows/observed reflectance intensities (obsI) = 54,053,844

114,764 spectra * 471 wavebands

Number of columns/variables = 8

slope: See Description of the data

aspect: See Description of the data

batch: Soil samples were imaged in groups of 40 at a time

well: Indexed location of the soil sample in the sample well array

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each obsI reflectance intensity

obsI: observed reflectance intensities

batchwellID: unique reflectance spectra identifier

HSICalLib_20230223_b1-b30_obsIoutliers.RData (not included in this dataset)

Character vector containing the “batchwellID” (a sample identifier unique to each reflectance spectrum) for the reflectance spectra identified as imaging errors using the observed intensities (obsI) approach.

Length/number of spectra to be removed based on obsI values = 115

HSICalLib_b1-b30_20230223_p_clean_obsI.RData

Same as _p_clean.RData (output from this script) except image mistakes have been identified (see _obsIoutliers.RData) and removed based on unusually large or small observed intensities (obsI).

Number of rows/obsI spectra = 114,649

Number of columns/variables = 486

Same as _p.RData (output from R script 4)

HSICalLib_20230223_b1-b30_p_clean_obsI_melt.RData (not included in this dataset)

“Long” version of _p_clean_obsI.RData (output from this script) where wavelength is a variable

Number of rows/observed reflectance intensities (obsI) = 53,999,679

114,649 spectra * 471 wavebands

Number of columns/variables = 8

Same as _p_clean_melt.RData (output from this script)

# ----------

# Specific information for R script:

HSICalLib_5b_p_dI_cleaning.R

# Description of script:

Remove reflectance spectra that are not truly representative of soil samples (due to imaging errors) if they contain unusually large or small change in intensity (dI or ΔI) values. These dI values are the difference between the observed intensities (obsI) and the reference intensities (refI).

# Input files:

HSICalLib_20230223_b1-b30_p_clean_obsI.RData

(output from R script 5a)

HSICalLib_wavevec.RData

# Output files:

HSICalLib_20230223_b1-b30_p_clean_obsI_dI.RData (not included in this dataset)

Same as _p_clean_obsI.RData (output from R script 5a) except 471 dI values (1 per wavelength) are reported rather than 471 observed reflectance intensities (obsI).

Number of rows/dI spectra = 114,649 (same as _p_clean_obsI.RData)

Number of columns/variables = 486

Same as _p.RData (output from R script 4).

NOTE: These are 471 dI reflectance intensities, NOT obsI

HSICalLib_20230223_b1-b30_p_clean_obsI_dI_melt.RData (not included in this dataset)

“Long” version of _p_clean_obsI_dI.RData where wavelength is a variable

Number of rows/dI reflectance intensities = 53,999,679

114,649 spectra * 471 wavebands

Number of columns/variables = 8

slope: See Description of the data

aspect: See Description of the data

batch: Soil samples were imaged in groups of 40 at a time

well: Indexed location of the soil sample in the sample well array

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each dI reflectance intensity

dI: the difference between obsI and refI

batchwellID: unique reflectance spectra identifier

HSICalLib_20230223_b1-b30_dIoutliers.RData (not included in this dataset)

Character vector containing the “batchwellID” (a sample identifier unique to each reflectance spectrum) for the reflectance spectra identified as imaging errors using the change/difference in intensities (dI) approach.

Length/number of spectra to be removed based on dI values = 7188

HSICalLib_b1-b30_20230223_p_gold.RData

Same as _p.RData (output from R script 4) except the reflectance spectra with unusually large or small obsI OR dI values have been identified (see _dIoutliers.RData and _obsIoutliers.RData) and removed as imaging errors.

Number of rows/obsI reflectance spectra = 107,486

Number of columns/variables = 486

Same as _p.RData (output from R script 4)

# ----------

# Specific information for R script:

HSICalLib_6a_aspect_correction.R

# Description of script:

Convert aspect (column) values 195, 210, 225, 240, 255, and 270 in _p_gold.RData (output from R script 5b) to 165, 150, 135, 120, 105, and 90 then output as _p_gold_acor.RData (wide version) and _p_gold_acor_melt.RData (long version).
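Every pair listed above satisfies new aspect = 360 - old aspect, so the conversion can be sketched in base R as shown below. Leaving aspects of 180 or less unchanged is an assumption, and R script 6a may recode the values explicitly rather than with this formula.

```r
# Aspect values (degrees) before conversion, including some that are left unchanged.
aspect <- c(0, 90, 180, 195, 210, 225, 240, 255, 270)

# Mirror aspects greater than 180 across the N-S axis: 195 -> 165, ..., 270 -> 90.
aspect_acor <- ifelse(aspect > 180, 360 - aspect, aspect)

print(aspect_acor)
```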

# Input files:

HSICalLib_wavevec.RData

HSICalLib_b1-b30_20230223_p_gold.RData

(output from R script 5b)

# Output files:

HSICalLib_20230223_b1-b30_p_gold_acor.RData

Same as _p_gold.RData (output from R script 5b) except some of the aspect values have been converted.

Number of rows/obsI reflectance spectra = 99,804

Number of columns/variables = 486

Same as _p.RData (output from R script 4)

HSICalLib_20230223_b1-b30_p_gold_acor_melt.RData

“Long” version of _p_gold_acor.RData where wavelength is a variable

Number of rows/obsI reflectance intensities = 47,007,684

99,804 spectra * 471 wavebands

Number of columns/variables = 5

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each obsI reflectance intensity

obsI: observed reflectance intensities

# ----------

# Specific information for R script:

HSICalLib_6b_cosIL_calculation.R

# Description of script:

Calculate values needed for the theoretical “cosine correction” based on measurements of the HSI setup used in this study. Details are provided as comments in the R script.

# Input files:

None

# Output files:

HSICalLib_20230223_s0-s60_cosIL_all.RData

Number of rows/orientations = 91

Number of columns/variables = 16

slope: See Description of the data

aspect: See Description of the data

z1: zenith angle (degrees) between light bank 1 (N) and the HSI camera

NOTE: zenith varies with slope, light bank 1 = N = 0 azimuth

z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera

NOTE: zenith varies with slope, light bank 2 = S = 180 azimuth

meanz: average of z1 and z2

cosz1: cos(z1)

cosz2: cos(z2)

cosmeanz: cos(meanz)

meancosz: average of cos(z1) and cos(z2)

cosIL1: cos( illumination angle (IL) ) light bank 1

= cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)

cosIL2: cos( illumination angle (IL) ) light bank 2

= cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)

meancosIL: average of cosIL1 and cosIL2

r1: cos(z1) / cosIL1

r2: cos(z2) / cosIL2

rmeans: cos(meanz) / meancosIL

rcosmeans: meancosz / meancosIL
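The cosIL and ratio definitions above can be sketched in base R as follows. The zenith angles used here are hypothetical placeholders (the real z1 and z2 come from measurements of the HSI setup and vary with slope), and the helper names `deg2rad` and `cos_il` are illustrative, not taken from R script 6b.

```r
deg2rad <- function(deg) deg * pi / 180

# cos(illumination angle) for a surface with the given slope and aspect,
# lit from a source at zenith angle z and azimuth az (all in degrees).
cos_il <- function(z, az, slope, aspect) {
  cos(deg2rad(z)) * cos(deg2rad(slope)) +
    sin(deg2rad(z)) * sin(deg2rad(slope)) * cos(deg2rad(az - aspect))
}

# Hypothetical zenith angles and one example orientation.
z1 <- 30; z2 <- 30
slope <- 20; aspect <- 0

cosIL1 <- cos_il(z1, az = 0,   slope, aspect)  # light bank 1 (N, azimuth 0)
cosIL2 <- cos_il(z2, az = 180, slope, aspect)  # light bank 2 (S, azimuth 180)

meancosIL <- mean(c(cosIL1, cosIL2))
meancosz  <- mean(c(cos(deg2rad(z1)), cos(deg2rad(z2))))
rcosmeans <- meancosz / meancosIL

print(c(cosIL1 = cosIL1, cosIL2 = cosIL2, rcosmeans = rcosmeans))
```

For a flat sample (slope = 0) the formula reduces to cos(z), so each ratio equals 1 and no correction is applied.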

# ----------

# Specific information for R script:

HSICalLib_7a_observedI.R

# Description of script:

Remove any remaining NA’s introduced during aspect correction in R script 6a, then output a final wide and long version of the obsI spectra for the soil samples used in this study.

# Input files:

HSICalLib_wavevec.RData

HSICalLib_20230223_b1-b30_p_gold_acor.RData

(output from R script 6a)

# Output files:

HSICalLib_20230223_b1-b30_pga.RData

Number of rows/obsI spectra = 99,537

Number of columns/variables = 486

Same as _p.RData (output from R script 4)

HSICalLib_20230223_b1-b30_pga_melt.RData

“Long” version of _pga.RData where wavelength is a variable

Number of rows/obsI reflectance intensities = 46,881,927

99,537 spectra * 471 wavebands

Number of columns/variables = 5

Same as _p_gold_acor_melt.RData (output from R script 6a)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each obsI reflectance intensity

obsI: observed reflectance intensities

# ----------

# Specific information for R script:

HSICalLib_7b_referenceI.R

# Description of script:

Bring in _pga.RData (output from R script 7a), then calculate and output a final wide and long version of the reference intensity (refI) spectra for the soil samples used in this study. NOTE: refI spectra are the same for each configuration.

# Input files:

HSICalLib_wavevec.RData

HSICalLib_20230223_b1-b30_pga.RData

(output from R script 7a)

# Output files:

HSICalLib_20230223_b1-b30_pga_refI.RData

Data frame with the same dimensions as _pga.RData (output from R script 7a) except reference intensities (refI) are reported instead of obsI.

Number of rows/refI spectra = 99,537

Number of columns/variables = 486

Same as _p.RData (output from R script 4)

NOTE: These are 471 refI reflectance intensities, NOT obsI

HSICalLib_20230223_b1-b30_pga_refI_melt.RData

“Long” version of _pga_refI.RData where wavelength is a variable

Number of rows/refI reflectance intensities = 46,881,927

99,537 spectra * 471 wavebands

Number of columns/variables = 5

Same as _p_gold_acor_melt.RData (output from R script 6a)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each refI reflectance intensity

refI: reference reflectance intensities

# ----------

# Specific information for R script:

HSICalLib_7c_dI_calculation.R

# Description of script:

Bring in _pga.RData (output from R script 7a), then calculate and output a final wide and long version of the delta (aka “change in”) intensity (dI) spectra for the soil samples used in this study.

# Input files:

HSICalLib_wavevec.RData

HSICalLib_20230223_b1-b30_pga.RData

(output from R script 7a)

# Output files:

HSICalLib_20230223_b1-b30_pga_dI.RData

Data frame with the same dimensions as _pga.RData (output from R script 7a) except delta (aka “change in”) intensity (dI) values are reported instead of obsI.

Number of rows/dI spectra = 99,537

Number of columns/variables = 486

Same as _p.RData (output from R script 4)

NOTE: These are 471 dI reflectance intensities, NOT obsI

HSICalLib_20230223_b1-b30_pga_dI_melt.RData

“Long” version of _pga_dI.RData where wavelength is a variable

Number of rows/dI reflectance intensities = 46,881,927

99,537 spectra * 471 wavebands

Number of columns/variables = 5

Same as _p_gold_acor_melt.RData (output from R script 6a)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each dI reflectance intensity

dI: change in (aka “delta”) intensities

# ----------

# Specific information for R script:

HSICalLib_8b_dI_predict_global_final.R

# Description of script:

Calibrate and evaluate a multiple linear regression model to predict dI using slope, aspect, and wavelength as predictor variables. Calculate error metrics (RMSE, NSE, and KGE) to quantify whether dI-corrected intensities (dIc) are closer to reference intensities (refI) than observed intensities (obsI).
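A minimal sketch of this kind of model in base R is shown below, using simulated data. It assumes that dI = obsI - refI (so the correction is dIc = obsI - dIp) and that the interactions among the three predictors are included, as described for the output files of this script. The simulated values, the in-sample evaluation, and the helper `rmse` are illustrative only and do not reproduce R script 8b.

```r
set.seed(1)

# Simulated long-format data: every combination of a few slopes, aspects, and wavelengths.
d <- expand.grid(
  slope      = c(10, 20, 30),
  aspect     = c(0, 90, 180),
  wavelength = c(400, 700, 1000)
)
d$refI <- 0.3
d$dI   <- 0.001 * d$slope + 0.0001 * d$aspect + rnorm(nrow(d), sd = 0.001)
d$obsI <- d$refI + d$dI  # assumed sign convention: dI = obsI - refI

# Multiple linear regression with all main effects and interactions.
fit <- lm(dI ~ slope * aspect * wavelength, data = d)

# Predicted dI (dIp) and dI-corrected intensities (dIc).
d$dIp <- predict(fit, newdata = d)
d$dIc <- d$obsI - d$dIp

rmse <- function(a, b) sqrt(mean((a - b)^2))
print(c(RMSE_obs = rmse(d$obsI, d$refI), RMSE_dIc = rmse(d$dIc, d$refI)))
```

If the correction helps, RMSE_dIc is smaller than RMSE_obs, meaning the corrected intensities sit closer to the reference than the observed intensities do.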

# Input files:

HSICalLib_wavevec.RData

See Description of the data

HSICalLib_20230223_b1-b30_pga_melt.RData

(output from R script 7a)

HSICalLib_20230223_b1-b30_pga_refI_melt.RData

(output from R script 7b)

HSICalLib_20230223_b1-b30_pga_dI_melt.RData

(output from R script 7c)

# Output files:

HSICalLib_20230224_pga_dI_refI_melt.RData

Merged (long) form of _pga_melt.RData (output from R script 7a), _pga_refI_melt.RData (output from R script 7b), _pga_dI_melt.RData (output from R script 7c).

Number of rows/reflectance intensities = 46,881,927

99,537 spectra * 471 wavebands

Number of columns/variables = 7

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each obsI reflectance intensity

obsI: observed reflectance intensities

refI: reference reflectance intensities

dI: change in (aka “delta”) intensities

HSICalLib_20230310_rutgerssamples_rand1.RData

A randomly chosen subset of 50 soil samples (out of the 450 samples collected from Duke Farms and imaged using HSI) was ultimately included in the topographic correction study because the properties of these soil samples were all very similar and the samples would otherwise have made up a large portion of the training data. This vector contains the HSI numbers for these 50 randomly chosen soil samples (all from the Rutgers archive).

Length/number of soil samples (HSInumbers) = 50

HSICalLib_20230613_pga_dI_refI_melt_slm.RData

Same columns as _pga_dI_refI_melt.RData (output from this R script) except it ONLY contains obsI, refI, and dI spectra collected at non-zero slope orientations for the 681 soil samples used in this study.

Number of rows/reflectance intensities = 22,678,179

22,678,179 intensities / 471 wavebands = 48,149 spectra

Number of columns/variables = 7

Same as _pga_dI_refI_melt.RData (output from this R script)

HSICalLib_20230613_finalHSInums.RData

A vector containing the HSI numbers corresponding to the 681 soil samples used in this study.

Length/number of soil samples (HSInumbers) = 681

HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData

Same as _pga_dI_refI_melt_slm.RData (output from this R script) except the columns dIp and dIc have been added. A multiple linear regression model was trained to predict dI using slope, aspect, wavelength, and their interactions as predictor variables. This model was evaluated to get predicted dI (dIp), then dIp was used to adjust (aka “correct”) obsI, resulting in dI-corrected intensities (dIc). If the model were a perfect predictor, then dIp would equal dI AND dIc would equal refI.

Number of rows/reflectance intensities = 22,678,179

Number of columns/variables = 9

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each obsI reflectance intensity

obsI: observed reflectance intensities

refI: reference reflectance intensities

dI: change in (aka “delta”) intensities

dIp: predicted dI intensities

dIc: dI-corrected intensities

NOTE: These dIc values are compared to obsI and refI to get summary stats that quantify how well the dI correction worked (i.e., how much closer dIc was to refI than obsI was to refI).

HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

Summary stats for dI corrected spectra at each configuration across all wavelengths. Calculate RMSE, NSE, and KGE for every soil sample at every configuration. In other words, compare obsI to refI AND dIc to refI using these RMSE, NSE, and KGE metrics.

Number of rows = 48,149

Number of spectra compared to their reference using this approach

Number of columns/variables = 15

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

RMSE_obs: Root mean squared error (obsI vs refI)

RMSE_dIc: Root mean squared error (dIc vs refI)

NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)

NSE_dIc: Nash-Sutcliffe efficiency (dIc vs refI)

KGE_obs: Kling-Gupta efficiency (obsI vs refI)

KGE_obs_r: Pearson correlation coefficient (obsI vs refI)

KGE_obs_beta: mean obsI / mean refI (obsI vs refI)

KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)

KGE_dIc: Kling-Gupta efficiency (dIc vs refI)

KGE_dIc_r: Pearson correlation coefficient (dIc vs refI)

KGE_dIc_beta: mean dIc / mean refI (dIc vs refI)

KGE_dIc_alpha: standard deviation(dIc) / standard deviation(refI)
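The metrics listed above can be sketched in base R as follows. The r, beta, and alpha components match the definitions given in this README; combining them as KGE = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (alpha - 1)^2) is the commonly used form and is assumed here, since the R script may compute KGE differently (for example, with a package function). The function names and example values are illustrative.

```r
rmse <- function(sim, ref) sqrt(mean((sim - ref)^2))

nse <- function(sim, ref) {
  1 - sum((sim - ref)^2) / sum((ref - mean(ref))^2)
}

kge <- function(sim, ref) {
  r     <- cor(sim, ref)          # Pearson correlation coefficient
  beta  <- mean(sim) / mean(ref)  # ratio of means
  alpha <- sd(sim) / sd(ref)      # ratio of standard deviations
  c(KGE = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (alpha - 1)^2),
    r = r, beta = beta, alpha = alpha)
}

# Hypothetical 4-waveband spectrum: observed is uniformly 20% darker than the reference.
refI <- c(0.10, 0.20, 0.30, 0.40)
obsI <- 0.8 * refI

print(rmse(obsI, refI))
print(nse(obsI, refI))
print(kge(obsI, refI))
```

In this example the spectral shape is preserved (r = 1), so the KGE penalty comes entirely from beta and alpha, which both equal 0.8.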

HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData

Summary stats for spectra at each wavelength & slope across all aspects. Same as _spectralstats_dIc_2.RData (output from this script) BUT spectra were grouped in a different way before calculating RMSE, NSE, and KGE.

Number of rows = 2,826

Number of intensities compared to their reference using this approach

Number of columns/variables = 14

slope: See Description of the data

wavelength: wavelength corresponding to each reflectance intensity

RMSE_obs: Root mean squared error (obsI vs refI)

RMSE_dIc: Root mean squared error (dIc vs refI)

NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)

NSE_dIc: Nash-Sutcliffe efficiency (dIc vs refI)

KGE_obs: Kling-Gupta efficiency (obsI vs refI)

KGE_obs_r: Pearson correlation coefficient (obsI vs refI)

KGE_obs_beta: mean obsI / mean refI (obsI vs refI)

KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)

KGE_dIc: Kling-Gupta efficiency (dIc vs refI)

KGE_dIc_r: Pearson correlation coefficient (dIc vs refI)

KGE_dIc_beta: mean dIc / mean refI (dIc vs refI)

KGE_dIc_alpha: standard deviation(dIc) / standard deviation(refI)

HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData

Summary stats for spectra at each wavelength & aspect across all slopes. Same as _spectralstats_dIc_w_s_2.RData (output from this script) BUT spectra were grouped by aspect (rather than slope) and wavelength before calculating RMSE, NSE, and KGE.

Number of rows = 6,123

Number of intensities compared to their reference using this approach

Number of columns/variables = 14

aspect: See Description of the data

wavelength: wavelength corresponding to each reflectance intensity

RMSE_obs: Root mean squared error (obsI vs refI)

RMSE_dIc: Root mean squared error (dIc vs refI)

NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)

NSE_dIc: Nash-Sutcliffe efficiency (dIc vs refI)

KGE_obs: Kling-Gupta efficiency (obsI vs refI)

KGE_obs_r: Pearson correlation coefficient (obsI vs refI)

KGE_obs_beta: mean obsI / mean refI (obsI vs refI)

KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)

KGE_dIc: Kling-Gupta efficiency (dIc vs refI)

KGE_dIc_r: Pearson correlation coefficient (dIc vs refI)

KGE_dIc_beta: mean dIc / mean refI (dIc vs refI)

KGE_dIc_alpha: standard deviation(dIc) / standard deviation(refI)

# ----------

# Specific information for R script:

HSICalLib_9a_cosine_correction_final.R

# Description of script:

Correct spectra using the theoretical “cosine correction”. Calculate error metrics (RMSE, NSE, and KGE) to quantify whether cosine corrected intensities (coscorI) are closer to reference intensities (refI) than observed intensities (obsI) or dI-corrected intensities (dIc) are.

# Input files:

HSICalLib_wavevec.RData

See Description of the data

HSICalLib_20230223_b1-b30_pga.RData

(output from R script 7a)

HSICalLib_20230223_s0-s60_cosIL_all.RData

(output from R script 6b)

HSICalLib_20230310_rutgerssamples_rand1.RData

(output from R script 8b)

HSICalLib_20230223_b1-b30_pga_melt.RData

(output from R script 7a)

HSICalLib_20230223_b1-b30_pga_refI_melt.RData

(output from R script 7b)

# Output files:

HSICalLib_20230613_pga_slm_cosIL.RData

Select the rows in _pga.RData (output from R script 7a) corresponding to the 681 soil samples used in this study (see _finalHSInums.RData output from R script 8b), then merge this data frame with _cosIL_all.RData (output from R script 6b) resulting in a wide data frame with all reflectance spectra for the soil samples used in this study AND the constants needed for the cosine correction (calculated in R script 6b based on measurements of the HSI setup used in this study).

Number of rows/obsI spectra = 48,149

Number of columns = 500

slope: See Description of the data

aspect: See Description of the data

batch: Soil samples were imaged in groups of 40 at a time

well: Indexed location of the soil sample in the sample well array

HSInumber: unique soil sample identifier

HSIPackedDensity: mass of soil sample per volume of sample well

sandTotal: % sand (only available for samples from the NEON archive)

siltTotal: % silt (only available for samples from the NEON archive)

clayTotal: % clay (only available for samples from the NEON archive)

OC: % soil organic carbon (by weight)

archive: source of the soil sample and soil properties data

adod: air dried soil mass / oven dried soil mass

volC: % soil organic carbon (by volume)

log10volC: log10(volC)

471 observed (uncorrected) reflectance intensities (obsI*[wavelength]*)

batchwellID: unique reflectance spectra identifier

z1: zenith angle (degrees) between light bank 1 (N) and the HSI camera

NOTE: zenith varies with slope, light bank 1 = N = 0 azimuth

z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera

NOTE: zenith varies with slope, light bank 2 = S = 180 azimuth

meanz: average of z1 and z2

cosz1: cos(z1)

cosz2: cos(z2)

cosmeanz: cos(meanz)

meancosz: average of cos(z1) and cos(z2)

NOTE: This is the way we decided to combine z1 and z2.

cosIL1: cos( illumination angle (IL) ) light bank 1

= cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)

cosIL2: cos( illumination angle (IL) ) light bank 2

= cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)

meancosIL: average of cosIL1 and cosIL2

r1: cos(z1) / cosIL1

r2: cos(z2) / cosIL2

rmeans: cos(meanz) / meancosIL

rcosmeans: meancosz / meancosIL

NOTE: This is the ratio used in the final cosine correction.
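A minimal sketch of applying this ratio in base R is shown below. It assumes the cosine correction multiplies every observed intensity in a spectrum by that row's rcosmeans value, which is the usual form of a cosine correction. The toy values and the two waveband columns are hypothetical, and R script 9a may organize the calculation differently.

```r
# Hypothetical wide data: 2 spectra, 2 wavebands, plus the correction ratio.
d <- data.frame(
  obsI400   = c(0.20, 0.25),
  obsI401   = c(0.30, 0.35),
  rcosmeans = c(1.0, 1.1)
)

# Scale each row's observed intensities by that row's ratio.
wavecols <- grep("^obsI", names(d), value = TRUE)
coscor   <- as.matrix(d[wavecols]) * d$rcosmeans
colnames(coscor) <- sub("^obsI", "coscorI", wavecols)

print(coscor)
```

A ratio of 1 (first row) leaves the spectrum unchanged, while a ratio above 1 (second row) brightens every waveband by the same factor.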

HSICalLib_20230613_pga_slm_coscorI.RData

Same as _pga_slm_cosIL.RData (output from this script) except intensities reported are cosine corrected intensities (coscorI) rather than obsI.

Number of rows/cosine corrected (coscorI) spectra = 48,149

Number of columns/variables = 500

slope: See Description of the data

aspect: See Description of the data

batch: Soil samples were imaged in groups of 40 at a time

well: Indexed location of the soil sample in the sample well array

HSInumber: unique soil sample identifier

HSIPackedDensity: mass of soil sample per volume of sample well

sandTotal: % sand (only available for samples from the NEON archive)

siltTotal: % silt (only available for samples from the NEON archive)

clayTotal: % clay (only available for samples from the NEON archive)

OC: % soil organic carbon (by weight)

archive: source of the soil sample and soil properties data

adod: air dried soil mass / oven dried soil mass

volC: % soil organic carbon (by volume)

log10volC: log10(volC)

471 cosine corrected reflectance intensities (coscorI*[wavelength]*)

batchwellID: unique reflectance spectra identifier

z1: zenith angle (degrees) between light bank 1 (N) and the HSI camera

NOTE: zenith varies with slope, light bank 1 = N = 0 azimuth

z2: zenith angle (degrees) between light bank 2 (S) and the HSI camera

NOTE: zenith varies with slope, light bank 2 = S = 180 azimuth

meanz: average of z1 and z2

cosz1: cos(z1)

cosz2: cos(z2)

cosmeanz: cos(meanz)

meancosz: average of cos(z1) and cos(z2)

NOTE: This is the way we decided to combine z1 and z2.

cosIL1: cos( illumination angle (IL) ) light bank 1

= cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)

cosIL2: cos( illumination angle (IL) ) light bank 2

= cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)

meancosIL: average of cosIL1 and cosIL2

r1: cos(z1) / cosIL1

r2: cos(z2) / cosIL2

rmeans: cos(meanz) / meancosIL

rcosmeans: meancosz / meancosIL

NOTE: This is the ratio used in the final cosine correction.

HSICalLib_20230613_pga_slm_coscorI_melt.RData

“Long” version of _pga_slm_coscorI.RData (output from this script) where wavelength is a variable.

Number of rows/cosine corrected intensities (coscorI) = 22,678,179

Number of columns/variables = 5

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each reflectance intensity

coscorI: cosine corrected reflectance intensity

HSICalLib_20230613_pga_coscorI_refI_melt_slm.RData

Same as _pga_slm_coscorI_melt.RData (output from this script) except reference intensity (refI) and observed intensity (obsI) have been added as columns by merging _pga_slm_coscorI_melt.RData (output from this script) with _pga_melt.RData (output from R script 7a) AND _pga_refI_melt.RData (output from R script 7b).

Number of rows/cosine corrected intensities (coscorI) = 22,678,179

Number of columns/variables = 7

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each reflectance intensity

coscorI: cosine corrected reflectance intensity

obsI: observed reflectance intensities

refI: reference reflectance intensities

HSICalLib_20230613_spectralstats_coscor.RData

Summary stats for spectra at each configuration across all wavelengths.

Number of rows = 48,149

Number of spectra compared to their reference using this approach

Number of columns/variables = 15

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

RMSE_obs: Root mean squared error (obsI vs refI)

RMSE_coscor: Root mean squared error (coscor vs refI)

NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)

NSE_coscor: Nash-Sutcliffe efficiency (coscor vs refI)

KGE_obs: Kling-Gupta efficiency (obsI vs refI)

KGE_obs_r: Pearson correlation coefficient (obsI vs refI)

KGE_obs_beta: mean obsI / mean refI (obsI vs refI)

KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)

KGE_coscor: Kling-Gupta efficiency (coscor vs refI)

KGE_coscor_r: Pearson correlation coefficient (coscor vs refI)

KGE_coscor_beta: mean coscor / mean refI (coscor vs refI)

KGE_coscor_alpha: standard deviation(coscor) / standard deviation(refI)

# ----------

# Specific information for R script:

HSICalLib_10a_C_correction_final.R

# Description of script:

Correct spectra using the semi-empirical “C correction”. Calculate error metrics (RMSE, NSE, and KGE) to quantify whether C corrected intensities (ccorI) are closer to reference intensities (refI) than observed intensities (obsI).

# Input files:

HSICalLib_wavevec.RData

HSICalLib_20230223_s0-s60_cosIL_all.RData

(output from R script 6b)

HSICalLib_20230613_pga_slm_cosIL.RData

(output from R script 9a)

HSICalLib_20230310_rutgerssamples_rand1.RData

(output from R script 8b)

HSICalLib_20230223_b1-b30_pga_melt.RData

(output from R script 7a)

HSICalLib_20230223_b1-b30_pga_refI_melt.RData

(output from R script 7b)

# Output files:

HSICalLib_20230613_C-coefficient_ccdf.RData

Data frame containing the semi-empirically determined C coefficients for every waveband. These are calculated using obsI from _pga_melt.RData (output from R script 7a) and cosIL_all.RData (output from R script 6b).

Number of rows/wavelengths = 471

Number of columns/variables = 5

wavelength: wavelength corresponding to each reflectance intensity

slope: slope of the best fit line between obsI and cosIL

NOTE: This is the regression slope, not the “slope angle” used everywhere else

intercept: intercept of the best fit line between obsI and cosIL

ccoef: C coefficient = intercept / slope
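
As a rough illustration of how the columns above relate, the sketch below fits the best fit line between obsI and cosIL for one waveband and derives ccoef = intercept / slope. It is written in Python with hypothetical function names and made-up values; it is not the R code from the repository. The `c_correct` function uses the commonly published form of the C correction, obsI * (cos(z) + c) / (cosIL + c). That form, and the choice of which zenith and illumination cosines to pass in, are assumptions, because this README does not state the exact expression used by the script.

```python
def c_coefficient(cos_il, obs_i):
    """Ordinary least-squares fit of obsI = slope * cosIL + intercept."""
    n = len(cos_il)
    mean_x = sum(cos_il) / n
    mean_y = sum(obs_i) / n
    sxx = sum((x - mean_x) ** 2 for x in cos_il)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cos_il, obs_i))
    slope = sxy / sxx                      # regression slope, not slope angle
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept,
            "ccoef": intercept / slope}


def c_correct(obs_i, cos_z, cos_il, ccoef):
    """Assumed standard C correction for a single intensity."""
    return obs_i * (cos_z + ccoef) / (cos_il + ccoef)


# Hypothetical intensities for one waveband at three illumination angles.
fit = c_coefficient([0.5, 0.7, 0.9], [0.20, 0.26, 0.32])
corrected = c_correct(0.20, cos_z=0.9, cos_il=0.5, ccoef=fit["ccoef"])
```

Adding c to both the numerator and the denominator damps the correction at low cosIL, where a plain cosine ratio tends to overcorrect.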

HSICalLib_20230613_pga_slm_cosIL_ccorI.RData

Data frame containing C corrected spectra along with selected soil properties, imaging orientation, theoretically calculated constants, and soil sample identifiers. Same as _pga_slm_cosIL.RData and _pga_slm_coscorI.RData (output from R script 9a) except intensities reported are C corrected intensities (ccorI) rather than obsI or coscorI.

Number of rows/C corrected (ccorI) spectra = 48,149

Number of columns/variables = 500

slope: See Description of the data

aspect: See Description of the data

batch: Soil samples were imaged in groups of 40 at a time

well: Indexed location of the soil sample in the sample well array

HSInumber: unique soil sample identifier

HSIPackedDensity: mass of soil sample per volume of sample well

sandTotal: % sand (only available for samples from the NEON archive)

siltTotal: % silt (only available for samples from the NEON archive)

clayTotal: % clay (only available for samples from the NEON archive)

OC: % soil organic carbon (by weight)

archive: source of the soil sample and soil properties data

adod: air dried soil mass / oven dried soil mass

volC: % soil organic carbon (by volume)

log10volC: log10(volC)

471 C corrected reflectance intensities (ccorI*[wavelength]*)

batchwellID: unique reflectance spectra identifier

z1: zenith angle (degrees) between light bank 1 and the HSI camera

NOTE: zenith varies with slope, light bank 1 = N = 0 azimuth

z2: zenith angle (degrees) between light bank 2 and the HSI camera

NOTE: zenith varies with slope, light bank 2 = S = 180 azimuth

meanz: average of z1 and z2

cosz1: cos(z1)

cosz2: cos(z2)

cosmeanz: cos(meanz)

meancosz: average of cos(z1) and cos(z2)

NOTE: This is the way we decided to combine z1 and z2.

cosIL1: cos( illumination angle (IL) ) light bank 1

= cos(z1)*cos(slope) + sin(z1)*sin(slope)*cos(azimuth-aspect)

cosIL2: cos( illumination angle (IL) ) light bank 2

= cos(z2)*cos(slope) + sin(z2)*sin(slope)*cos(azimuth-aspect)

meancosIL: average of cosIL1 and cosIL2

r1: cos(z1) / cosIL1

r2: cos(z2) / cosIL2

rmeans: cos(meanz) / meancosIL

rcosmeans: meancosz / meancosIL

NOTE: This is the ratio used in the final cosine correction.
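
The geometric constants above follow directly from the cosIL formula. The Python sketch below (hypothetical function names, not the repository's R code) computes cosIL for each light bank and the rcosmeans ratio, taking z1 and z2 as inputs because their dependence on slope is not spelled out here. Multiplying obsI by this ratio, as in the standard cosine correction, is an assumption about how the ratio is applied.

```python
import math


def cos_il(zenith_deg, slope_deg, azimuth_deg, aspect_deg):
    """cos(IL) = cos(z)*cos(slope) + sin(z)*sin(slope)*cos(azimuth - aspect)."""
    z = math.radians(zenith_deg)
    s = math.radians(slope_deg)
    d = math.radians(azimuth_deg - aspect_deg)
    return math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(d)


def rcosmeans(z1_deg, z2_deg, slope_deg, aspect_deg):
    """meancosz / meancosIL for the two light banks (N = 0, S = 180 azimuth)."""
    cos_il1 = cos_il(z1_deg, slope_deg, 0.0, aspect_deg)
    cos_il2 = cos_il(z2_deg, slope_deg, 180.0, aspect_deg)
    meancosz = (math.cos(math.radians(z1_deg))
                + math.cos(math.radians(z2_deg))) / 2
    meancos_il = (cos_il1 + cos_il2) / 2
    return meancosz / meancos_il


# Hypothetical geometry: both banks at 30 degrees zenith, sample tilted
# 20 degrees toward the east (aspect = 90).
ratio = rcosmeans(30.0, 30.0, 20.0, 90.0)
```

For a flat sample (slope = 0) the ratio is 1, so the correction leaves the spectrum unchanged.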

HSICalLib_20230613_pga_slm_ccorI_melt.RData

“Long” version of _pga_slm_cosIL_ccorI.RData (output from this script) where wavelength is a variable.

Number of rows/C corrected intensities (ccorI) = 22,678,179

Number of columns/variables = 5

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each reflectance intensity

ccorI: C corrected reflectance intensity

HSICalLib_20230613_pga_ccorI_refI_melt_slm.RData

Same as _pga_slm_ccorI_melt.RData (output from this script) except reference intensity (refI) and observed intensity (obsI) have been added as columns by merging _pga_slm_ccorI_melt.RData (output from this script) with _pga_melt.RData (output from R script 7a) AND _pga_refI_melt.RData (output from R script 7b).

Number of rows/C corrected intensities (ccorI) = 22,678,179

Number of columns/variables = 7

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

wavelength: wavelength corresponding to each reflectance intensity

ccorI: C corrected reflectance intensity

obsI: observed reflectance intensities

refI: reference reflectance intensities
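
The relationship between the “wide” and “long” tables can be sketched as below. This Python example is only an illustration with hypothetical function names, column labels such as `ccorI400`, and made-up values; the R scripts may reshape and merge differently, and the merge keys used here are assumptions. The row counts are consistent with this reshaping: 48,149 spectra × 471 wavebands = 22,678,179 long rows.

```python
ID_COLS = ("slope", "aspect", "HSInumber")


def melt(wide_rows, prefix):
    """One row per spectrum -> one row per (spectrum, wavelength)."""
    long_rows = []
    for row in wide_rows:
        for col, value in row.items():
            if col.startswith(prefix):
                long_row = {k: row[k] for k in ID_COLS}
                long_row["wavelength"] = float(col[len(prefix):])
                long_row[prefix] = value
                long_rows.append(long_row)
    return long_rows


def merge(left, right, keys):
    """Inner join of two long tables on the given key columns."""
    index = {tuple(r[k] for k in keys): r for r in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            merged.append({**row, **match})
    return merged


# Hypothetical wide table: one spectrum with two wavebands.
wide = [{"slope": 20, "aspect": 90, "HSInumber": 1,
         "ccorI400": 0.11, "ccorI402": 0.12}]
# Hypothetical reference intensities, already in long form.
ref_long = [{"HSInumber": 1, "wavelength": 400.0, "refI": 0.10},
            {"HSInumber": 1, "wavelength": 402.0, "refI": 0.13}]

ccor_long = melt(wide, "ccorI")
ccor_ref = merge(ccor_long, ref_long, ("HSInumber", "wavelength"))
```

The long form puts ccorI, obsI, and refI side by side for each wavelength, which is the layout needed to compute per-spectrum error metrics.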

HSICalLib_20230613_spectralstats_ccor.RData

Summary stats for spectra at each configuration across all wavelengths.

Number of rows = 48,149

Number of spectra compared to their reference using this approach

Number of columns/variables = 15

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

RMSE_obs: Root mean squared error (obsI vs refI)

RMSE_ccor: Root mean squared error (ccor vs refI)

NSE_obs: Nash-Sutcliffe efficiency (obsI vs refI)

NSE_ccor: Nash-Sutcliffe efficiency (ccor vs refI)

KGE_obs: Kling-Gupta efficiency (obsI vs refI)

KGE_obs_r: Pearson correlation coefficient (obsI vs refI)

KGE_obs_beta: mean obsI / mean refI (obsI vs refI)

KGE_obs_alpha: standard deviation(obsI) / standard deviation(refI)

KGE_ccor: Kling-Gupta efficiency (ccor vs refI)

KGE_ccor_r: Pearson correlation coefficient (ccor vs refI)

KGE_ccor_beta: mean ccor / mean refI (ccor vs refI)

KGE_ccor_alpha: standard deviation(ccor) / standard deviation(refI)

# ----------

# Specific information for R script:

HSICalLib_12a_OC_predict_pls_final.R

# Description of script:

Train and evaluate a partial least squares regression (PLSR) model to predict soil organic carbon (SOC) from reference spectra (refI), observed spectra (obsI; non-zero slope, measured, uncorrected), and delta I corrected spectra (dIc) to test whether dIc predict SOC better than obsI.

# Input files:

HSICalLib_wavevec.RData

See Description of the data

HSICalLib_b1-b30_20230223_p_gold.RData (for soil properties)

Output from R script 5b

HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData

Output from R script 8b

HSICalLib_20230310_rutgerssamples_rand1.RData

Output from R script 8b

# Output files:

HSICalLib_20230613_p_plots.RData

Selected soil properties data for the 681 soil samples used in this study

Number of rows/soil samples = 681

Number of columns/variables = 15

slope: See Description of the data

aspect: See Description of the data

batch: Soil samples were imaged in groups of 40 at a time

well: Indexed location of the soil sample in the sample well array

HSInumber: unique soil sample identifier

HSIPackedDensity: mass of soil sample per volume of sample well

sandTotal: % sand (only available for samples from the NEON archive)

siltTotal: % silt (only available for samples from the NEON archive)

clayTotal: % clay (only available for samples from the NEON archive)

OC: % soil organic carbon (by weight)

archive: source of the soil sample and soil properties data

adod: air dried soil mass / oven dried soil mass

volC: % soil organic carbon (by volume)

log10volC: log10(volC)

batchwellID: unique reflectance spectra identifier

HSICalLib_20230613_globaldI_OCpredict_dIc.RData

“Wide” version of _globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData (output from R script 8b) where dIc (delta I corrected) intensities are reported as spectra (471 variables) for each soil sample at each orientation.

Number of rows/dIc spectra = 48,149

Number of columns/variables = 474

471 wavebands = dIc*[wavelength]*

NOTE: These reflectance intensities are dIc (dI corrected)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

HSICalLib_20230613_globaldI_OCpredict_refI.RData

“Wide” version of _globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData (output from R script 8b) where refI (reference) intensities are reported as spectra (471 variables) for each soil sample at each orientation.

Number of rows/refI spectra = 48,149

Number of columns/variables = 474

471 wavebands = refI*[wavelength]*

NOTE: These reflectance intensities are refI (reference)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

HSICalLib_20230613_globaldI_OCpredict_obsI.RData

“Wide” version of _globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData (output from R script 8b) where obsI (observed, uncorrected) intensities are reported as spectra (471 variables) for each soil sample at each orientation.

Number of rows/obsI spectra = 48,149

Number of columns/variables = 474

471 wavebands = obsI*[wavelength]*

NOTE: These reflectance intensities are obsI (observed)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

HSICalLib_20230613_globaldI_OCpredict_refI_p.RData

Merge _globaldI_OCpredict_refI.RData (output from this script) with the soil properties data in _p_gold.RData resulting in a data frame with refI (reference) intensities reported as spectra (471 variables) for each soil sample at each orientation.

Number of rows/refI spectra = 48,149

Number of columns/variables = 475

471 wavebands (predictor variables for PLSR) = refI*[wavelength]*

NOTE: These reflectance intensities are refI (reference)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

log10volC: log base 10 of % SOC by volume (outcome variable for PLSR)

HSICalLib_20230613_globaldI_OCpredict_obsI_p.RData

Merge _globaldI_OCpredict_obsI.RData (output from this script) with the soil properties data in _p_gold.RData resulting in a data frame with obsI (observed, uncorrected) intensities reported as spectra (471 variables) for each soil sample at each orientation.

Number of rows/obsI spectra = 48,149

Number of columns/variables = 475

471 wavebands = obsI*[wavelength]*

NOTE: These reflectance intensities are obsI (observed)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

log10volC: log base 10 of % SOC by volume (outcome variable for PLSR)

HSICalLib_20230613_globaldI_OCpredict_dIc_p.RData

Merge _globaldI_OCpredict_dIc.RData (output from this script) with the soil properties data in _p_gold.RData resulting in a data frame with dIc (delta I corrected) intensities reported as spectra (471 variables) for each soil sample at each orientation.

Number of rows/dIc spectra = 48,149

Number of columns/variables = 475

471 wavebands = dIc*[wavelength]*

NOTE: These reflectance intensities are dIc (dI corrected)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

log10volC: log base 10 of % SOC by volume (outcome variable for PLSR)

HSICalLib_20230613_globaldI_plsmodel_log10volC_refI_train.RData

Partial least squares regression model trained with _refI_p.RData (output from this script) to predict OC from 471 reflectance intensities.

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

Evaluate the PLSR model on its refI training data, then calculate RMSE, NSE, and KGE for observed (laboratory measured) SOC vs SOC predicted by the PLSR model trained and evaluated on refI.

Number of rows/SOC predictions made from refI = 681

Number of columns/variables = 12

predicted: SOC predicted by this PLSR model

observed: laboratory measured SOC

slope: See Description of the data

aspect: See Description of the data

NOTE: slope and aspect are the same for all samples (rows) in this data frame because refI is the same regardless of orientation

HSInumber: unique soil sample identifier

RMSE: Root mean squared error

NSE: Nash-Sutcliffe efficiency (observed vs refI predicted SOC)

R2: Coefficient of determination (observed vs refI predicted SOC)

KGE: Kling-Gupta efficiency (observed vs refI predicted SOC)

KGE_r: Pearson correlation coefficient (observed vs refI predicted SOC)

KGE_beta: mean refI predicted / mean observed SOC

KGE_alpha: sd(refI predicted) / sd(observed) SOC

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData

Evaluate _globaldI_plsmodel_log10volC_refI_train.RData (output from this script) on _obsI_p.RData (output from this script) to predict OC from 471 obsI intensities. Then quantify the PLSR model performance using RMSE, NSE, and KGE (compare laboratory measured SOC to PLSR model predicted SOC).

Number of rows/SOC predictions made from obsI = 48,149

Number of columns/variables = 12

predicted: SOC predicted by this PLSR model

observed: laboratory measured SOC

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

RMSE: Root mean squared error

NSE: Nash-Sutcliffe efficiency (observed vs refI predicted SOC)

R2: Coefficient of determination (observed vs refI predicted SOC)

KGE: Kling-Gupta efficiency (observed vs refI predicted SOC)

KGE_r: Pearson correlation coefficient (observed vs refI predicted SOC)

KGE_beta: mean refI predicted / mean observed SOC

KGE_alpha: sd(refI predicted) / sd(observed) SOC

HSICalLib_20230613_globaldI_dIc_train.RData

This data frame contains the reflectance spectra which were randomly selected for each soil sample to train the NEXT PLSR model along with sample identifiers and log10volC (outcome variable).

Number of rows/dIc spectra = 681

Number of columns/variables = 475

471 wavebands = dIc*[wavelength]*

NOTE: These reflectance intensities are dIc (dI corrected)

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

log10volC: log base 10 of % SOC by volume (outcome variable for PLSR)
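
Selecting one randomly chosen orientation per soil sample, which reduces the 48,149 dIc spectra to 681 training rows, can be sketched as below. This is a Python illustration only: the function name, the seed, and the example rows are hypothetical, and the R script's random number generator and selection procedure are not documented here.

```python
import random


def pick_one_orientation(rows, seed=1):
    """Keep one randomly chosen row (orientation) per HSInumber."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_sample = {}
    for row in rows:
        by_sample.setdefault(row["HSInumber"], []).append(row)
    return [rng.choice(group) for group in by_sample.values()]


# Hypothetical rows: two samples imaged at three orientations each.
rows = [{"HSInumber": h, "slope": s, "aspect": a}
        for h in (1, 2)
        for s, a in ((10, 0), (20, 90), (30, 180))]
train = pick_one_orientation(rows)
```

Training on a single orientation per sample keeps every soil sample equally weighted in the model, while the remaining orientations stay available for evaluation.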

HSICalLib_20230613_globaldI_plsmodel_log10volC_dIc_train.RData

PLSR model trained to predict SOC using one delta I corrected (dIc) reflectance spectrum per soil sample from a randomly chosen orientation (see _globaldI_dIc_train.RData, output from this script).

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData

Evaluate PLSR trained using dIc spectra from 1 randomly chosen orientation per sample on its training data, then quantify performance using RMSE, NSE, and KGE.

Number of rows/SOC predictions made from dIc = 681

Number of columns/variables = 12

predicted: SOC predicted by this PLSR model

observed: laboratory measured SOC

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

RMSE: Root mean squared error (observed vs dIc predicted SOC)

NSE: Nash-Sutcliffe efficiency (observed vs dIc predicted SOC)

R2: Coefficient of determination (observed vs dIc predicted SOC)

KGE: Kling-Gupta efficiency (observed vs dIc predicted SOC)

KGE_r: Pearson correlation coefficient (observed vs dIc predicted SOC)

KGE_beta: mean dIc predicted / mean observed SOC

KGE_alpha: sd(dIc predicted) / sd(observed) SOC

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData

Evaluate _globaldI_plsmodel_log10volC_dIc_train.RData (output from this script) on _dIc_p.RData (output from this script) to predict OC from 471 dIc intensities (using dIc spectra from all soil samples at all orientations). Then quantify the PLSR model performance using RMSE, NSE, and KGE (compare laboratory measured SOC to PLSR model predicted SOC).

Number of rows/SOC predictions made from dIc = 48,149

Number of columns/variables = 12

predicted: SOC predicted by this PLSR model

observed: laboratory measured SOC

slope: See Description of the data

aspect: See Description of the data

HSInumber: unique soil sample identifier

RMSE: Root mean squared error (observed vs dIc predicted SOC)

NSE: Nash-Sutcliffe efficiency (observed vs dIc predicted SOC)

R2: Coefficient of determination (observed vs dIc predicted SOC)

KGE: Kling-Gupta efficiency (observed vs dIc predicted SOC)

KGE_r: Pearson correlation coefficient (observed vs dIc predicted SOC)

KGE_beta: mean dIc predicted / mean observed SOC

KGE_alpha: sd(dIc predicted) / sd(observed) SOC

HSICalLib_20230613_globaldI_PLSR_log10volC_summarystats_OC.RData

Summary stats for spectra at each configuration across all wavelengths

# ----------

# Specific information for R script:

HSICalLib_0_FinalPlots.R

# Description of script:

Create all the final plots for the paper.

# Input files:

HSICalLib_wavevec.RData

See Description of the data

HSICalLib_20230613_globaldI_predict_lm_pga_dI_refI_melt_slm_dIp_dIc_2.RData

Output from R script 8b

HSICalLib_20230613_globaldI_spectralstats_dIc_1.RData

Output from R script 8b

HSICalLib_20230613_globaldI_spectralstats_dIc_2.RData

Output from R script 8b

HSICalLib_20230613_spectralstats_coscor.RData

Output from R script 9a

HSICalLib_20230613_spectralstats_ccor.RData

Output from R script 10a

HSICalLib_20230613_globaldI_spectralstats_dIc_w_s_2.RData

Output from R script 8b

HSICalLib_20230613_globaldI_spectralstats_dIc_w_a_2.RData

Output from R script 8b

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_ref.RData

Output from R script 12a

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_obs.RData

Output from R script 12a

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_tr_dIc.RData

Output from R script 12a

HSICalLib_20230613_globaldI_PLSR_log10volC_sstatdf_dIc.RData

Output from R script 12a

HSICalLib_20230613_globaldI_PLSR_log10volC_summarystats_OC.RData

Output from R script 12a

# Output files:

HSICalLib_20231014_spectralstats_boxplots_final.pdf

Figure 6

RMSE and NSE vs slope and aspect (box plots)

NOTE: Different than Figure 10 because these are for dIc predictions

HSICalLib_20231014_KGE_slope_boxplots_final.pdf

KGE, alpha, and beta vs slope (box plots)

HSICalLib_20231014_KGE_aspect_boxplots_final.pdf

KGE, alpha, and beta vs aspect (box plots)

HSICalLib_20231014_obsI_dIc_refI_spectra_final.pdf

Figure 8

RI vs wavelength (refI, obsI, and dIc), dIc colored by slope and aspect

HSICalLib_20231014_obsI_spectra_final.pdf

Figure 5

RI vs wavelength (refI and obsI), obsI colored by slope and aspect

HSICalLib_20231014_dIc_RMSE_spectra_final.pdf

Figure 7

RMSE vs wavelength (obsI and dIc), dIc colored by slope and aspect

HSICalLib_20230622_OCvalidationplots_final.pdf

Figure 9

Observed vs predicted SOC (colored by slope)

HSICalLib_20231014_globaldI_summarystats_OC_boxplots_final.pdf

Figure 10

RMSE and NSE vs slope and aspect (box plots)

NOTE: Different than Figure 6 because these are for SOC predictions

## Methods

Hyperspectral imaging (HSI) was performed with a high-sensitivity sCMOS VNIR hyperspectral camera (MSV 500, Middleton Spectral Vision, Middleton, WI). A custom-designed, 3-D printed sample array was used to present homogenized soil samples packed into sample wells to a laboratory-based HSI reflectance spectrometer at 91 configurations of slope and aspect. Pixels representing each soil sample's reflectance spectra were isolated from hyperspectral images and averaged to obtain a single reflectance spectrum for each slope and aspect configuration per soil sample.
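
The averaging step can be sketched as a band-wise mean over the pixels assigned to one sample well. The Python example below is illustrative only: the function name and values are hypothetical, and how pixels were isolated from each image is not covered.

```python
def mean_spectrum(pixel_spectra):
    """Average pixel spectra band by band into a single spectrum."""
    n = len(pixel_spectra)
    # zip(*...) groups the intensities of all pixels for each waveband.
    return [sum(band) / n for band in zip(*pixel_spectra)]


# Hypothetical well with three pixels and two wavebands.
pixels = [[0.10, 0.20],
          [0.30, 0.40],
          [0.20, 0.30]]
spectrum = mean_spectrum(pixels)
```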

## Usage notes

Raw data were collected with FastFrame data acquisition software (Middleton Spectral Vision, Middleton, WI). Data processing and analysis were performed in R using the R Scripts found on GitHub at github.com/aduro005/HSITopographicCorrectionRScripts.

## Funding

National Science Foundation, Award: 2034232 (PLS)

National Institute of Food and Agriculture, Award: DRH-no. 2021-67019-34341

National Science Foundation, Award: 2034214 (LL)

National Institute of Food and Agriculture, Award: SAB-no. 2021-67019-34338

National Institute of Food and Agriculture, Award: AF-no. 2021-67019-343340

National Institute of Food and Agriculture, Award: CA-R-ENS-5195-H, Project accession no. 1022418

National Institute of Food and Agriculture, Award: CA-R-ENS-5147-H

National Science Foundation, Award: DBI-1624205

Battelle, Award: US001-0000757206