Data for: Characterization of large-scale preferential flow across continental United States
Data files
Jan 23, 2024 version files (2.88 MB total):
- output_file_1.xlsx (1.19 MB)
- output_file_2.xlsx (1.69 MB)
- README.md (4.84 KB)
Abstract
Understanding preferential flow (PF) at large scales is critical for improving land management and groundwater (GW) quality. However, limited knowledge of this process, due to soil surface heterogeneity and observational constraints, hampers progress. In this study, we propose estimating effective PF at the remote sensing footprint scale (4-9 km) by examining its impact on soil moisture (SM) distribution and shallow GW (SGW) table fluctuations (depth of 5 m or less). Effective PF encompasses macropore, funnel, and finger flow pathways influencing SGW table fluctuations. We compiled daily SGW observations (2019-2021) from 19 continental US (CONUS) sites through USGS. Using SGW data and CHIRPS precipitation data, we applied inverse modeling in HYDRUS-1D to estimate the soil hydraulic parameters of the dual porosity model (DPM) simulating vertical flow from soil surface to subsurface. Effective PF presence was inferred using three criteria: (1) daily precipitation >= the site-specific average across multiple (calibration) years, (2) a daily observed SGW table increase, and (3) a daily difference between observed and DPM-simulated SGW tables measured against a threshold of 50% of the site-specific RMSE. Leveraging optimized DPM parameters and associated soil texture, classified PF events, and Soil Moisture Active Passive (SMAP L3E) satellite-based SM, a Random Forest algorithm with 10-fold cross validation predicted large-scale effective PF events. Results indicate seasonal dependence, with spring having the highest occurrence of PF events. The Random Forest model achieved 98% accuracy in predicting large-scale PF events, with SMAP SM and saturated hydraulic conductivity (Ks) among the 4 most impactful variables. Our approach provides a generalized tool, based on soil hydraulic properties, site characteristics, soil texture, and remote sensing, for analyzing large-scale effective PF.
https://doi.org/10.5061/dryad.8kprr4xv3
output_file_1.xlsx:
- Source: Assembled and calculated by authors
- Contents [units]: date [Y-m-d], site [text], season [text], year [number], SMAPsm (SMAP soil moisture) [m3/m3], residual water content (theta.r) [cm3/cm3], saturated water content (theta.s) [cm3/cm3], alpha [1/cm], n [-], saturated hydraulic conductivity (Ks) [cm/d], tortuosity parameter (I) [-], residual water content in immobile region (theta.r.im) [cm3/cm3], saturated water content in immobile region (theta.s.im) [cm3/cm3], exchange term (omega) [1/d], soil texture [text], bulk density (BD) [cg/cm3], clay [%], sand [%], silt [%], clay percent range [text], slope [%], elevation [ft], observed groundwater [cm], fitted groundwater (HYDRUS output) [cm], and precipitation [cm].
output_file_2.xlsx:
- Source: Assembled and calculated by authors
- Contents [units]: date [Y-m-d], site [text], season [text], year [number], SMAPsm (SMAP soil moisture) [m3/m3], residual water content (theta.r) [cm3/cm3], saturated water content (theta.s) [cm3/cm3], alpha [1/cm], n [-], saturated hydraulic conductivity (Ks) [cm/d], tortuosity parameter (I) [-], residual water content in immobile region (theta.r.im) [cm3/cm3], saturated water content in immobile region (theta.s.im) [cm3/cm3], exchange term (omega) [1/d], soil texture [text], bulk density (BD) [cg/cm3], clay [%], sand [%], silt [%], clay percent range [text], slope [%], elevation [ft], observed groundwater [cm], fitted groundwater (HYDRUS output) [cm], precipitation [cm], groundwater difference [cm], SMAP soil moisture difference [m3/m3], groundwater table movement direction classification [text], precipitation intensity classification [text], and error [cm].
- Note: the error is the difference between observed and fitted groundwater table data. This value is compared to the RMSE.
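Because the columns mix unit systems (elevation in ft, bulk density in cg/cm3, Ks in cm/d, groundwater and precipitation in cm), a small conversion step may help before combining these files with other datasets. The following is a minimal Python sketch, not part of the authors' code; the dictionary keys (`elevation_ft`, `bulk_density_cg_cm3`, `ks_cm_d`, `precip_cm`) are hypothetical names chosen for illustration and are not the actual spreadsheet column headers.

```python
FEET_TO_METERS = 0.3048        # exact by definition
CENTIGRAMS_TO_GRAMS = 0.01     # 1 cg = 0.01 g
SECONDS_PER_DAY = 86_400


def harmonize_units(record):
    """Return a copy of one daily record with selected fields converted.

    The input keys are hypothetical; rename them to match the real
    column headers after loading the spreadsheet.
    """
    converted = dict(record)
    converted["elevation_m"] = record["elevation_ft"] * FEET_TO_METERS
    converted["bulk_density_g_cm3"] = (
        record["bulk_density_cg_cm3"] * CENTIGRAMS_TO_GRAMS
    )
    # cm/d -> m/s: divide by 100 cm per m, then by seconds per day
    converted["ks_m_s"] = record["ks_cm_d"] / 100.0 / SECONDS_PER_DAY
    converted["precip_mm"] = record["precip_cm"] * 10.0
    return converted


if __name__ == "__main__":
    example = {
        "elevation_ft": 1000.0,
        "bulk_density_cg_cm3": 150.0,
        "ks_cm_d": 86.4,
        "precip_cm": 1.2,
    }
    print(harmonize_units(example))
```

Note that SMAP soil moisture (m3/m3) and the water contents (cm3/cm3) are both dimensionless volume fractions, so they need no conversion.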
Description of the data and file structure
The Excel files (output_file_1.xlsx and output_file_2.xlsx) are the key outputs from the publication. output_file_2 builds on output_file_1. Both files contain the optimized soil hydraulic parameters for each site (based on soil texture), which are the result of running HYDRUS-1D. HYDRUS-1D allowed us to simulate the relationship between precipitation and groundwater recharge via preferential flow pathways (that is, rapid fluid flow from the soil surface to the subsurface). Through inverse modeling in HYDRUS-1D (providing the precipitation and observed groundwater table data from USGS prior to the simulation), we determined the optimized soil hydraulic parameters driving preferential flow, based on the van Genuchten model, for each site and soil texture. More details regarding the simulations can be found in the publication. The optimized soil hydraulic parameters found in this work can be used in future work. Our study simulated sites across the continental United States, covering a variety of climate and environmental scenarios.
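To illustrate how the tabulated parameters (theta.r, theta.s, alpha, n, Ks, and the tortuosity parameter) enter the van Genuchten model, the sketch below evaluates the standard van Genuchten retention curve and the Mualem conductivity function for a single flow region. It is a minimal Python sketch under stated assumptions: it uses the common restriction m = 1 - 1/n, it does not reproduce the dual porosity exchange between mobile and immobile regions that HYDRUS-1D simulates, and the parameter values in the usage example are illustrative loam-like numbers, not values read from the data files.

```python
def effective_saturation(h_cm, alpha, n):
    """van Genuchten effective saturation for pressure head h (cm, negative when unsaturated)."""
    if h_cm >= 0:
        return 1.0
    m = 1.0 - 1.0 / n  # common restriction; requires n > 1
    return (1.0 + (alpha * abs(h_cm)) ** n) ** (-m)


def water_content(h_cm, theta_r, theta_s, alpha, n):
    """Volumetric water content (cm3/cm3) at pressure head h."""
    se = effective_saturation(h_cm, alpha, n)
    return theta_r + (theta_s - theta_r) * se


def hydraulic_conductivity(h_cm, ks, alpha, n, tortuosity=0.5):
    """van Genuchten-Mualem unsaturated conductivity, in the units of ks (e.g. cm/d)."""
    m = 1.0 - 1.0 / n
    se = effective_saturation(h_cm, alpha, n)
    return ks * se ** tortuosity * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2


if __name__ == "__main__":
    # Illustrative values only (not taken from output_file_1.xlsx or output_file_2.xlsx)
    params = dict(theta_r=0.078, theta_s=0.43, alpha=0.036, n=1.56)
    for head in (0.0, -10.0, -100.0, -1000.0):
        theta = water_content(head, **params)
        k = hydraulic_conductivity(head, ks=24.96, alpha=0.036, n=1.56)
        print(f"h = {head:8.1f} cm  theta = {theta:.3f}  K = {k:.4g} cm/d")
```

With the units listed above (alpha in 1/cm, Ks in cm/d), pressure heads are expected in cm and the returned conductivity is in cm/d.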
output_file_2 differs from output_file_1 in that it also contains the daily large-scale effective preferential flow classifier. Using the methodology in the publication, we classified each day as either experiencing or not experiencing large-scale effective preferential flow. To the authors' knowledge, this is the first dataset containing this kind of information. We hope that other studies will use this dataset to continue building the ability to detect large-scale effective preferential flow events.
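The daily classification can be sketched as a conjunction of the three criteria given in the abstract. The Python function below is a hedged illustration only, not the authors' R implementation (which is in the GitHub repository). Several details are assumptions: the precipitation average is taken as the mean of the supplied series unless given explicitly, groundwater values are treated as levels where a larger number means a higher water table, the observed-minus-simulated difference is taken as an absolute value, and the comparison against 50% of the RMSE uses `>=`. All function and argument names are hypothetical.

```python
from statistics import mean


def classify_pf_days(precip, gw_obs, gw_sim, rmse,
                     precip_mean=None, rmse_fraction=0.5):
    """Flag days that satisfy all three effective-PF criteria.

    precip, gw_obs, gw_sim: equal-length daily series for one site.
    rmse: site-specific RMSE between observed and simulated tables.
    """
    if not (len(precip) == len(gw_obs) == len(gw_sim)):
        raise ValueError("all daily series must have the same length")
    if precip_mean is None:
        # Assumption: average over every day in the supplied record
        precip_mean = mean(precip)

    flags = []
    for day in range(len(precip)):
        # (1) precipitation at or above the site-specific average
        wet_enough = precip[day] >= precip_mean
        # (2) observed table rose since the previous day (assumed sign convention)
        table_rose = day > 0 and gw_obs[day] > gw_obs[day - 1]
        # (3) model misfit at or above the RMSE-based threshold (assumed operator)
        large_misfit = abs(gw_obs[day] - gw_sim[day]) >= rmse_fraction * rmse
        flags.append(wet_enough and table_rose and large_misfit)
    return flags


if __name__ == "__main__":
    precip = [0.0, 2.0, 0.1, 3.0]             # cm
    gw_obs = [-200.0, -190.0, -192.0, -191.0]  # cm, higher = shallower table
    gw_sim = [-200.0, -198.0, -195.0, -191.5]  # cm
    print(classify_pf_days(precip, gw_obs, gw_sim, rmse=4.0))
```

If the groundwater column stores depth below the land surface instead, the inequality in criterion (2) would need to be reversed.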
Code is provided for the novel aspects of the study: the assignment of large-scale effective preferential flow based on our three criteria, and the Random Forest model used to estimate the presence of large-scale effective preferential flow across the continental United States.
Sharing/Access information
Code/Software
HYDRUS-1D
HYDRUS-1D software was used to inversely model the observed groundwater table levels from USGS in order to determine the optimized soil hydraulic parameters. The software was also used to forward model groundwater tables at new sites with the optimized soil hydraulic parameters (selected for each site based on soil texture). Comparing the estimated groundwater table values (HYDRUS-1D forward model results) with the observed groundwater table values from USGS allowed us to check the performance of the parameters.
R
R was used to import and manage data, conduct data analysis, identify preferential flow events on a daily basis using the three criteria described in the publication, and run a Random Forest model on the data to estimate large-scale effective preferential flow events.
The code is available on GitHub: https://github.com/leahkay4/PF-Paper
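For readers unfamiliar with 10-fold cross validation, the sketch below shows the partitioning and scoring procedure in plain Python. It is not the authors' R code and it does not implement a Random Forest: a trivial majority-class classifier stands in for the model so that the example stays self-contained. The `fit` and `predict` callables and all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, and pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            correct += predict(model, features[i]) == labels[i]
    return correct / len(labels)


def fit_majority(train_x, train_y):
    """Stand-in model: remember the most frequent training label."""
    return max(set(train_y), key=train_y.count)


def predict_majority(model, x):
    """Stand-in prediction: always return the remembered label."""
    return model


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10     # toy PF / no-PF labels
    features = [[0.0]] * len(labels)  # placeholder predictors
    print(cross_validated_accuracy(features, labels, fit_majority, predict_majority))
```

Swapping `fit_majority` and `predict_majority` for wrappers around a real classifier keeps the same evaluation loop.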
A total of 19 sites with diverse soil-vegetation-climate characteristics across CONUS were selected for the study (Fig. 2). Daily mean GW data at these sites were gathered from the USGS National Water Information System (NWIS). Following similar approaches from Babajimopoulos et al. (2007) and Costa et al. (2023), GW tables at depths less than or equal to 5 meters were defined as shallow and GW tables deeper than 5 meters were defined as deep. Out of the 19 sites, 3 sites (Laurens, Georgia; Tioga, New York; Saline, Missouri) had deep GW table data, 12 sites (Blaine, Nebraska; Jones, North Carolina; St. Lawrence, New York; Chatham, Georgia; Red Lake, Minnesota; Walker, Texas; Kalamazoo, Michigan; Monmouth, New Jersey; St. Croix, Wisconsin; Big Horn, Montana; Pasquotank, North Carolina; Tazewell, Illinois) had SGW table data from 2019-2021, and 4 sites (Hooker, Nebraska; Garden, Nebraska; Duplin, North Carolina; Clinton, Illinois) had SGW data from 2021-2022.
Precipitation data at a daily scale and 4800 m spatial resolution were collected from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), accessed through Climate Engine. Site-specific surface soil texture, slope, and elevation were collected from the Soil Survey Geographic Database (SSURGO) at a 250 m spatial scale for the selected study soils. In this study, we assumed vertical soil texture homogeneity, while addressing soil surface lateral heterogeneity by selecting sites across CONUS with diverse surface soil textures. NASA’s SMAP L3E product provided SM (z < 5 cm) at 2- to 3-day intervals with a 9-km resolution across CONUS. SM data from SMAP were accessed through the Earthdata Application for Extracting and Exploring Analysis Ready Samples (AppEEARS). Soil physical properties such as bulk density, derived from the World Soil Information Service (WoSIS) database, were gathered from SoilGrids at a 250 m grid size.
Van Genuchten soil hydraulic parameters from the soil catalog in HYDRUS-1D for various textural classes (Carsel & Parrish, 1988) were used as initial parameters in the inverse modeling of the dual porosity model (DPM) (Gerke & van Genuchten, 1993). The 12 sites with SGW table data from 2019-2021, together with CHIRPS precipitation data as input, were used to inversely estimate the soil hydraulic parameters with the DPM. The 4 remaining independent sites with SGW table data from 2021-2022 were used to validate the inversely estimated soil hydraulic parameters by forward modeling with the DPM.
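Two small computations underlie the steps above: labeling a GW table as shallow or deep using the 5 m threshold, and comparing forward-modeled tables with observations. The Python sketch below illustrates both. Using RMSE as the comparison metric for the validation sites is an assumption made here for illustration (the abstract refers to a site-specific RMSE, but the exact validation statistics are described in the publication), and the function names are hypothetical.

```python
from math import sqrt


def is_shallow(depth_to_water_m, threshold_m=5.0):
    """Shallow GW table: depth less than or equal to the 5 m threshold."""
    return depth_to_water_m <= threshold_m


def rmse(observed, simulated):
    """Root mean square error between two equal-length series."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the dataset
    observed_cm = [-200.0, -190.0, -192.0, -191.0]
    simulated_cm = [-200.0, -198.0, -195.0, -191.5]
    print("shallow at 3.2 m:", is_shallow(3.2))
    print("RMSE (cm):", round(rmse(observed_cm, simulated_cm), 2))
```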