Data from: Coupled machine learning-ecosystem ensemble models substantially improve predictions of nitrous oxide (N2O) fluxes from US croplands
Data files
Feb 16, 2026 version files (5.24 MB total)
- Dataset_S1.xlsx: 27.07 KB
- Dataset_S2.csv: 4.05 MB
- Dataset_S3.csv: 1.16 MB
- README.md: 4.59 KB
Abstract
Nitrous oxide (N₂O) is a potent and persistent greenhouse gas, with rising atmospheric concentrations driven in part by inefficient use of synthetic nitrogen (N) fertilizers in agriculture. Predicting soil N₂O emissions is challenging due to high spatial and temporal variability arising from complex soil biogeochemical processes. Process-based ecosystem models and standalone machine learning (ML) approaches without extensive site-specific calibration often miss high emission episodes. Here, we show how an Ensemble Modeling System (EMS) based on outputs from an ensemble of ecosystem models coupled to an ensemble of ML models can improve predictions and understanding of N₂O fluxes from US cropland. Trained and validated on approximately 12,000 N₂O chamber measurements at 17 U.S. Midwest sites (six crops, 35 management practices), the EMS accurately predicted daily fluxes of N₂O at both training (R² = 0.84, RMSE = 16.4 g N ha⁻¹ d⁻¹) and held-out testing sites (R² = 0.84, RMSE = 6.2 g N ha⁻¹ d⁻¹). Analyses identified six dominant N₂O drivers: soil organic carbon (SOC), NH₄⁺, NO₃⁻, water-filled pore space (WFPS), soil temperature, and biomass production. Wet, warm soils produced large N₂O peaks only with sufficient SOC and mineral N; in low-SOC soils, fluxes remained low. Incorporating these drivers into process-based models might significantly improve their predictive capacity. The EMS demonstrates a strong potential to predict N₂O fluxes at unseen sites, enabling more reliable regional inventories, improved gap-filling where measurements are sparse, and enhanced understanding of mechanisms to advance targeted mitigation strategies in food, feed, and bioenergy crops.
Dataset DOI: 10.5061/dryad.pvmcvdnzx
Description of the data and file structure
This dataset contains the data used for the analysis in: Coupled machine learning-ecosystem ensemble models substantially improve predictions of nitrous oxide (N2O) fluxes from US croplands.
Files and variables
Files: Dataset_S1.xlsx, Dataset_S2.csv, Dataset_S3.csv
Description:
Description of data sheets
Dataset S1A columns
- Site_ID: Numeric identifier for the experimental site
- Treatment_ID: Numeric code for the management treatment applied at that site
- DataUse: Indicates whether the data were used for model training (development) or testing (held-out evaluation)
- State/Province: State abbreviation
- Latitude decimal deg: Site latitude (decimal degrees)
- Longitude decimal deg: Site longitude (decimal degrees)
- Start Data Year: First year of data used
- End Data Year: Last year of data used
- Cover crop: Type of cover crop used within the treatment
- Rotation Descriptor: Crop rotation within the treatment
- Tillage Descriptor: Tillage type within the treatment
- Residual Removal: Crop residue management within the treatment
- Irrigation: Whether irrigation was applied within the treatment
- N Treatment Descriptor: Nitrogen amendments within the treatment
- Reference: Reference for the data
Dataset S1B: This sheet contains the reference list for the data used
Dataset S2 columns
- Date: Gas sampling date
- Site_ID: Numeric identifier for the experimental site
- Treatment_ID: Numeric code for the management treatment applied at that site
- DataUse: Indicates whether the data were used for model training (development) or testing (held-out evaluation)
- Observed N2O: Measured daily average N₂O flux (g N₂O-N ha⁻¹ d⁻¹)
- Predicted N2O: Daily average N₂O flux predicted by the multimodel hybrid framework (g N₂O-N ha⁻¹ d⁻¹)
- NH4: Daily NH₄-N content in the top 30-cm soil layer, simulated by the process-based models (kg ha⁻¹)
- SOC: Daily soil organic carbon in the top 30-cm soil layer, simulated by the process-based models (kg ha⁻¹)
- NO3: Daily NO₃-N content in the top 30-cm soil layer, simulated by the process-based models (kg ha⁻¹)
- ST: Daily average soil temperature in the top 30-cm soil layer, simulated by the process-based models (°C)
- WFPS: Daily water-filled pore space in the top 30-cm soil layer, simulated by the process-based models (fraction)
- ABG: Daily above-ground biomass, simulated by the process-based models (kg ha⁻¹)
- BG: Daily below-ground biomass, simulated by the process-based models (kg ha⁻¹)
- SRAD: Average solar radiation over the five days before gas sampling (W m⁻²)
- Tmax: Average maximum air temperature over the three days before gas sampling (°C)
- APrecip: Average precipitation over the fifteen days before gas sampling (mm)
- Wspd: Average wind speed over the fifteen days before gas sampling (m s⁻¹)
- LAI: Daily leaf area index, simulated by the process-based models (m² m⁻²)
- Nstress: Daily nitrogen stress factor, simulated by the process-based models (fraction)
- Wstress: Daily water stress factor, simulated by the process-based models (fraction)
- PET: Daily potential evapotranspiration, simulated by the process-based models (mm)
- SE: Daily soil evaporation, simulated by the process-based models (mm)
- SPrecip: Cumulative precipitation over the two days before gas sampling (mm)
- SH: Average specific humidity over the three days before gas sampling (g kg⁻¹)
- RH: Average relative humidity over the fifteen days before gas sampling (%)
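As a quick consistency check on Dataset S2, the observed and predicted fluxes can be compared separately for each DataUse group. The following is a minimal Python sketch, not the authors' evaluation code. It assumes that the CSV header names match the column names listed above exactly (`Observed N2O`, `Predicted N2O`, `DataUse`), and it computes R² as the coefficient of determination (1 − SS_res/SS_tot), which may differ from the definition used in the paper. The function names `score_by_group` and `score_file` are hypothetical.

```python
import csv
import math
from collections import defaultdict

# Assumed header names; verify against the first row of Dataset_S2.csv.
OBS_COL = "Observed N2O"
PRED_COL = "Predicted N2O"
GROUP_COL = "DataUse"


def score_by_group(rows):
    """Return {group: (r2, rmse, n)} for observed vs. predicted fluxes.

    `rows` is any iterable of dict-like records, e.g. a csv.DictReader.
    A missing column raises KeyError so that header mismatches are visible.
    """
    pairs = defaultdict(list)
    for row in rows:
        try:
            obs = float(row[OBS_COL])
            pred = float(row[PRED_COL])
        except (TypeError, ValueError):
            continue  # skip rows with empty or non-numeric fluxes
        if math.isnan(obs) or math.isnan(pred):
            continue  # skip explicit NaN entries
        pairs[row[GROUP_COL]].append((obs, pred))

    scores = {}
    for group, values in pairs.items():
        n = len(values)
        mean_obs = sum(o for o, _ in values) / n
        ss_res = sum((o - p) ** 2 for o, p in values)
        ss_tot = sum((o - mean_obs) ** 2 for o, _ in values)
        # Coefficient of determination; undefined if observations are constant.
        r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
        rmse = math.sqrt(ss_res / n)
        scores[group] = (r2, rmse, n)
    return scores


def score_file(path):
    """Read a CSV file and score it by group (utf-8-sig strips a BOM if present)."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return score_by_group(csv.DictReader(handle))
```

Calling `score_file("Dataset_S2.csv")` would return one `(R², RMSE, n)` tuple per DataUse label found in the file, with RMSE in the same units as the flux columns.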
Dataset S3 columns
- Date: Gas sampling date
- Site_ID: Numeric identifier for the experimental site
- Treatment_ID: Numeric code for the management treatment applied at that site
- DataUse: Indicates whether the data were used for model training (development) or testing (held-out evaluation)
- SD: Monte Carlo standard deviation of the simulated daily N₂O flux distribution (g N₂O-N ha⁻¹ d⁻¹)
- CV: Monte Carlo coefficient of variation of the simulated daily N₂O flux distribution (%)
- CI05: 5th percentile (lower bound of the 90% confidence interval) of the Monte Carlo flux distribution (g N₂O-N ha⁻¹ d⁻¹)
- CI95: 95th percentile (upper bound of the 90% confidence interval) of the Monte Carlo flux distribution (g N₂O-N ha⁻¹ d⁻¹)
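One possible use of Dataset S3 alongside Dataset S2 is to check how often the observed flux falls inside the CI05 to CI95 interval. The sketch below is illustrative only and is not taken from the authors' workflow. It assumes that the CSV headers match the column names listed above, that `Date`, `Site_ID`, and `Treatment_ID` together identify a record uniquely in both files, and that the two files format those key values identically. The function name `interval_coverage` is hypothetical.

```python
import csv

# Assumed header names; verify against the actual CSV files.
KEY_COLS = ("Date", "Site_ID", "Treatment_ID")  # assumed join key
OBS_COL = "Observed N2O"                        # from Dataset S2
LOW_COL, HIGH_COL = "CI05", "CI95"              # from Dataset S3


def interval_coverage(s2_rows, s3_rows):
    """Fraction of matched observations with CI05 <= observed <= CI95.

    Both arguments are iterables of dict-like records (e.g. csv.DictReader).
    Rows that cannot be matched or parsed as numbers are ignored.
    """
    bounds = {}
    for row in s3_rows:
        key = tuple(row[col] for col in KEY_COLS)
        try:
            bounds[key] = (float(row[LOW_COL]), float(row[HIGH_COL]))
        except (TypeError, ValueError):
            continue  # skip rows with empty or non-numeric bounds

    inside = 0
    matched = 0
    for row in s2_rows:
        key = tuple(row[col] for col in KEY_COLS)
        if key not in bounds:
            continue  # no uncertainty record for this observation
        try:
            obs = float(row[OBS_COL])
        except (TypeError, ValueError):
            continue  # skip rows with an empty or non-numeric flux
        low, high = bounds[key]
        matched += 1
        if low <= obs <= high:
            inside += 1
    return inside / matched if matched else float("nan")


def coverage_from_files(s2_path, s3_path):
    """Compute interval coverage directly from the two CSV files."""
    with open(s2_path, newline="", encoding="utf-8-sig") as s2, \
         open(s3_path, newline="", encoding="utf-8-sig") as s3:
        return interval_coverage(csv.DictReader(s2), csv.DictReader(s3))
```

If the join key turns out not to be unique (for example, replicate chambers on the same date), later S3 rows would overwrite earlier ones in `bounds`, so the key assumption should be checked before interpreting the result.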
