Statistical identification of nitrous oxide hot moments and their significance across global ecosystems
Data files
Oct 16, 2025 version files (11.47 MB)
- Hot_Moment_Identification_Code-_Actuals.ipynb (5.62 MB)
- HotMomentTreatments.csv (5.31 MB)
- README.md (5.99 KB)
- Supplemental_Material.pdf (534.63 KB)
Abstract
Nitrous oxide (N2O) emissions from agricultural soils contribute 4% of total anthropogenic greenhouse gas (GHG) emissions globally. Events known as ‘hot moments’, which can occur following environmental changes that favor N2O production, contribute disproportionately to annual cumulative emissions. Despite their significance, hot moments and their impact have not been statistically well defined, particularly on a global scale. We collected 13,787 soil N2O flux measurements from 42 publications and evaluated 14 methods of statistical anomaly detection for their ability to identify hot moments within datasets. Two methods achieved the highest overall performance by Matthews correlation coefficient (MCC): median absolute deviation (MCC: 0.80) and minimum covariance determinant (MCC: 0.80). The latter also performed evenly across highly dissimilar datasets and identified more contextually important minor hot moments (39%) that other methodologies may misidentify. Interquartile range, which has previously been used and recommended, performed poorly when hot moments were either very rare or very common within a dataset, and identified few local hot moments (14%). Overall, hot moments comprised 19% of measurements while contributing 75% of cumulative emissions. The median background N2O emission reported across all datasets was 2.2 g N ha⁻¹ day⁻¹, while the median hot moment emission was 10-fold higher, ranging from 23 to 25 g N ha⁻¹ day⁻¹. These findings advance knowledge of how to accurately define and identify hot moments globally, a crucial step toward investigating and mitigating these critical biogeochemical events.
This work applies several statistical outlier detection methods to identify hot moments of nitrous oxide emissions in a dataset of daily average emissions collected from publications across the globe. Three files are included. The first, “HotMomentTreatments.csv”, is a CSV file containing all data collected from publications. The second, “Supplemental_Material.pdf”, contains further description of the statistical concepts and the final optimized model parameters used. The third, “Hot_Moment_Identification_Code-_Actuals.ipynb”, is a Jupyter notebook containing all code used to perform the data analysis and generate the figures.
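To illustrate the kind of detector evaluated here, the following is a minimal sketch of a median-absolute-deviation rule, together with the Matthews correlation coefficient used to score predictions against expert labels. It is not the notebook's implementation. The function names, the 1.4826 normal-consistency constant, the threshold of 3.5, and the choice to flag only high values are assumptions made for this sketch; the optimized parameters actually used are given in Supplemental_Material.pdf.

```python
from math import sqrt
from statistics import median


def mad_flags(fluxes, threshold=3.5):
    """Flag values whose robust z-score exceeds `threshold` (high side only).

    Assumption: 3.5 is a conventional cutoff, not the authors' tuned value.
    """
    fluxes = list(fluxes)
    med = median(fluxes)
    mad = median(abs(x - med) for x in fluxes)
    if mad == 0:
        # Degenerate series (at least half the values equal the median):
        # no robust scale is available, so nothing is flagged.
        return [0] * len(fluxes)
    # 1.4826 rescales the MAD to estimate the standard deviation
    # for normally distributed data.
    scale = 1.4826 * mad
    return [1 if (x - med) / scale > threshold else 0 for x in fluxes]


def matthews_corrcoef(truth, predicted):
    """MCC for binary labels (1 = hot moment, 0 = background)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up flux values for illustration only (not taken from the dataset).
example_fluxes = [1.8, 2.4, 2.0, 2.2, 30.0, 2.6, 1.9, 24.0]
example_truth = [0, 0, 0, 0, 1, 0, 0, 1]
example_flags = mad_flags(example_fluxes)
print(example_flags)  # [0, 0, 0, 0, 1, 0, 0, 1]
print(matthews_corrcoef(example_truth, example_flags))  # 1.0
```

In practice a rule like this would be applied to each flux series separately, since background levels differ widely between datasets.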
Sharing/Access information
The source of each data point is cited within HotMomentTreatments.csv.
Code/Software
All code for data analysis is contained in the file “Hot_Moment_Identification_Code-_Actuals.ipynb”, which is a Jupyter notebook file.
Analysis was performed using Python 3.8, PyOD 1.0.9, Fitter 1.5.2, pandas 1.4.3, NumPy 1.21.2, SciPy 1.8.1, Matplotlib 3.5.2, Plotly 5.13.1, and scikit-learn 1.1.1. All code was run within a Jupyter notebook using Jupyter 1.0.0.
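Before running the notebook, it may help to compare the installed package versions with those listed above. The sketch below is one way to do that using only the standard library. The function name `compare_versions` is hypothetical, and the distribution names used as keys (for example `scikit-learn`) are assumed to match the names the packages are installed under.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README, keyed by (assumed) distribution name.
EXPECTED = {
    "pyod": "1.0.9",
    "fitter": "1.5.2",
    "pandas": "1.4.3",
    "numpy": "1.21.2",
    "scipy": "1.8.1",
    "matplotlib": "3.5.2",
    "plotly": "5.13.1",
    "scikit-learn": "1.1.1",
    "jupyter": "1.0.0",
}


def compare_versions(expected=EXPECTED):
    """Return {name: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A differing version does not necessarily mean the notebook will fail; the report only shows where the environment departs from the one used for the analysis.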
Description of the data and file structure
HotMomentTreatments.csv
The notebook requires HotMomentTreatments.csv to be saved in the same directory as the notebook. Running the notebook generates several additional CSV files, which serve for data inspection and verification and act as checkpoints so that the full analysis need not be run at once. These additional CSV files should also be kept in the same directory as the notebook.
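The checkpoint idea described above can be sketched as a small load-or-compute helper. This is an illustration of the pattern only, not code from the notebook: the helper name `load_or_compute` and any checkpoint file name passed to it are hypothetical, and the sketch uses the standard-library `csv` module so that it runs without extra installs (the notebook's own stack includes pandas).

```python
import csv
from pathlib import Path


def load_or_compute(checkpoint, compute, fieldnames):
    """Reuse a CSV checkpoint if present; otherwise compute and save it.

    `compute` is a zero-argument callable returning a list of dicts whose
    keys match `fieldnames`. Rows are always read back from disk, so values
    are returned as strings regardless of which path was taken.
    """
    checkpoint = Path(checkpoint)
    if not checkpoint.exists():
        # Expensive step runs only when no checkpoint is available.
        with checkpoint.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(compute())
    with checkpoint.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

Deleting a checkpoint file forces the corresponding step to be recomputed on the next run.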
| Attribute | Description | Units |
|---|---|---|
| ExperimentID | Identifier for each unique published experiment | NA |
| TreatmentID | Identifier for each unique experimental treatment | NA |
| RawID | Identifier for each unique experimental measurement | NA |
| PubID | Identifier for each unique publication | NA |
| HMTruth | Expert classifications of each N2O flux value as either hot moment or background emission, where 1 represents hot moment and 0 represents background emission. | NA |
| Date | Date | NA |
| DOY | Day of the year | NA |
| N2OFlux | Daily average nitrous oxide flux, reported as the treatment average across replicates. | Grams nitrogen per hectare per day |
| FluxStandardError | Standard error of treatment average flux value | Grams nitrogen per hectare per day |
| NitrogenApplied | Nitrogen fertilizer applied to the field. | Kilograms per hectare |
| SandMean | Soil sand content | Percent |
| SiltMean | Soil silt content | Percent |
| ClayMean | Soil clay content | Percent |
| PrimaryCrop | Crop grown within the primary growing season or throughout the majority of the experimental period. | NA |
| Latitude | Latitude of the research site. | Decimal degrees |
| Longitude | Longitude of the research site. | Decimal degrees |
| PubTitle | Title of publication | NA |
| Citation | Citation of the publication from which the data was collected. | NA |
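
As an example of how the `HMTruth` and `N2OFlux` columns can be combined, the sketch below computes the share of measurements labeled as hot moments and their share of summed flux. It is a simplified illustration, not the notebook's calculation: the function name `hot_moment_summary` is hypothetical, rows with a blank value in either column are skipped (an assumption about how missing data appear), and a plain sum of daily fluxes is used, which ignores gaps between sampling dates and therefore will not necessarily reproduce the cumulative-emission figures in the abstract.

```python
def hot_moment_summary(rows):
    """Summarize hot moments from rows with 'N2OFlux' and 'HMTruth' keys.

    `rows` is an iterable of dict-like records, e.g. from csv.DictReader.
    """
    n_total = n_hot = 0
    flux_total = flux_hot = 0.0
    for row in rows:
        flux_raw = str(row["N2OFlux"]).strip()
        label_raw = str(row["HMTruth"]).strip()
        if not flux_raw or not label_raw:
            continue  # assumption: blank cells mark missing values
        flux = float(flux_raw)
        is_hot = float(label_raw) == 1  # 1 = hot moment, 0 = background
        n_total += 1
        flux_total += flux
        if is_hot:
            n_hot += 1
            flux_hot += flux
    if n_total == 0 or flux_total == 0:
        raise ValueError("no usable rows, or total flux is zero")
    return {
        "share_of_measurements": n_hot / n_total,
        "share_of_flux": flux_hot / flux_total,
    }
```

The same function can be applied to a subset of rows, for instance those sharing one `TreatmentID`, to see how the shares vary between treatments.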
