Quantifying methane emissions from Laurentian Great Lakes estuaries using in situ measurements, remote sensing and machine learning
Data files
Dec 18, 2025 version files 8.47 MB
- ch4_rs_MLprep.zip (2.36 MB)
- Metadata-LO-Letters-data2.doc (439.81 KB)
- README.md (12.26 KB)
- sentinel3_ch4_model.zip (5.66 MB)
Abstract
This dataset contains the data used to develop the model and figures for the manuscript in L&O Letters. In this study, CH4 fluxes were measured at three drowned river mouth (DRM) estuaries along the eastern shore of Lake Michigan from May to October 2024, using low-cost, autonomous floating samplers in the littoral zone and discrete samples in the pelagic zone. Environmental variable proxies were calculated from Sentinel-3 OLCI, gridMET, and MODIS products and used to estimate CH4 fluxes with machine learning.
Description of the data and file structure
The ch4_rs_MLprep.zip file contains an R Project repository with the data and Rmd files used to analyze the inputs to the ML model in sentinel3_ch4_model.zip, as well as the Rmd files corresponding to figures in the L&O Letters manuscript. The Metadata-LO-Letters-data2.doc file contains detailed metadata for each data file.
The ch4_rs_MLprep.zip archive contains the following data files and scripts, used for ML data preparation and figure creation:
- 2_PSO_RMS_mac_modis_cc_1d.out / 2_PSO_RMS_mkg_modis_cc_1d.out / 2_PSO_RMS_wht_modis_cc_1d.out – output from the air2water algorithm simulating surface water temperature from known air temperatures
  - year – year (NA)
  - month – month (NA)
  - day – day (NA)
  - observed air temp – observed air temperature from airport stations (°C)
  - observed water temp – observed water temperature from MODIS-Aqua (°C)
  - simulated water temp – simulated water temperature from the air2water model (°C)
  - observed water temp aggregated – observed water temperature from MODIS-Aqua at defined timescale (daily) (°C)
  - simulated water temp aggregated – simulated water temperature at defined timescale (daily) (°C)
- CH4_flux_ML_1d.csv – in situ data to be joined with remotely sensed data for machine learning
  - Date – date formatted as m/d/yyyy
  - Site – sampling location for discrete samples (Lake – Location) or chambers (1, 2, 5, or 6)
  - Lat – EPSG 4326 latitude (decimal degrees)
  - Long – EPSG 4326 longitude (decimal degrees)
  - Temp – observed water temperature from YSI-EXO sonde (discrete) or SHT85 sensor (chambers) (°C)
  - rh – relative humidity from SHT85 sensor (NA)
  - CH4_flux – CH4 flux from discrete samples or daily averaged chambers (µmol/m²/hr)
  - Chla.x – in situ chlorophyll-a (µg/L)
  - pH – in situ pH (NA)
- chamber_rs_with_ETR_predictions_logV2.csv – machine learning dataset containing extra trees regression model predictions of CH4 flux at deployed chamber sites
  - Date – date as m/d/yyyy
  - Latitude – EPSG 4326 latitude (decimal degrees)
  - Longitude – EPSG 4326 longitude (decimal degrees)
  - CI – Sentinel-3 OLCI cyanobacteria index (NA)
  - Chla – Sentinel-3 OLCI chlorophyll-a (log(mg/L))
  - PAR – Sentinel-3 OLCI PAR (µEinstein m⁻²)
  - Kd490 – Sentinel-3 OLCI diffuse attenuation coefficient (m⁻¹)
  - rtoa1–rtoa12 – Sentinel-3 OLCI top of atmosphere radiance bands 1–12 (W m⁻² sr⁻¹ nm⁻¹)
  - ADG443 – Sentinel-3 OLCI colored dissolved organic material (m⁻¹)
  - TSM – Sentinel-3 OLCI total suspended matter (g m⁻³)
  - pr – gridMET daily precipitation (mm)
  - pr3 – gridMET cumulative 3-day precipitation (mm)
  - pr5 – gridMET cumulative 5-day precipitation (mm)
  - pr7 – gridMET cumulative 7-day precipitation (mm)
  - tmmx – gridMET daily maximum air temperature (K)
  - vs – gridMET daily wind speed (m/s)
  - Lake – sampled lake (NA)
  - water_temp_K – MODIS or air2water lake surface temperature (K)
  - sim_water – simulated lake surface temperature output from air2water (K)
  - Site – sampling location for discrete samples (Lake – Location) or chambers (1, 2, 5, or 6)
  - Chla.y – Sentinel-3 OLCI chlorophyll-a (log(mg/L))
  - NDCI – Sentinel-3 normalized difference chlorophyll index (NA)
  - MCI – Sentinel-3 maximum chlorophyll index (NA)
  - Predicted – extra trees regression predicted CH4 flux (log(µmol/m²/hr))
- df_NDCI_MCI.csv – machine learning dataset containing in situ CH4 and corresponding satellite image variables including NDCI and MCI
  - Date – date as m/d/yyyy
  - Site – sampling location for discrete samples (Lake – Location) or chambers (1, 2, 5, or 6)
  - Latitude – EPSG 4326 latitude (decimal degrees)
  - Longitude – EPSG 4326 longitude (decimal degrees)
  - CH4_flux – CH4 flux from discrete samples or daily averaged chambers (µmol/m²/hr)
  - CI – Sentinel-3 OLCI cyanobacteria index (NA)
  - Chla.y – Sentinel-3 OLCI chlorophyll-a (log(mg/L))
  - PAR – Sentinel-3 OLCI PAR (µEinstein m⁻²)
  - Kd490 – Sentinel-3 OLCI diffuse attenuation coefficient (m⁻¹)
  - rtoa1–rtoa12 – Sentinel-3 OLCI top of atmosphere radiance bands 1–12 (W m⁻² sr⁻¹ nm⁻¹)
  - ADG443 – Sentinel-3 OLCI colored dissolved organic material (m⁻¹)
  - TSM – Sentinel-3 OLCI total suspended matter (g m⁻³)
  - pr – gridMET daily precipitation (mm)
  - pr3 – gridMET cumulative 3-day precipitation (mm)
  - pr5 – gridMET cumulative 5-day precipitation (mm)
  - pr7 – gridMET cumulative 7-day precipitation (mm)
  - tmmx – gridMET daily maximum air temperature (K)
  - vs – gridMET daily wind speed (m/s)
  - Lake – sampled lake (NA)
  - water_temp_K – MODIS or air2water lake surface temperature (K)
  - sim_water – simulated lake surface temperature output from air2water (K)
  - NDCI – Sentinel-3 normalized difference chlorophyll index (NA)
  - MCI – Sentinel-3 maximum chlorophyll index (NA)
- fold_predictions2.csv – extra trees model performance (training and testing) across K-folds and repeats
  - Fold – ID number of the cross-validation fold (NA)
  - Set – train or test (NA)
  - TRUE – measured in situ CH4 flux (log(µmol/m²/hr))
  - Predicted – ETR predicted CH4 flux (log(µmol/m²/hr))
- gridMET_ch_sites_buffered.csv – average gridMET climate variables at buffered in situ CH₄ sites
  - Lat – EPSG 4326 latitude (decimal degrees)
  - Long – EPSG 4326 longitude (decimal degrees)
  - system:index – image ID containing the date as YYYYMMDD (NA)
  - pr – daily cumulative precipitation (mm)
  - rmax – maximum daily relative humidity (NA)
  - rmin – minimum daily relative humidity (NA)
  - tmmn – minimum daily air temperature (Kelvin)
  - tmmx – maximum daily air temperature (Kelvin)
  - vs – wind speed (m/s)
- L1_OLCI_extraction_EV.csv – Sentinel-3 Level-1 reflectances at measured CH₄ sites
  - Date – date as m/d/yyyy (NA)
  - Time – acquisition time in seconds since midnight UTC (seconds)
  - Latitude – EPSG 4326 latitude (decimal degrees)
  - Longitude – EPSG 4326 longitude (decimal degrees)
  - rtoa1–rtoa21 – Sentinel-3 OLCI top-of-atmosphere radiance bands 1–21 (W m⁻² sr⁻¹ nm⁻¹)
- MODIS_AQUA_data.csv – MODIS-Aqua land surface temperature at CH₄ sites
  - LST_Day_1km – land surface temperature at 1 km resolution (NA)
  - LST_Day_1km_Kelvin – land surface temperature converted to Kelvin (Kelvin)
  - system:index – MODIS image ID containing acquisition date (NA)
  - Latitude – EPSG 4326 latitude (decimal degrees)
  - Longitude – EPSG 4326 longitude (decimal degrees)
- L2_SLSTR_extraction.csv – Sentinel-3 SLSTR sea and land surface temperature
  - Date – date as yyyy/mm/dd (NA)
  - Time – acquisition time in seconds since midnight UTC (seconds)
  - Latitude – EPSG 4326 latitude (decimal degrees)
  - Longitude – EPSG 4326 longitude (decimal degrees)
  - Temp – Sentinel-3 SLSTR temperature (Kelvin)
- L2_OLCI_extraction_v2_EV.csv – Sentinel-3 OLCI Level-2 products
  - Date – date as yyyy/mm/dd (NA)
  - Time – acquisition time in seconds since midnight UTC (seconds)
  - Latitude – EPSG 4326 latitude (decimal degrees)
  - Longitude – EPSG 4326 longitude (decimal degrees)
  - ADG443 – absorption by colored dissolved organic material at 443 nm (m⁻¹)
  - TSM – total suspended matter (g m⁻³)
  - Chla – chlorophyll-a concentration (log-transformed, mg/L)
  - PAR – photosynthetically active radiation (µEinstein m⁻²)
  - Kd490 – diffuse attenuation coefficient at 490 nm (m⁻¹)
- boxplot_allDRMs.Rmd – R Markdown script for generating boxplots across all DRM sites
- Input_Analysis.Rmd – R Markdown script for initial data input and analysis
- L&O_ChamberTimeSeries.Rmd – R Markdown script for chamber flux time series plots
- L&O_CorrelationPlots.Rmd – R Markdown script for correlation plots between variables
- L&O_Kfold_fig.Rmd – R Markdown script for visualizing K-fold cross-validation results
- raster_timeseries.Rmd – R Markdown script for raster-based time series analysis
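The files listed above do not share one convention: dates appear as m/d/yyyy, as yyyy/mm/dd, or embedded as YYYYMMDD in the gridMET `system:index`; acquisition times are seconds since midnight UTC; and temperatures are in °C in some files and Kelvin in others. The following is a minimal sketch of helpers for harmonizing these before joining files. It is not taken from the repository; all function names are hypothetical, and the regular expression assumes the gridMET image ID contains exactly one run of eight digits.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_date(text):
    """Parse a date written as m/d/yyyy or yyyy/mm/dd (the two formats listed above)."""
    text = text.strip()
    # A leading four-digit group is taken to mean year-first ordering.
    fmt = "%Y/%m/%d" if re.match(r"\d{4}/", text) else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


def date_from_system_index(index):
    """Extract the YYYYMMDD date from a gridMET image ID (assumed: one 8-digit run)."""
    match = re.search(r"(?<!\d)(\d{8})(?!\d)", index)
    if match is None:
        raise ValueError(f"no YYYYMMDD date found in {index!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def acquisition_datetime(date_text, seconds_since_midnight):
    """Combine a Date column with a Time column given in seconds since midnight UTC."""
    day = parse_date(date_text)
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return midnight + timedelta(seconds=float(seconds_since_midnight))


def kelvin_to_celsius(kelvin):
    """Convert a temperature in Kelvin to degrees Celsius."""
    return float(kelvin) - 273.15
```

With every date reduced to a `datetime.date`, rows from the in situ and satellite files can be matched on date and coordinates regardless of how each file wrote the date.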
The sentinel3_ch4_model.zip file is a Python repository containing:
- CH4_ML_dataset_1d_aqua.csv - the in situ to satellite data matchups for ML
  - Date - Date as m/d/yyyy
  - Latitude - EPSG 4326 latitude (decimal degrees)
  - Longitude - EPSG 4326 longitude (decimal degrees)
  - Site - Sampling location for discrete samples (Lake – Location) or chambers (1, 2, 5, or 6)
  - Temp - Water temperature (°C)
  - rh - Relative humidity
  - CH4_flux - CH₄ flux (µmol/m²/hr)
  - Chla.x - In situ chlorophyll-a (µg/L)
  - pH - In situ pH
  - CI - Sentinel-3 OLCI cyanobacteria index
  - Chla.y - Sentinel-3 OLCI chlorophyll-a (Log(mg/L))
  - PAR - Sentinel-3 OLCI PAR (µEinstein/m²)
  - Kd490 - Sentinel-3 OLCI diffuse attenuation coefficient (m⁻¹)
  - rtoa1 - Sentinel-3 OLCI top of atmosphere radiance band 1 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa2 - Sentinel-3 OLCI top of atmosphere radiance band 2 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa3 - Sentinel-3 OLCI top of atmosphere radiance band 3 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa4 - Sentinel-3 OLCI top of atmosphere radiance band 4 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa5 - Sentinel-3 OLCI top of atmosphere radiance band 5 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa6 - Sentinel-3 OLCI top of atmosphere radiance band 6 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa7 - Sentinel-3 OLCI top of atmosphere radiance band 7 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa8 - Sentinel-3 OLCI top of atmosphere radiance band 8 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa9 - Sentinel-3 OLCI top of atmosphere radiance band 9 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa10 - Sentinel-3 OLCI top of atmosphere radiance band 10 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa11 - Sentinel-3 OLCI top of atmosphere radiance band 11 (W·m⁻²·sr⁻¹·nm⁻¹)
  - rtoa12 - Sentinel-3 OLCI top of atmosphere radiance band 12 (W·m⁻²·sr⁻¹·nm⁻¹)
  - ADG443 - Sentinel-3 OLCI colored dissolved organic material (m⁻¹)
  - TSM - Sentinel-3 OLCI total suspended matter (g/m³)
  - pr - gridMET daily precipitation (mm)
  - pr3 - gridMET cumulative 3-day precipitation (mm)
  - pr5 - gridMET cumulative 5-day precipitation (mm)
  - pr7 - gridMET cumulative 7-day precipitation (mm)
  - tmmx - gridMET daily maximum air temperature (K)
  - vs - gridMET daily wind speed (m/s)
  - Lake - Sampled lake
  - water_temp_K - MODIS or air2water lake surface temperature (K)
  - sim_water - Simulated lake surface temperature output from air2water (K)
- MODIS_aqua_FINALV.csv - the surface temperature data from MODIS-Aqua that is used to create multiband rasters for model application
  - LST_Day_1km - MODIS-Aqua land surface temperature
  - YEAR - Year
  - MONTH - Month
  - DAY - Day
  - latitude - EPSG 4326 latitude (decimal degrees)
  - longitude - EPSG 4326 longitude (decimal degrees)
  - fid_1 - Lake ID
- ml_model.ipynb - the Jupyter notebook in which model training and analysis are conducted
- multiband_raster_creator.py - contains the code to create and stack raster products
- model_application.py - contains the code to apply the fully trained ETR ML model to the multiband rasters
- etr_final2.joblib - the dumped, fully trained ETR model
- ALL_DRMs - shapefile of the drowned river mouth estuaries along the Lake Michigan coast, used to clip rasters
- requirements.txt - lists the Python packages and versions required by the repository
- .gitignore - a list of files to be ignored when committing changes to the GitHub repository
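For readers who want to reuse the dumped model outside the provided scripts, the sketch below shows one way to load etr_final2.joblib and turn its log-scale predictions back into fluxes. It is an illustration under stated assumptions, not the code in model_application.py: the function names are hypothetical, the feature order must be supplied by the user (it is not documented here), and the logarithm base is assumed to be natural because the description only states that `Predicted` is a log of µmol/m²/hr.

```python
import math


def load_model(path="etr_final2.joblib"):
    """Load the dumped model. Needs joblib and scikit-learn installed."""
    import joblib  # third-party; imported lazily so the rest runs without it

    return joblib.load(path)


def predict_flux(model, rows, feature_names, log_base=math.e):
    """Predict CH4 flux for dict-like rows and undo the log transform.

    feature_names must match the column order used in training (an input the
    user has to confirm, e.g. from ml_model.ipynb). log_base is an ASSUMPTION:
    pass 10 if the model was trained on log10-transformed flux.
    """
    features = [[float(row[name]) for name in feature_names] for row in rows]
    log_flux = model.predict(features)
    return [log_base ** float(value) for value in log_flux]
```

If the model was fitted on a pandas DataFrame, scikit-learn stores the training column order in the estimator's `feature_names_in_` attribute, which can be passed as `feature_names`. Installing the package versions pinned in requirements.txt is advisable, because joblib dumps of scikit-learn models are not guaranteed to load under a different scikit-learn version.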
Code/Software
This project used Python v3.12.3 and R v4.5.1.
