Progress toward forecasting excessive rainfall with random forests based on a deterministic convection-allowing model
Data files
Nov 25, 2025 version files (110.04 MB total)
- control.zip (11.41 MB)
- grid_stat_2020.zip (1.08 MB)
- grid_stat_2021.zip (2.80 MB)
- grid_stat_2022.zip (2.73 MB)
- grid_stat_2023.zip (2.80 MB)
- grid_stat_2024.zip (2.82 MB)
- grid_stat_2025.zip (826.37 KB)
- hrrr2021_mismatch.zip (1.73 MB)
- hrrr2021.zip (24.82 MB)
- hrrr2022.zip (17.45 MB)
- hrrr2023.zip (7.44 MB)
- mean_max.zip (11.39 MB)
- mean.zip (11.42 MB)
- README.md (4.80 KB)
- tl.zip (11.34 MB)
Abstract
This dataset consists of forecasts produced by random forests (RFs) using predictor information from NOAA's deterministic convection-allowing numerical weather prediction model, the High-Resolution Rapid Refresh. Included are sensitivity experiments on predictor assembly and model version, as well as real-time forecasts from three subsequent versions evaluated at the Weather Prediction Center's Flash Flood and Intense Rainfall Experiment (FFaIR) during 2021-2023. Sensitivity experiments reveal that the RF performs better when we use predictor information from all model gridpoints, not just sparse gridpoints, particularly in situations with small-scale precipitation maxima in the model forecast. The RF is also better able to learn the relationships between predictor values and resulting excessive rainfall risk when the RF considers mean predictors from three model simulations rather than predictors from a single simulation. The real-time RFs evaluated at FFaIR exhibited year-over-year improvements stemming from the results of these sensitivity experiments as well as feedback from FFaIR participants. However, RFs based on deterministic convection-allowing models continue to underperform those based on coarse global ensemble systems.
Dataset DOI: 10.5061/dryad.2z34tmpzp
Description of the data and file structure
The dataset includes zip files which contain daily random forest (RF) excessive rainfall forecasts in GRIB2 format, as well as files containing daily verification data and official excessive rainfall outlooks from the Weather Prediction Center (WPC). This project aimed to test various ways to construct random forests for predicting excessive rainfall based on predictors derived from deterministic convection-allowing numerical weather prediction model forecasts. The forecast model used is the NOAA High-Resolution Rapid Refresh (HRRR), with 3-km grid spacing (see the references below for more information).
https://doi.org/10.1175/WAF-D-21-0151.1
https://doi.org/10.1175/WAF-D-21-0130.1
Forecast zip files contain forecasts in GRIB2 format. Gridded forecasts are encoded as values between 0 and 1, indicating probabilities of excessive rainfall from 0 to 100%. The GRIB2 file naming convention includes a date stamp YYYYMMDD00, which indicates the initialization time of the HRRR forecast used to provide predictors to the RF. The forecast is valid for the 24-h period from 1200 UTC on the initialization date to 1200 UTC on the following day. For example, a file named EXQPF_DAY1_NSSL_PROBS_V1_2024042400.grib2 uses predictors derived from the HRRR forecast initialized at 0000 UTC 24 Apr 2024, and contains a gridded forecast valid for the 24-h period from 1200 UTC 24 Apr - 1200 UTC 25 Apr 2024.
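The naming convention above can be turned into a small helper for pairing forecasts with verification files. The following is a minimal sketch using only the Python standard library; the function names `forecast_times` and `to_percent` are illustrative and not part of the dataset, and the sketch assumes every forecast file name ends with the YYYYMMDD00 stamp followed by `.grib2`, as in the example.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the file name ends with "_YYYYMMDD00.grib2", as in the example above.
_STAMP = re.compile(r"_(\d{8})00\.grib2$")


def forecast_times(filename):
    """Return (init, valid_start, valid_end) as UTC datetimes for a forecast file name."""
    match = _STAMP.search(filename)
    if match is None:
        raise ValueError(f"no YYYYMMDD00 stamp found in {filename!r}")
    # The stamp gives the 0000 UTC initialization of the HRRR forecast.
    init = datetime.strptime(match.group(1), "%Y%m%d").replace(tzinfo=timezone.utc)
    # The RF forecast is valid for the 24-h period starting at 1200 UTC that day.
    valid_start = init + timedelta(hours=12)
    valid_end = valid_start + timedelta(hours=24)
    return init, valid_start, valid_end


def to_percent(probability):
    """Convert a stored value in [0, 1] to a probability in percent."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("stored forecast values are expected to lie in [0, 1]")
    return 100.0 * probability


if __name__ == "__main__":
    init, start, end = forecast_times("EXQPF_DAY1_NSSL_PROBS_V1_2024042400.grib2")
    print(init.isoformat(), start.isoformat(), end.isoformat())
```

Reading the gridded values themselves requires a GRIB2-capable library, which is outside the scope of this sketch.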
The grid_stat files, which contain verification data and official WPC excessive rainfall outlooks (EROs), have a date stamp which explicitly indicates the valid time. For example, the file named grid_stat_ALL_ERO_s2024042412_e2024042512_vhr09_240000L_20240425_120000V_pairs.nc is valid for the 24-h period from 1200 UTC 24 Apr - 1200 UTC 25 Apr 2024. The files are in netCDF format. The variable named FCST_ERO_Surface contains the 0900 UTC day one WPC ERO valid for the indicated time period, again with values between 0 and 1 indicating probabilities of excessive rainfall from 0 to 100%. The variable beginning with OBS_ALL contains the verification dataset derived from the Unified Flood Verification System (UFVS), described in more detail in this reference:
https://doi.org/10.1175/WAF-D-20-0020.1
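The valid period of a grid_stat file can be recovered from its name in a similar way. The sketch below is an illustration under stated assumptions: it treats the `s` and `e` stamps in the example file name as the start and end of the valid period in YYYYMMDDHH form, and the helper names `valid_period` and `find_obs_variable` are hypothetical. Because only the `OBS_ALL` prefix of the verification variable is documented, the second helper selects the variable by prefix from a list of names (for instance, the keys of the `variables` mapping of an opened netCDF file) rather than guessing the full name.

```python
import re
from datetime import datetime, timezone

# Assumption: "_sYYYYMMDDHH_eYYYYMMDDHH_" marks the start and end of the valid period.
_VALID = re.compile(r"_s(\d{10})_e(\d{10})_")


def valid_period(filename):
    """Return (start, end) of the valid period as UTC datetimes."""
    match = _VALID.search(filename)
    if match is None:
        raise ValueError(f"no s/e valid-time stamps found in {filename!r}")
    start, end = (
        datetime.strptime(stamp, "%Y%m%d%H").replace(tzinfo=timezone.utc)
        for stamp in match.groups()
    )
    return start, end


def find_obs_variable(variable_names):
    """Pick the single variable whose name begins with OBS_ALL."""
    matches = [name for name in variable_names if name.startswith("OBS_ALL")]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one OBS_ALL variable, found {matches}")
    return matches[0]


if __name__ == "__main__":
    name = (
        "grid_stat_ALL_ERO_s2024042412_e2024042512_vhr09_"
        "240000L_20240425_120000V_pairs.nc"
    )
    print(valid_period(name))
```

Comparing this valid period with the 1200 UTC to 1200 UTC window implied by a forecast file's YYYYMMDD00 stamp is one way to pair each RF forecast with its verification file.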
Files and variables
File: control.zip
Description: Daily random forest (RF) forecasts from the control experiment (CTRL)
File: mean.zip
Description: Daily RF forecasts from the MEAN experiment (using the spatial mean of predictor values)
File: mean_max.zip
Description: Daily RF forecasts from the MEAN_MAX experiment (using spatial max/min of storm attribute predictors, and spatial mean of environmental predictor values)
File: tl.zip
Description: Daily RF forecasts from the MEAN_MAX_TL experiment, using the mean of predictor values across three HRRR simulations (0000, 0600, 1200 UTC)
File: hrrr2021.zip
Description: Daily RF forecasts from the 2021 version of the HRRR-based RF, evaluated at the Flash Flood and Intense Rainfall (FFaIR) Experiment during 2021.
File: hrrr2022.zip
Description: Daily RF forecasts from the 2022 version of the HRRR-based RF, evaluated at the 2022 FFaIR experiment.
File: hrrr2023.zip
Description: Daily RF forecasts from the 2023 version of the HRRR-based RF, evaluated at the 2023 FFaIR experiment.
File: hrrr2021_mismatch.zip
Description: Daily RF forecasts from the 2021 version of the HRRR-based RF, trained on HRRRv3 but applied to HRRRv4 forecasts during Aug - Dec 2020.
File: grid_stat_2020.zip
Description: Daily verification data for 2020 from the Unified Flood Verification System (UFVS), as well as day one excessive rainfall outlooks (EROs) issued at 0900 UTC by the Weather Prediction Center (WPC)
File: grid_stat_2021.zip
Description: Daily verification data and EROs for 2021.
File: grid_stat_2022.zip
Description: Daily verification data and EROs for 2022.
File: grid_stat_2023.zip
Description: Daily verification data and EROs for 2023.
File: grid_stat_2024.zip
Description: Daily verification data and EROs for 2024.
File: grid_stat_2025.zip
Description: Daily verification data and EROs for 2025.
Access information
Other publicly accessible locations of the data:
- none
Data was derived from the following sources:
- HRRR data available at several sites, outlined at https://rapidrefresh.noaa.gov/hrrr/
