Dryad

Data from: Observation definitions and their implications in machine learning-based predictions of excessive rainfall

Data files

Oct 07, 2024 version files 35.53 GB

Abstract

The implications of excessive rainfall observation definitions for machine learning model forecast skill are assessed using the Colorado State University Machine Learning Probabilities (CSU-MLP) forecast system. The CSU-MLP uses historical observations along with reforecasts from a global ensemble to train random forests to probabilistically predict excessive rainfall events. Here, random forest models are trained using two distinct rainfall datasets: one composed of fixed-frequency (FF) average recurrence interval exceedances and flash flood reports, and the other a compilation of flooding and rainfall proxies (Unified Flood Verification System; UFVS). Both models generate 1-3 day forecasts and are evaluated against a climatological baseline to characterize their overall skill as a function of lead time, season, and region. Model comparisons suggest that regional frequencies of excessive rainfall observations influence when and where the ML models issue forecasts and, in turn, their skill and reliability. Additionally, the spatio-temporal distribution of observations has implications for ML model training requirements, notably how long an observational record is needed to obtain skillful forecasts. Experiments reveal that shorter-trained UFVS-based models can be as skillful as longer-trained FF-based models. In essence, the UFVS dataset provides a more robust characterization of excessive rainfall and its impacts, and machine learning models trained on more representative datasets of meteorological hazards may not require as extensive training to generate skillful forecasts.
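As an illustration of what evaluating probabilistic forecasts against a climatological baseline can look like, the sketch below computes a Brier skill score in plain Python. The choice of the Brier skill score is an assumption made for illustration; the abstract does not name the metric used. The function names and the sample values are hypothetical and are not taken from the dataset.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(forecast_probs, outcomes, climatology_probs):
    """Skill relative to a reference forecast: 1 is perfect, 0 matches the reference."""
    bs_forecast = brier_score(forecast_probs, outcomes)
    bs_reference = brier_score(climatology_probs, outcomes)
    if bs_reference == 0:
        raise ValueError("reference forecast is perfect; skill score is undefined")
    return 1.0 - bs_forecast / bs_reference


# Hypothetical example values (not from the dataset):
# 1 = excessive rainfall observed at a grid point, 0 = not observed.
outcomes = [0, 0, 1, 0, 1]
forecast_probs = [0.1, 0.2, 0.8, 0.1, 0.6]   # stand-in for ML model probabilities
climatology_probs = [0.4] * len(outcomes)    # stand-in for a climatological event frequency

bss = brier_skill_score(forecast_probs, outcomes, climatology_probs)
print(f"Brier skill score vs. climatology: {bss:.3f}")
```

A positive score indicates that the forecast outperforms the climatological reference on these samples; a negative score indicates that it does worse.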