Data archive for: Exploring the use of machine learning to improve vertical profiles of temperature and moisture
Data files
Oct 31, 2023 version (files: 153.12 MB)
- mlsoundings_dataset.nc (153.12 MB)
- README.md (1.94 KB)
Abstract
Vertical profiles of temperature and dewpoint are useful in predicting deep convection that leads to severe weather that threatens property and lives. Currently, forecasters rely on observations from radiosonde launches and numerical weather prediction (NWP) models. Radiosonde observations are, however, temporally and spatially sparse, and NWP models contain inherent errors that influence short-term predictions of high-impact events. This work explores using machine learning (ML) to postprocess NWP model forecasts, combining them with satellite data to improve vertical profiles of temperature and dewpoint. We focus on different ML architectures, loss functions, and input features to optimize predictions. Because we are predicting vertical profiles at 256 levels in the atmosphere, this work provides a unique perspective on using ML for 1-D tasks. Compared to baseline profiles from the Rapid Refresh (RAP), ML predictions offer the largest improvement for dewpoint, particularly in the mid- and upper-atmosphere. Temperature improvements are modest, but CAPE values are improved by up to 40%. Feature importance analyses indicate that the ML models are primarily improving incoming RAP biases. While additional model and satellite data offer some improvement to the predictions, architecture choice is more important than feature selection in fine-tuning the results. Our proposed deep residual UNet performs the best by leveraging spatial context from the input RAP profiles; however, the results are remarkably robust across model architectures. Further, uncertainty estimates for every level are well-calibrated and can provide useful information to forecasters.
This dataset combines four data sources to support training and testing machine learning models that predict vertical profiles of temperature and dewpoint. The data consist of matched Radiosonde Observations (RAOB), Rapid Refresh (RAP) output, Real-Time Mesoscale Analysis (RTMA), and Geostationary Operational Environmental Satellite (GOES)-16 data at sites over the central U.S. Tornado Alley from January 2017 through May 2020. All data sources have been collocated to a common site, height (256 vertical levels), and time.
Description of the data
The data are in a single netCDF file, mlsoundings_dataset.nc. The file contains five groups: one for each of the four data sources, plus an additional group holding the indices that were used for training, validation, and testing.
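Because the group and variable names are not listed here, a reasonable first step is to discover them from the file itself. The following is a minimal sketch, assuming the third-party netCDF4 Python package is installed (the source does not prescribe a reader). The helper name `summarize_groups` is hypothetical. It relies only on the `.groups` and `.variables` mappings that netCDF4 `Dataset` and `Group` objects expose.

```python
import os


def summarize_groups(node, prefix=""):
    """Return one line per group, followed by its variables.

    `node` may be a netCDF4.Dataset or netCDF4.Group; only its
    `.groups` and `.variables` mappings are used. Nested groups
    are visited recursively.
    """
    lines = []
    for name, group in node.groups.items():
        path = f"{prefix}/{name}"
        lines.append(path)
        for var_name, var in group.variables.items():
            # Each variable reports its dimension names and shape.
            lines.append(f"  {var_name}: dims={var.dimensions}, shape={var.shape}")
        lines.extend(summarize_groups(group, prefix=path))
    return lines


if __name__ == "__main__":
    # Assumption: the archive file has been downloaded to the working directory.
    path = "mlsoundings_dataset.nc"
    if os.path.exists(path):
        from netCDF4 import Dataset  # third-party: pip install netCDF4

        with Dataset(path, "r") as ds:
            print("\n".join(summarize_groups(ds)))
```

Run against the archive file, this should list the five groups along with the variables stored in each, which in turn shows which group holds the training, validation, and testing indices.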
Sharing/Access information
These data were derived from the following sources:
- Radiosonde Observations are from the National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratory radiosonde archive (https://ruc.noaa.gov/raobs/)
- The Rapid Refresh (RAP) data are from NOAA Global Systems Laboratory Assimilation and Verification Innovation Division (https://rapidrefresh.noaa.gov/)
- The Real-Time Mesoscale Analysis (RTMA) data are from NOAA National Centers for Environmental Prediction (NCEP) Central Operations (https://www.nco.ncep.noaa.gov/pmb/products/rtma/ and https://noaa-rtma-pds.s3.amazonaws.com/index.html)
- The satellite data are from the Advanced Baseline Imager (ABI) onboard the Geostationary Operational Environmental Satellite (GOES)-16 and are available from NOAA and NASA on Amazon Web Services (https://www.ncei.noaa.gov/products/goes-terrestrial-weather-abi-glm and https://noaa-goes16.s3.amazonaws.com/index.html)
This dataset was collected for the corresponding publication in Artificial Intelligence for the Earth Systems, and the processing methodology is outlined in that publication.