Precipitation identifiers for meteorological features combining global GPM-IMERG retrievals and ERA5 reanalysis
Data files
Oct 11, 2024 version files (1.72 GB)
- GPM-IMERG_feature_precip_identifiers.zip (1.72 GB)
- README.md (3.08 KB)
Jun 13, 2026 version files (52.10 GB)
- GPM-IMERG_feature_precip_identifiers.zip (1.72 GB)
- IMERG_feature_precip_identifiers.tar.gz (50.38 GB)
- README.md (7.55 KB)
Abstract
The data collection provides global feature-precipitation categories at 0.25-degree, 6-hourly resolution. The data are generated by merging IMERG observational rainfall (V06 and V07 Final versions) with atmospheric features identified by multiple object-based algorithms. The IMERG V06 version covers 2001–2019, while the IMERG V07 version covers 2001–2023. Classified precipitation identifiers include rainfall associated with atmospheric rivers (AR), frontal systems (FT), low-pressure systems (LPS), mesoscale convective systems (MCS), and their co-occurrences (overlapping areas of features at a given time). In addition to the algorithm-identified features, precipitation from deep convection, non-deep convection, stratiform, and drizzle is defined pixel-wise using thresholds of CPC MERGE-IR brightness temperature and GPM-IMERG rain rate. The dataset is supported by the Department of Energy and Environment (DOEE): DE-SC0023244.
https://doi.org/10.5061/dryad.v9s4mw73g
This collection comprises a 19-year (2001–2019) global dataset of IMERG V06 feature-associated precipitation identifiers and a 23-year (2001–2023) dataset based on IMERG V07 precipitation. Each precipitation identifier attributes a rainy grid cell to a specific meteorological feature.
Description of the data and file structure
Data description
The data collection, IMERG_feature_precip_identifiers.tar.gz, provides global feature-precipitation categories at 0.25-degree, 6-hourly resolution. The data are generated by merging IMERG observational rainfall (V06 and V07 Final versions) with atmospheric features identified by multiple object-based algorithms. The IMERG V06 version covers 2001–2019, while the IMERG V07 version covers 2001–2023. Classified precipitation identifiers include rainfall associated with atmospheric rivers (AR), frontal systems (FT), low-pressure systems (LPS), mesoscale convective systems (MCS), and their co-occurrences (overlapping areas of features at a given time). In addition to the algorithm-identified features, precipitation from deep convection, non-deep convection, stratiform, and drizzle is defined pixel-wise using thresholds of CPC MERGE-IR brightness temperature and GPM-IMERG rain rate. The dataset is supported by the Department of Energy and Environment (DOEE): DE-SC0023244.
Note: The previous version, GPM-IMERG_feature_precip_identifiers.zip, is now included in the latest update under the directory name precip_feature_IMERGv6, with additional variables added to the dataset. Please refer to the log history for details on the updates.
File information
The compressed file contains two directories: precip_feature_IMERGv6 and precip_feature_IMERGv7. Each includes files named "precip_feature_identifiers" and "pr_feature_merged". Variables provided in these files are described below:
File name: "precip_feature_identifiers.{month}.6h.nc"
File type: NetCDF (network Common Data Form)
Variable description:
precip_id [time, latitude, longitude]: An integer (1-20) corresponding to a classified precipitation identifier.
The 20 classified precipitation identifiers are described in the NetCDF variable attributes and in the supplementary document.
dimensions:
    time = 124 ;
    longitude = 1440 ;
    latitude = 481 ;
variables:
    int64 time(time) ;
        time:units = "hours since 2001-01-01" ;
        time:calendar = "proleptic_gregorian" ;
    double longitude(longitude) ;
        longitude:_FillValue = NaN ;
        longitude:long_name = "longitude" ;
        longitude:short_name = "lon" ;
        longitude:units = "degrees_east" ;
    double latitude(latitude) ;
        latitude:_FillValue = NaN ;
        latitude:long_name = "latitude" ;
        latitude:short_name = "lat" ;
        latitude:units = "degrees_north" ;
    byte precip_id(time, latitude, longitude) ;
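The time coordinate above is stored as integer hours since 2001-01-01 on a proleptic Gregorian calendar. The following is a minimal sketch of decoding such values by hand with Python's standard library, whose datetime type also uses the proleptic Gregorian calendar. The function name decode_hours and the sample offsets are illustrative assumptions, not part of the dataset; NetCDF readers commonly perform this decoding automatically.

```python
from datetime import datetime, timedelta

# Reference epoch taken from the attribute time:units = "hours since 2001-01-01".
EPOCH = datetime(2001, 1, 1)


def decode_hours(hours_since_epoch):
    """Convert integer hour offsets (hypothetical helper) into datetime objects."""
    return [EPOCH + timedelta(hours=int(h)) for h in hours_since_epoch]


# Hypothetical raw values for the first four 6-hourly steps of January 2001.
steps = decode_hours([0, 6, 12, 18])
print(steps[1])

# A 31-day month sampled every 6 hours has 31 * 4 = 124 steps,
# which matches the time dimension shown in the header above.
january = decode_hours(range(0, 31 * 24, 6))
print(len(january), january[-1])
```

Shorter months would contain fewer steps than the 124 shown in this header, since the count follows from the number of days in the month.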
File name: "pr_feature_merged.{month}.6h.nc"
File type: NetCDF (network Common Data Form)
Variable description:
The binary masks of the four identified features (ar_tag, ft_tag, mcs_tag, and lps_tag) and the interpolated hourly-averaged IMERG precipitation field (precipitation).
dimensions:
    time = 124 ;
    longitude = 1440 ;
    latitude = 481 ;
variables:
    int64 time(time) ;
        time:units = "hours since 2001-01-01" ;
        time:calendar = "proleptic_gregorian" ;
    double longitude(longitude) ;
        longitude:_FillValue = NaN ;
        longitude:long_name = "longitude" ;
        longitude:short_name = "lon" ;
        longitude:units = "degrees_east" ;
    double latitude(latitude) ;
        latitude:_FillValue = NaN ;
        latitude:long_name = "latitude" ;
        latitude:short_name = "lat" ;
        latitude:units = "degrees_north" ;
    byte ar_tag(time, latitude, longitude) ;
        ar_tag:model = "observation (ERA5 / IMERG)" ;
    byte ft_tag(time, latitude, longitude) ;
        ft_tag:units = "unitless" ;
        ft_tag:long_name = "binary indicator of frontal system" ;
        ft_tag:model = "observation (ERA5 / IMERG)" ;
        ft_tag:description = "binary indicator of frontal system" ;
        ft_tag:scheme = "Samson and Catto (2025)" ;
    byte mcs_tag(time, latitude, longitude) ;
        mcs_tag:model = "observation (ERA5 / IMERG)" ;
        mcs_tag:regrid_resolution_deg = 0.25 ;
        mcs_tag:regrid_method = "bilinear" ;
        mcs_tag:source_lat_range = -59.95, 59.95 ;
        mcs_tag:source_lon_range = 0.0499999999999545, 359.95 ;
        mcs_tag:note = "MCS masks selected by filename hour at frequency 6h and regridded file-by-file to 0.25 degree grid." ;
        mcs_tag:description = "binary indicator of mesoscale convective system" ;
        mcs_tag:long_name = "binary indicator of mesoscale convective system" ;
        mcs_tag:scheme = "PyFLEXTRKR" ;
        mcs_tag:units = "unitless" ;
    byte lps_tag(time, latitude, longitude) ;
        lps_tag:description = "binary indicator of low pressure system" ;
        lps_tag:long_name = "binary indicator of low pressure system" ;
        lps_tag:scheme = "TempestExtremes" ;
        lps_tag:units = "unitless" ;
    float precipitation(time, latitude, longitude) ;
        precipitation:_FillValue = NaNf ;
        precipitation:regrid_method = "conservative" ;
        precipitation:model = "observation (ERA5 / IMERG)" ;
        precipitation:regrid_resolution_deg = 0.25 ;
        precipitation:source_lat_range = -59.95, 59.95 ;
        precipitation:source_lon_range = 0.0499999999999545, 359.95 ;
        precipitation:note = "Precipitation selected by filename hour at frequency 6h and conservatively regridded file-by-file to 0.25 degree grid." ;
        precipitation:description = "precipitation rate" ;
        precipitation:long_name = "precipitation rate" ;
        precipitation:source = "IMERG V07 Final version" ;
        precipitation:remapping = "xesmf conservative interpolation" ;
        precipitation:units = "mm/h" ;
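Because the four feature masks are stored as separate binary variables, co-occurrence at a grid cell can be derived by checking which masks are set at the same time step. The sketch below illustrates the idea for a single cell using plain Python values. It assumes that a value of 1 means the feature is present, and the helper names are hypothetical. It deliberately does not map combinations to the numeric precip_id values, since that mapping is defined in the NetCDF attributes and the supplementary document rather than here.

```python
from itertools import product

# Order follows the variables ar_tag, ft_tag, mcs_tag, lps_tag in the header above.
FEATURES = ("AR", "FT", "MCS", "LPS")


def features_present(ar_tag, ft_tag, mcs_tag, lps_tag):
    """Return the names of the features whose mask equals 1 at one grid cell.

    Assumption of this sketch: 1 marks presence and any other value marks absence.
    """
    tags = (ar_tag, ft_tag, mcs_tag, lps_tag)
    return tuple(name for name, tag in zip(FEATURES, tags) if tag == 1)


def describe_cell(ar_tag, ft_tag, mcs_tag, lps_tag):
    """Build a readable label such as 'AR+FT' (hypothetical naming, not precip_id)."""
    present = features_present(ar_tag, ft_tag, mcs_tag, lps_tag)
    return "+".join(present) if present else "no feature"


# Number of distinct non-empty combinations of four binary masks.
n_combinations = sum(1 for tags in product((0, 1), repeat=4) if any(tags))

print(describe_cell(1, 1, 0, 0))  # a cell covered by both an AR and a front
print(describe_cell(0, 0, 0, 0))  # a cell outside all four features
print(n_combinations)
```

The same comparison extends naturally to whole arrays once the masks are loaded with a NetCDF reader of the user's choice.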
Sharing / Access information
Global_GPM-IMERG_feature-associated_precipitation_categories.pdf: supplementary file, uploaded to Zenodo, containing the data description and accompanying figures.
The identified atmospheric features are archived at the NERSC High Performance Storage System (HPSS) (/global/cfs/cdirs/m4374/catalogues/raw_catalogue_files/observations/)
GPM-IMERG V06 Final version data are archived at NASA GES DISC
GPM-IMERG V07 Final version data are archived at NASA GES DISC
CPC MERGE-IR data are archived at the NOAA Climate Prediction Center
The IMERG feature-associated precipitation identifiers are derived from the above sources.
The categorization of global precipitation relies on four primary atmospheric features: atmospheric rivers (ARs), fronts (FTs), mesoscale convective systems (MCSs), and low-pressure systems (LPSs). First, the identified atmospheric features, which have varying temporal and spatial resolutions, are harmonized onto a unified framework (6-hourly, 0.25-degree). GPM-IMERG precipitation data (0.1-degree resolution) are then coarse-grained to 0.25 degrees and labeled using the merged feature outputs. Rainy pixels within a specific feature boundary are considered associated with that feature object. For frontal systems represented as line segments, the line-segment masks are expanded outward by 250 km to generate two-dimensional bounded features. In addition, precipitation attributed to deep convection, non-deep convection, stratiform, and drizzle is identified at the pixel level using MERGE-IR brightness temperature together with GPM-IMERG precipitation; these classifications apply only to rainy pixels that do not fall within any of the four primary features. Precipitation sources are identified independently every 6 hours, over 2001–2019 for the IMERG V06 version and 2001–2023 for the IMERG V07 version. Detailed methodologies and demonstrations are available at https://docs.google.com/document/d/1O8NQesgyjIXv2X37wLsZ1EhgBdRKtvtPR7OYNBSBdBs/edit
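The 250 km expansion of frontal line segments described above can be pictured as marking every grid cell that lies within 250 km of the front. The following is a rough sketch of that idea only, not the procedure actually used to build the dataset. It assumes the front is represented by a list of sample points, uses a brute-force great-circle (haversine) distance with a mean Earth radius of 6371 km, and works on a small hypothetical regional grid; all names and coordinates are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def expand_front(front_points, grid_lats, grid_lons, radius_km=250.0):
    """Return the set of (lat, lon) grid cells within radius_km of any front point."""
    mask = set()
    for lat in grid_lats:
        for lon in grid_lons:
            if any(haversine_km(lat, lon, flat, flon) <= radius_km
                   for flat, flon in front_points):
                mask.add((lat, lon))
    return mask


# Hypothetical 0.25-degree regional grid (35N-45N, 5E-15E) and a one-point "front".
lats = [40.0 + 0.25 * i for i in range(-20, 21)]
lons = [10.0 + 0.25 * j for j in range(-20, 21)]
front = [(40.0, 10.0)]

mask = expand_front(front, lats, lons)
print((42.0, 10.0) in mask)  # about 222 km north of the front point
print((42.5, 10.0) in mask)  # about 278 km north of the front point
```

A brute-force search like this scales poorly to the full global grid; it is meant only to make the geometric meaning of the buffer concrete.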
Changes after Oct 11, 2024:
A new version of the precipitation-feature identifier dataset has been uploaded as IMERG_feature_precip_identifiers.tar.gz. The compressed file contains two directories: precip_feature_IMERGv6 and precip_feature_IMERGv7. Users are encouraged to use the latest updated dataset, preferably the IMERG V07 version when applicable.
In this update, hourly averaged IMERG V06 precipitation and individual feature masks have been added as additional data variables to meet users' needs. Following the release of IMERG V07, we have also generated a second version of the dataset using IMERG V07 precipitation, which extends the record to 2001–2023. In the IMERG V07 version, the precipitation field is replaced by IMERG V07, and the MCS feature masks are regenerated accordingly. The other three feature types remain unchanged between the two versions during their overlapping period, because they are identified from ERA5 variables rather than IMERG precipitation.
- Tsai, Wei‐Ming; Duan, Suqin; O’Brien, Travis A. et al. (2025). Co‐Occurring Atmospheric Features and Their Contributions to Precipitation Extremes. Journal of Geophysical Research: Atmospheres. https://doi.org/10.1029/2024jd041687
