Environment catalogues of satellite-tracked tropical mesoscale convective systems from reanalysis
Data files
Apr 09, 2026 version files 128.27 GB
-
MCS_FLEXTRKR_Tropics_ERA5env_2001.2005.tar.gz
25.26 GB
-
MCS_FLEXTRKR_Tropics_ERA5env_2006.2010.tar.gz
32.14 GB
-
MCS_FLEXTRKR_Tropics_ERA5env_2011.2015.tar.gz
33.92 GB
-
MCS_FLEXTRKR_Tropics_ERA5env_2016.2020.tar.gz
34.91 GB
-
MCSenv_extract_ERA5data.tar.gz
2.03 GB
-
README.md
13.42 KB
Abstract
A 20-year dataset of MCS-environment catalogues is provided through the integration of advanced ERA5 reanalysis, satellite observations, and tropical mesoscale convective systems (MCSs) identified using the Python FLEXible object TRacKeR (PyFLEXTRKR). The catalogues contain key thermodynamic variables, surface pressure, precipitation, and brightness temperature sampled throughout the lifecycle of each identified MCS. A low-tropospheric buoyancy metric—representing a critical driver of moist convection—is also derived from ERA5 and included. All outputs are stored in NetCDF format. To support user customization, a Python-based software package is provided, enabling users to include additional variables of interest or generate catalogues based on alternative MCS tracks.
Dataset DOI: 10.5061/dryad.98sf7m0w2
Description of the data and file structure
A 20-year dataset of MCS-environment catalogues is provided through the integration of advanced ERA5 reanalysis, satellite observations, and tropical mesoscale convective systems (MCSs) identified using the Python FLEXible object TRacKeR (PyFLEXTRKR). The catalogues contain key thermodynamic variables, surface pressure, precipitation, and brightness temperature sampled throughout the lifecycle of each identified MCS (6 life stages: Identification, Initial, Growing, Mature, Decaying, Termination). A low-tropospheric buoyancy metric—representing a critical driver of moist convection—is also derived from ERA5 and included. All outputs are stored in NetCDF format. To support user customization, a Python-based software package is provided, enabling users to include additional variables of interest or generate catalogues based on alternative MCS tracks.
Files and variables
File: MCSenv_extract_ERA5data.tar.gz
Description: The Python code used to process the provided MCS environmental variables of ERA5 from 2001 to 2020
File: MCS_FLEXTRKR_Tropics_ERA5env_2001.2005.tar.gz; MCS_FLEXTRKR_Tropics_ERA5env_2006.2010.tar.gz; MCS_FLEXTRKR_Tropics_ERA5env_2011.2015.tar.gz; MCS_FLEXTRKR_Tropics_ERA5env_2016.2020.tar.gz
Description: A subset of MCS environmental variables for tropical MCS tracks, organized by year. Each year folder holds the MCS tracks observed in that year together with the reanalysis and satellite gridded variables (see the table below) extracted around the MCS centroid over time. The directory structure for the year 2001 is shown below:
├── mcs_tracks_pyflextrkr.2001.tropics30NS.nc
├── environment_catalogs
│   ├── VARS_2D: MCS_FLEXTRKR_tropics.VAR_2D.merged.nc; MCS_FLEXTRKR_tropics.VAR_2T.merged.nc; MCS_FLEXTRKR_tropics.precipitation.merged.nc; MCS_FLEXTRKR_tropics.SP.merged.nc; MCS_FLEXTRKR_tropics.mpr.merged.nc; MCS_FLEXTRKR_tropics.tb.merged.nc
│   ├── VARS_3D: MCS_FLEXTRKR_tropics.q.merged.nc; MCS_FLEXTRKR_tropics.T.merged.nc
│   ├── VARS_derived: MCS_FLEXTRKR_tropics.buoyancy.merged.nc
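Because each archive is tens of gigabytes, it can be useful to unpack only one year at a time. The following is a minimal sketch using Python's standard `tarfile` module. It assumes that each archive contains one folder per year (for example `2001/`); the helper name `extract_year` is hypothetical and is not part of the provided software.

```python
import tarfile


def extract_year(archive_path: str, year: int, dest: str) -> list[str]:
    """Extract only the members that sit under the folder named after `year`.

    Assumption: the archive stores one folder per year (e.g. "2001/...").
    Returns the names of the extracted members.
    """
    token = str(year)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Keep members that have the year as one of their path components.
        wanted = [m for m in tar.getmembers() if token in m.name.split("/")]
        tar.extractall(path=dest, members=wanted)
    return [m.name for m in wanted]
```

A call such as `extract_year("MCS_FLEXTRKR_Tropics_ERA5env_2001.2005.tar.gz", 2001, "data")` would then unpack only the 2001 files into `data/`. Note that a gzip-compressed archive still has to be read from start to end to list its members, so this saves disk space rather than read time.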
| Category | Description |
|---|---|
| Name | Environment catalogues of tropical mesoscale convective systems |
| Time / Space Span | January 2001 – December 2020; Tropical region (30°S–30°N) |
| Spatial & Temporal Resolutions | Variable dimensions: (tracks, stage, level, x, y), where stage = 6 and x = y = 41 grid points<br>Spatial information: 10° × 10° box centered at the tracked MCS; horizontal grid spacing = 0.25°<br>Temporal information: 6 MCS life stages (Identification, Initial, Growing, Mature, Decaying, Termination) |
| Data Format | NetCDF (version 4.9.1) |
| Data Size | Total variables: ~7.5 GB per year<br>3-D variables: ~3.5 GB per file<br>2-D variables: ~150 MB per file |
| Variables – ERA5 Reanalysis | 3-D (27 levels) and 2-D (single level):<br>• Temperature (T)<br>• Specific humidity (q)<br>• 2-meter temperature (2t)<br>• 2-meter dewpoint temperature (2d)<br>• Surface pressure (sp)<br>• Total precipitation (tmpr) |
| Variables – GPM-IMERG V06 Final | • Precipitation (precipitation) |
| Variables – CPC MERGED-IR | • Brightness temperature (tb) |
| Variables – PyFLEXTRKR | • MCS track number (cloudtracknumber_nomergesplit) |
| Derived Variables (from ERA5) | • Total low-tropospheric buoyancy (Buoy_TOT)<br>• Low-tropospheric buoyancy, undiluted CAPE (Buoy_CAPE)<br>• Low-tropospheric buoyancy, subsaturation (Buoy_SUBSAT) |
The derived buoyancy variables, Buoy_TOT, Buoy_CAPE, and Buoy_SUBSAT, follow the definitions in Ahmed and Neelin (2021). They are empirical estimates of the total buoyancy, the contribution from undilute buoyancy, and the dilution caused by entrainment of surrounding dry air, respectively.
Ahmed, F., & Neelin, J. D. (2021). A process‐oriented diagnostic to assess precipitation‐thermodynamic relations and application to CMIP6 models. Geophysical Research Letters, 48(14), e2021GL094108.
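The array layout given in the table above can be checked with a few lines of arithmetic. The sketch below shows why a 10° box at 0.25° spacing gives 41 points per side, and how a stage name could be mapped to an index along the stage dimension. The stage order is an assumption (it follows the order in which the stages are listed here), and the helper names are hypothetical.

```python
# Assumed stage order: the order in which the README lists the six stages.
STAGES = ("Identification", "Initial", "Growing", "Mature", "Decaying", "Termination")


def points_per_side(box_size_deg: float = 10.0, spacing_deg: float = 0.25) -> int:
    """Number of grid points along one side of the box, both edges included."""
    return int(round(box_size_deg / spacing_deg)) + 1


def stage_index(name: str) -> int:
    """Position of a life stage along the stage dimension (assumed order)."""
    return STAGES.index(name)


def expected_shape(n_tracks: int, n_levels: int) -> tuple:
    """Shape of a 3-D variable laid out as (tracks, stage, level, x, y)."""
    n = points_per_side()
    return (n_tracks, len(STAGES), n_levels, n, n)


print(points_per_side())          # 41
print(stage_index("Mature"))      # 3
print(expected_shape(100, 27))    # (100, 6, 27, 41, 41)
```

Before relying on the assumed stage order, it is worth confirming it against the coordinate metadata stored in the NetCDF files.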
Code/software
A Python-based module is provided to reproduce the full set of environment catalogues with customizable settings, such as the size of the analysis window, output format, and the inclusion of additional user-specified variables beyond the default set. Please see the document for details.
The directory structure of the software archive is shown below:
├── config
├── config_README
├── feature_list.jsonc
├── varible_list.jsonc
├── dataset
├── MCS_FLEXTRKR_example
├── 2020
├── environment_catalogs
├── VARS_2D
├── VARS_3D
├── VARS_derived
├── feature_catalogs
├── track
├── track_geoinfo.nc
├── input_env
├── 2020
├── era5.2d.2020.01.nc
├── era5.2t.2020.01.nc
├── era5.mtpr.2020.01.nc
├── era5.q.2020.01.nc
├── era5.sp.2020.01.nc
├── era5.T.2020.01.nc
├── gpm-imerg.mcsmask.2020.01.nc
├── gpm-imerg.precipitation.2020.01.nc
├── gpm-imerg.tb.2020.01.nc
├── input_tracks
├── mcs_tracks_example.nc
├── runscripts
├── feature_environment_module.py
├── run_feature_environment.py
Data Preparation
Input feature tracks (directory: /input_tracks)
- mcs_tracks_example.nc: This NetCDF file contains time and location information for each tracked feature. The variables meanlon and meanlat represent the longitude and latitude of the object centroid, and base_time provides the timestamp of the feature (set to NaT if unavailable). Users should follow this variable nomenclature when preparing their own track files to ensure compatibility with the extraction scripts.
Input environmental variables (directory: /input_env)
- era5.T.2020.01.nc (for example): This is a latitude–longitude gridded NetCDF file containing environmental variables (e.g., 3-D air temperature from ERA5 reanalysis here). In the current version, all environmental variables must be provided as monthly NetCDF files, consistent with the standard output format when downloading ERA5 data via the ECMWF CDSAPI.
The extraction script reads the timestamp of each track and matches it to the corresponding year and month encoded in the filename (e.g., 2020.01). Users are strongly encouraged to adopt a clear and consistent naming convention that explicitly includes the data source, variable name, year, and month, such as:
era5.temperature.2025.05.nc
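The matching of a track timestamp to a monthly file can be sketched as follows. This is an illustration only, not the code in the provided scripts: it assumes a filename template with `XX`, `YYYY`, and `MM` placeholders, as in the `file_str` entry of the configuration, and the function name `monthly_filename` is hypothetical.

```python
from datetime import datetime


def monthly_filename(template: str, var_name: str, when: datetime) -> str:
    """Fill the XX / YYYY / MM placeholders of a filename template.

    The variable name is substituted last so that its own characters are
    never mistaken for a date placeholder.
    """
    return (
        template.replace("YYYY", f"{when.year:04d}")
        .replace("MM", f"{when.month:02d}")
        .replace("XX", var_name)
    )


# A track record at 06 UTC on 15 January 2020 maps to the January 2020 file.
print(monthly_filename("era5.XX.YYYY.MM.nc", "T", datetime(2020, 1, 15, 6)))
# era5.T.2020.01.nc
```

The placeholders are matched case-sensitively, so the fixed parts of a template should not contain the uppercase strings `XX`, `YYYY`, or `MM`.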
Configuration (directory: /config)
Two configuration files in JSONC format are used to control the extraction procedure.
- feature_list.jsonc
This file defines the feature tracks, spatial extraction window, and output settings.
{
  "feature": [
    {
      "name": "MCS_FLEXTRKR_example",
      "track_data": "/environment_catalog/input_tracks/mcs_tracks_example.nc",
      "box_size_degree": "10",
      "is_track_time_fixed": "False"
    }
  ]
}
- variable_list.jsonc
This file specifies the input environmental variables to be extracted and stored for individual tracks.
{
  "variable_inputs": [
    {
      "var_name": "T",
      "varname_infile": "t",
      "var_dir": "/environment_catalog/input_env",
      "file_str": "era5.XX.YYYY.MM.nc"
    }
  ]
}
The extraction procedure loops through the MCS tracks and processes all variables specified in variable_list.jsonc. To ensure proper input recognition, please name your files clearly with the variable name and time format explicitly indicated.
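JSONC files may contain comments, which Python's standard `json` module rejects. The sketch below shows one way to read such a file: strip the comments first, then parse. It is an illustration under stated assumptions, not the loader used by the provided scripts; `strip_jsonc_comments` is a hypothetical helper, the comment in the sample text is invented, and trailing commas are not handled.

```python
import json


def strip_jsonc_comments(text: str) -> str:
    """Remove // line comments and /* block */ comments outside of strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character verbatim
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            while i < n and text[i] != "\n":
                i += 1
            continue  # the newline itself is kept on the next iteration
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


example = """
{
  // hypothetical comment, added here for illustration
  "feature": [
    {
      "name": "MCS_FLEXTRKR_example",
      "track_data": "/environment_catalog/input_tracks/mcs_tracks_example.nc",
      "box_size_degree": "10",
      "is_track_time_fixed": "False"
    }
  ]
}
"""

config = json.loads(strip_jsonc_comments(example))
feature = config["feature"][0]

# The values are stored as strings, so they are converted explicitly.
box_size = float(feature["box_size_degree"])
time_fixed = feature["is_track_time_fixed"].strip().lower() == "true"
print(box_size, time_fixed)
```

The explicit comparison against `"true"` matters because `bool("False")` evaluates to `True` in Python.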
Run the extraction along tracks (directory: /runscripts)
Once data preparation and configuration are complete, users can start the environment catalogue extraction by running "python run_feature_environment.py". On execution, a new folder is created under the /dataset/ directory to store the output files. By default, all input variables are interpolated to a common grid with 0.25° spacing, matching the ERA5 latitude–longitude grid, to ensure consistency across datasets. Users may customize the latitude–longitude grid by modifying the corresponding settings in run_feature_environment.py. The script uses multiprocessing with 8 processes by default; this setting can be adjusted within the script to suit different systems.
Test with the example data
The test data are provided in the "input_env" and "input_tracks" directories, and the default configuration files are already set up for them. After running "python run_feature_environment.py", the outputs are stored under the "VARS_2D" and "VARS_3D" directories. The test should finish in 4-5 minutes.
Access information
Other publicly accessible locations of the data:
The full dataset can be accessed from the NERSC Science Gateway using the following link
Data was derived from the following sources:
- ERA5 reanalysis doi.org/10.1002/qj.3803
- GPM-IMERG doi.org/10.5067/GPM/IMERG/3B-HH/06
- PyFLEXTRKR Global MCS Tracking Dataset using GPM MergedIR Tb and IMERG precipitation data doi.org/10.5281/zenodo.10023498
Tropical MCS tracks spanning 20 years (2001–2020) are obtained from the existing PyFLEXTRKR dataset (Feng et al., 2021, 2023), where MCSs are identified and tracked using high-resolution satellite imagery and precipitation estimates from GPM-IMERG (Huffman et al., 2015). These tracks provide detailed spatial and temporal information (the location, timing, and lifecycle evolution of cloud and precipitation features), which serves as the basis for extracting concurrent environmental and convective properties from coordinated data sources. A 10° × 10° box is used to extract the environmental variables, providing a domain generally large enough to capture both the MCS and its surrounding environment. This box is centered on the ERA5 grid point closest to the tracked MCS centroid at each time step. As a result, ERA5 variables are preserved on their native grid, while other variables, such as precipitation and brightness temperature, are interpolated to the ERA5 grid at 0.25° resolution for consistency. These spatial subsets constitute the "environment catalogues."
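The centering step described above can be sketched in plain Python. The example snaps a centroid to the nearest point of a regular 0.25° grid and lists the coordinates of the 41 × 41 box around it. It assumes grid points sit at exact multiples of 0.25°, and it ignores longitude wrap-around and the poles; the function names are hypothetical and the provided scripts may handle these details differently.

```python
def nearest_grid_value(value: float, spacing: float = 0.25) -> float:
    """Snap a coordinate to the nearest point of a regular grid.

    Assumption: grid points sit at exact multiples of `spacing`.
    """
    return round(value / spacing) * spacing


def box_coordinates(center_lat: float, center_lon: float,
                    box_size: float = 10.0, spacing: float = 0.25):
    """Latitudes and longitudes of the box centered on the nearest grid point."""
    half = int(round(box_size / spacing / 2))  # 20 points on each side of the center
    lat0 = nearest_grid_value(center_lat, spacing)
    lon0 = nearest_grid_value(center_lon, spacing)
    lats = [lat0 + k * spacing for k in range(-half, half + 1)]
    lons = [lon0 + k * spacing for k in range(-half, half + 1)]
    return lats, lons


lats, lons = box_coordinates(12.3, -45.62)
print(len(lats), lats[0], lats[20], lats[-1])  # 41 7.25 12.25 17.25
```

Because the box is anchored to a grid point rather than to the exact centroid, the MCS centroid can lie up to half a grid spacing away from the box center.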
Six time records for each MCS track represent different life stages of the MCS: Identification, Initial, Growing, Mature, Decaying, and Termination. The Identification stage is the time when a cloud object is first detected by PyFLEXTRKR. The Initial stage marks the transition of this object into an MCS, based on specific criteria related to cloud size and precipitation characteristics, such as cold cloud area, presence of convective cores, and precipitating area. The Mature stage corresponds to the time of peak total precipitation produced by the MCS. The Termination stage is the time when the system no longer meets the MCS classification criteria. The Growing and Decaying stages are assigned to the midpoints between the Initial and Mature stages and between the Mature and Termination stages, respectively. By integrating spatial (10-degree box, 0.25-degree resolution) and temporal information (6 MCS life stages), the environmental catalogues of tropical MCSs are structured in a data array with dimensions (time: 6, x: 41, y: 41, level: 37) for each 3D variable. See the provided document for more details.
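The stage assignment described above can be illustrated with a short sketch. It assumes that each track is stored as a sequence of time steps addressed by integer index, and that a midpoint falling between two steps is rounded down; both are assumptions for illustration, and the function name is hypothetical.

```python
def life_stage_indices(ident: int, initial: int, mature: int, termination: int) -> dict:
    """Map the six life stages to time-step indices along a track.

    Growing and Decaying are taken as the midpoints between Initial and Mature
    and between Mature and Termination (rounded down, by assumption).
    """
    if not ident <= initial <= mature <= termination:
        raise ValueError("stage times must be in chronological order")
    return {
        "Identification": ident,
        "Initial": initial,
        "Growing": (initial + mature) // 2,
        "Mature": mature,
        "Decaying": (mature + termination) // 2,
        "Termination": termination,
    }


stages = life_stage_indices(ident=0, initial=2, mature=6, termination=11)
print(stages["Growing"], stages["Decaying"])  # 4 8
```

For short-lived systems, adjacent stages can map to the same time step, so users comparing stages may want to check for repeated indices.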
