Data from: TC-GEN: Data-driven tropical cyclone downscaling using machine learning-based high-resolution weather model

Jing, Renzhi1; Gao, Jianxiong2; Cai, Yunuo2; Xi, Dazhi3; Zhang, Yinda4; Fu, Yanwei2; Emanuel, Kerry5; Diffenbaugh, Noah1; Bendavid, Eran1

Published Sep 16, 2024 on Dryad. https://doi.org/10.5061/dryad.t1g1jwtbg

Abstract

Synthetic downscaling of tropical cyclones (TCs) is critically important to estimate the long-term hazard of rare high-impact storm events. Existing downscaling approaches rely on statistical or statistical-deterministic models that are capable of generating large samples of synthetic storms with characteristics similar to observed storms. However, these models do not capture the complex two-way interactions between a storm and its environment. In addition, these approaches either necessitate a separate TC size model to simulate storm size or involve post-processing to capture the asymmetries in the simulated surface wind. In this study, we present an innovative data-driven approach for TC synthetic downscaling. Using a machine learning-based high-resolution global weather model (ML-GWM), our approach can simulate the full life cycle of a storm with asymmetric surface wind that accounts for the two-way interactions between the storm and its environment. This approach consists of multiple components: a data-driven model for generating synthetic TC seeds, a blending method that seamlessly integrates storm seeds into the surrounding environment while maintaining the seed structure, and a model based on a recurrent neural network to correct for biases in storm intensity. Compared to observations and synthetic storms simulated using existing statistical-deterministic and statistical downscaling approaches, our method shows the ability to effectively capture many aspects of TC statistics, including track density, landfall frequency, landfall intensity, and outermost wind extent. Leveraging the computational efficiency of ML-GWM, our approach shows substantial potential for TC regional hazard and risk assessment.

Replication materials for Jing et al. (2024).

The materials in this repository reproduce the figures, tables, and calculations appearing in the main text and supplementary material of the paper.

If you find meaningful errors in the code, or have questions or suggestions, please contact Renzhi Jing at jingrenzhi.go@gmail.com.

Organization of repository

scripts: scripts for replicating the figures and calculations.
data: output data used to generate the figures.

Data

data/fig1_data.mat
- Selected case to illustrate the basic concept of TC-GEN. The data includes four variables: predictors (the simulated environment map), Intensity (the storm’s intensity), location (the storm’s latitude and longitude), and time (52 snapshots representing different time points).
data/fig2_data.mat
- Selected simulated storms to demonstrate the effectiveness of PCA. The data includes four variables: recon_50, recon_100, recon_500, and original_data. These represent environmental fields reconstructed using 50, 100, and 500 principal components, along with the raw fields.
data/fig3_data.mat
- Selected synthetic tropical cyclone seeds generated using 500 principal components. The data is structured as [20, 4, 150, 150], where ‘20’ is the number of storms, ‘4’ is the number of variables (in the order of mean sea level pressure, u wind, v wind, and temperature), and ‘150 x 150’ denotes the spatial field dimensions.
data/fig4_data.mat
- Selected cases to show the idea of Poisson blending, compared with the basic copy-and-paste method. The data includes four variables: src_data (synthetic storm seeds), tgt_data (target environmental fields), paste_data (results from the copy-and-paste method), and blend_data (results from Poisson blending).
data/fig7
- fig7_ibt_tracks.csv: historical tropical cyclone tracks between 1979 and 2014, derived from the IBTrACS dataset. The columns include: sid (storm ID), year, month, day, hour (storm time), lat (storm latitude), lon (storm longitude), vmax (maximum wind intensity in knots), and isocean (a flag indicating whether the storm is over the ocean).
- fig7_ke08_simulation.csv: simulated storms using the KE08 method. The columns are the same as those in fig7_ibt_tracks.csv.
- fig7_pepc_10_simulation.csv: simulated storms using the PepC method. The columns are the same as those in fig7_ibt_tracks.csv.
- fig7_tcgen_hourly.csv: simulated storms using the TC-GEN hourly method. The columns include: SID (storm ID), simulated_time (time of simulated storms), simulated_lat (latitude of simulated storms), simulated_lon (longitude of simulated storms), simulated_wind (maximum wind speed of simulated storms), bias_correct_wind (bias-corrected wind speed), and isocean (a flag indicating whether the storm is over the ocean).
- fig7_tcgen_monthly.csv: simulated storms using the TC-GEN monthly method. The columns are the same as those in fig7_tcgen_hourly.csv.
- fig7_tcgen_hourly_resampling.csv: simulated storms using the TC-GEN hourly method, with genesis locations resampled according to historical distribution patterns. The columns are the same as those in fig7_tcgen_hourly.csv.
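The track tables above can be summarised per storm with the Python standard library alone. The following is a minimal sketch, not part of the replication scripts: it computes each storm's lifetime maximum intensity from the sid and vmax columns described for fig7_ibt_tracks.csv. The function name, the sample rows, and the assumption that isocean is stored as 1/0 are all illustrative and should be checked against the actual files.

```python
import csv
import io

# Hypothetical rows in the documented column layout of fig7_ibt_tracks.csv.
# The values are invented for illustration and are NOT taken from the dataset.
SAMPLE = """sid,year,month,day,hour,lat,lon,vmax,isocean
A,1980,8,1,0,15.0,-50.0,35,1
A,1980,8,1,6,15.5,-51.2,50,1
A,1980,8,1,12,16.1,-52.4,45,1
B,1980,9,3,0,20.0,-60.0,30,1
B,1980,9,3,6,20.4,-61.0,40,0
"""


def lifetime_max_intensity(csv_file, ocean_only=False):
    """Return {sid: maximum vmax in knots} for a track table.

    Assumption: isocean is encoded as 1 (over ocean) or 0 (over land).
    """
    lmi = {}
    for row in csv.DictReader(csv_file):
        if ocean_only and int(float(row["isocean"])) != 1:
            continue  # skip track points that are flagged as over land
        vmax = float(row["vmax"])
        sid = row["sid"]
        # Keep the largest wind speed seen so far for this storm.
        if sid not in lmi or vmax > lmi[sid]:
            lmi[sid] = vmax
    return lmi


if __name__ == "__main__":
    print(lifetime_max_intensity(io.StringIO(SAMPLE)))
    print(lifetime_max_intensity(io.StringIO(SAMPLE), ocean_only=True))
```

To run this on the real file, pass an open file handle (opened with newline="") instead of the in-memory sample. The TC-GEN tables use different column names (SID, simulated_wind, bias_correct_wind), so the keys would need to be adjusted accordingly.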
data/fig9
- fig9_ibt_tracks_1900_2022: historical tropical cyclone tracks between 1900 and 2022, derived from the IBTrACS dataset. The columns are the same as those in fig7_ibt_tracks.csv.
- fig9_ibt_landfall_sampling_errors: landfall sampling errors of historical North Atlantic tropical cyclones at each of the 185 gates. The columns include: gate_id (gate ID), ibt_freq (landfall frequency), and ibt_se (standard error of landfall frequency).
- fig9_landfall_stats: landfall statistics of historical storms and simulated storms. The columns are:
  - gate_id: gate ID
  - *_freq: landfall frequency for each dataset, where columns starting with ke06 represent simulated storms using the KE08 method, columns starting with simu represent simulated storms using the TC-GEN hourly method, and simu_monthly represents simulated storms using the TC-GEN monthly method.
  - pepc_*: landfall frequency of simulated storms using the PepC method. We provide results from 10 independent simulations.
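Because the PepC results are spread across several pepc_* columns, one per independent simulation, a per-gate ensemble mean is a natural first summary. The sketch below shows one way to compute it. The exact column suffixes (pepc_1, pepc_2, ...) and the sample values are assumptions made for illustration; only the gate_id column and the pepc_ prefix come from the description above.

```python
import csv
import io
from statistics import mean

# Hypothetical table: the column suffixes and values are invented for
# illustration and are NOT taken from fig9_landfall_stats.
SAMPLE = """gate_id,pepc_1,pepc_2,pepc_3
1,0.08,0.12,0.10
2,0.25,0.35,0.30
"""


def pepc_ensemble_mean(csv_file, prefix="pepc_"):
    """Return {gate_id: mean landfall frequency across pepc_* columns}."""
    result = {}
    for row in csv.DictReader(csv_file):
        # Collect every column whose name starts with the PepC prefix.
        values = [float(v) for name, v in row.items() if name.startswith(prefix)]
        if not values:
            raise ValueError("no columns found with prefix %r" % prefix)
        result[row["gate_id"]] = mean(values)
    return result


if __name__ == "__main__":
    for gate, freq in pepc_ensemble_mean(io.StringIO(SAMPLE)).items():
        print(gate, round(freq, 3))
```

Selecting columns by prefix rather than by explicit name keeps the sketch independent of how the 10 simulations are numbered in the actual file.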
data/fig11
- fig11_snp.mat: snapshot of a selected storm to show the concept of TC outer size.
- fig11_tc_size_era5.csv: historical tropical cyclone sizes extracted from the ERA5 reanalysis. The columns include: sid (storm ID), time (storm time), r2 (outer size at the 2 m/s wind level), r6 (outer size at the 6 m/s wind level), and r8 (outer size at the 8 m/s wind level).
- fig11_tc_size_simulation.csv: outer size of simulated historical tropical cyclones using the TC-GEN hourly method. The columns are the same as those in fig11_tc_size_era5.csv.
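Since the two size tables share the same columns, the same summary function can be applied to both to compare ERA5 and TC-GEN outer sizes. The following is a minimal sketch that takes the median of one size column while skipping missing entries. How missing values are encoded (empty, NA, or NaN), the time format, and the sample values are assumptions for illustration, not facts about the files.

```python
import csv
import io
from statistics import median

# Hypothetical rows in the documented column layout (sid, time, r2, r6, r8).
# The values and time format are invented and NOT taken from the dataset.
SAMPLE = """sid,time,r2,r6,r8
A,1980-08-01 00:00,900,500,400
A,1980-08-01 06:00,950,520,
B,1980-09-03 00:00,800,450,NA
B,1980-09-03 06:00,850,480,380
"""

# Assumed spellings of a missing value; adjust to match the real files.
MISSING = {"", "na", "nan"}


def median_size(csv_file, column="r8"):
    """Return the median of one outer-size column, ignoring missing values."""
    values = []
    for row in csv.DictReader(csv_file):
        text = (row[column] or "").strip()
        if text.lower() in MISSING:
            continue  # skip snapshots where this wind level was not reached
        values.append(float(text))
    if not values:
        raise ValueError("column %r has no valid values" % column)
    return median(values)


if __name__ == "__main__":
    print("r6:", median_size(io.StringIO(SAMPLE), "r6"))
    print("r8:", median_size(io.StringIO(SAMPLE), "r8"))
```

Calling median_size once on fig11_tc_size_era5.csv and once on fig11_tc_size_simulation.csv gives a quick side-by-side check. The result is in whatever unit the files use for the r2, r6, and r8 columns.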
data/world_countries
- Country boundaries from IPUMS. The raw data can be downloaded from https://international.ipums.org/international/gis.shtml.

Scripts

R files (plot_Figure*.R) generate the figures in the paper and write them to ./figures/. The figures produced by these scripts might look slightly different from the published versions due to post-processing in Adobe Illustrator.

Code/Software

Scripts were written in Python 3.6.1 and R 4.2.3.
