Coarsened fine-grid model data for: A machine learning parameterization of clouds in a coarse-resolution climate model for unbiased radiation

Henn, Brian1; Jauregui, Yakelyn2; Clark, Spencer1; Brenowitz, Noah3; McGibbon, Jeremy1; Watt-Meyer, Oliver1; Pauling, Andrew4; Bretherton, Christopher1

Published Jan 29, 2024 on Dryad. https://doi.org/10.5061/dryad.9p8cz8wpz

Abstract

Coarse-grid weather and climate models rely particularly on parameterizations of cloud fields, and coarse-grained cloud fields from a fine-grid reference model are a natural target for a machine-learned parameterization. We machine-learn the coarsened-fine cloud properties as a function of coarse-grid model state in each grid cell of NOAA's FV3GFS global atmosphere model with 200 km grid spacing, trained using a 3-km fine-grid reference simulation with a modified version of FV3GFS. The ML outputs are coarsened-fine fractional cloud cover and liquid and ice cloud condensate mixing ratios, and the inputs are coarse model temperature, pressure, relative humidity, and ice cloud condensate. The predicted fields are skillful and unbiased, but somewhat under-dispersed, resulting in too many partially-cloudy model columns. When the predicted fields are applied diagnostically (offline) in FV3GFS's radiation scheme, they lead to small biases in global-mean top-of-atmosphere (TOA) and surface radiative fluxes. An unbiased global-mean TOA net radiative flux is obtained by setting to zero any predicted cloud with grid-cell mean cloud fraction less than a threshold of 6.5%; this does not significantly degrade the ML prediction of cloud properties. The diagnostic, ML-derived radiative fluxes are far more accurate than those obtained with the existing cloud parameterization in the nudged coarse-grid model, as they leverage the accuracy of the fine-grid reference simulation's cloud properties.

This dataset provides the coarsened fine-grid model outputs needed to run the nudged coarse climate model (including runs with prescribed coarsened fine-grid cloud fields) and to train the ML model that predicts coarsened-fine cloud fields as functions of the nudged coarse model state.

This dataset contains the coarsened fine-grid model state files (restart files) for each 15-minute timestep over a 10-day simulation period. These files allow a coarse model to be nudged to the coarsened-fine model state, and allow machine learning models to be trained with coarsened-fine cloud fields as targets. The dataset also contains coarsened fine-grid model diagnostic fields, such as radiation, that are used as evaluation metrics in the manuscript.
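As a rough illustration of the size of the restart set, a 15-minute interval over 10 days corresponds to 960 timesteps. The sketch below enumerates them, assuming folder names follow the YYYYMMDD.HHMMSS pattern of the example folder 20200731.001500 given in the file structure description. The function name `restart_timestamps` is hypothetical, and the start time used here is an assumption taken from that example, not a documented first timestep.

```python
from datetime import datetime, timedelta


def restart_timestamps(start, days=10, step_minutes=15):
    """Return timestamp strings (YYYYMMDD.HHMMSS) at a fixed interval.

    ASSUMPTION: the folder naming pattern is inferred from the single
    example folder name in this README.
    """
    n_steps = days * 24 * 60 // step_minutes
    step = timedelta(minutes=step_minutes)
    return [(start + i * step).strftime("%Y%m%d.%H%M%S") for i in range(n_steps)]


# ASSUMPTION: hypothetical start time, chosen to match the example folder name.
stamps = restart_timestamps(datetime(2020, 7, 31, 0, 15))
print(len(stamps), stamps[0], stamps[-1])
```

Check the extracted archives for the actual first and last folder names before relying on this list.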

The fine-grid model used to generate the coarsened-fine files here is NOAA GFDL X-SHiELD. The model used to run the coarse-grid simulations, including those with prescribed cloud (coarsened-fine grid cloud and ML cloud), is NOAA GFDL FV3GFS. Both models are based on the GFDL FV3 dynamical core, and use the RRTMG radiation and NOAA GFDL microphysics schemes. See https://doi.org/10.25923/6nhs-5897 and https://doi.org/10.25923/pz3c-8b96 for more information.

Description of the data and file structure

The dataset contains two major sections:

A set of folders containing coarsened X-SHiELD model restart files (one folder for each 15-minute timestep over 10 days). Each folder contains netCDF files holding the full 3D model state needed to nudge or restart a coarse-grid model. The file formats are defined by the NOAA GFDL FV3GFS/SHiELD modeling environment, and the model executable expects files in these formats in order to initialize the model. For upload purposes, the folders are grouped by day into tar archives. For example, for the timestep folder 20200731.001500:
- Each file is a netCDF file whose name is prefixed with the timestamp.
- Each file has a suffix (e.g., tile*) that indicates which tile of the 6-tile cubed-sphere global grid it corresponds to, so there are 6 files for each file type category.
- There are the following file type categories:
  - “fv_core.res” files: These contain the core 3-D prognostic variables of the model on its cubed-sphere, hybrid sigma/pressure vertical level grids, including u- and v-winds on the D- and A-grids, vertical winds, model layer height thicknesses, temperatures, model layer pressure thicknesses, and surface geopotential height. Variables: ‘u’, ‘v’, ‘ua’, ‘va’, ‘W’, ‘DZ’, ‘T’, ‘delp’, and ‘phis’.
  - “fv_srf_wnd.res” files: These contain 2-D surface winds. Variables: ‘u_srf’ and ‘v_srf’.
  - “fv_tracer.res” files: These contain 3-D fields of the tracers (water species and ozone, primarily) associated with the model physics packages. Variables: ‘sphum’, ‘liq_wat’, ‘rainwat’, ‘ice_wat’, ‘snowwat’, ‘graupel’, ‘sgs_tke’, ‘pbl_clock’, ‘tro_pbl_clock’, ‘o3mr’, and ‘cld_amt’. Note that ‘liq_wat’, ‘ice_wat’, and ‘cld_amt’ are the machine learning targets in the manuscript.
  - “sfc_data” files: These contain 2-D fields associated with the model’s land surface simulation, such as albedo, soil moisture, vegetation, snow, ice, roughness, etc. Variables: ‘slmsk’, ‘tsea’, ‘sheleg’, ‘tg3’, ‘zorl’, ‘alvsf’, ‘alvwf’, ‘alnsf’, ‘alnwf’, ‘facsf’, ‘facwf’, ‘vfrac’, ‘canopy’, ‘f10m’, ‘t2m’, ‘q2m’, ‘vtype’, ‘stype’, ‘uustar’, ‘ffmm’, ‘ffhh’, ‘hice’, ‘fice’, ‘tisfc’, ‘tprcp’, ‘srflag’, ‘snwdph’, ‘shdmin’, ‘shdmax’, ‘slope’, ‘snoalb’, ‘sncovr’, ‘stc’, ‘smc’, and ‘slc’.
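The restart file layout described above can be sketched as a small path helper. This is only an illustration under stated assumptions: the exact file name pattern `<timestamp>.<category>.tile<N>.nc` and the 1 to 6 tile numbering are assumptions inferred from the description, and the names `restart_paths` and `open_ml_targets` are hypothetical. The loader uses the third-party xarray package, imported lazily so that the path helper works without it.

```python
from pathlib import Path

# File type categories listed in this README.
CATEGORIES = ("fv_core.res", "fv_srf_wnd.res", "fv_tracer.res", "sfc_data")


def restart_paths(root, timestamp, category):
    """Build the six per-tile paths for one category at one timestep.

    ASSUMPTION: files are named <timestamp>.<category>.tile<N>.nc with
    N running from 1 to 6; verify against the extracted archives.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown file category: {category}")
    folder = Path(root) / timestamp
    return [folder / f"{timestamp}.{category}.tile{n}.nc" for n in range(1, 7)]


def open_ml_targets(root, timestamp):
    """Load the three ML target tracers and stack them along a new 'tile' dimension."""
    import xarray as xr  # third-party; imported lazily

    paths = restart_paths(root, timestamp, "fv_tracer.res")
    tiles = [xr.open_dataset(p)[["cld_amt", "liq_wat", "ice_wat"]] for p in paths]
    return xr.concat(tiles, dim="tile")


# "restarts" is a hypothetical directory holding the extracted timestep folders.
paths = restart_paths("restarts", "20200731.001500", "fv_tracer.res")
print(paths[0].name)
```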
A dataset of 2D model diagnostics output at the same time intervals. Diagnostics differ from restart files in that they are outputs of the model but do not describe the model state needed to initialize a new model simulation. They include the radiative fluxes at Earth’s surface and top of atmosphere that are used as evaluation metrics in the manuscript. These diagnostics are saved as a tarballed zarr archive. Zarr is a more convenient format than netCDF for cloud-native data storage, as it allows scalable reading and writing of only the portions of a dataset that are needed, and direct reads from cloud storage without copying. Note that the zarr format is also self-describing; it contains a file called “.zmetadata” that is a JSON-formatted description of all of the variables included in the zarr store, including their names, units, long-form text descriptions, dimensions, and chunking and encoding into binary format. See https://zarr.dev/ for more information on getting started with zarr.
- The relevant coarsened-fine model radiative flux variables saved here that are used in the manuscript include:
  - ‘USWRFtoa_coarse’: upward total-sky shortwave radiative flux at top of atmosphere
  - ‘ULWRFtoa_coarse’: upward total-sky longwave radiative flux at top of atmosphere
  - ‘DSWRFsfc_coarse’: downward total-sky shortwave radiative flux at the surface
  - ‘DLWRFsfc_coarse’: downward total-sky longwave radiative flux at the surface
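Because the “.zmetadata” file is plain JSON, the variables in the diagnostics store can be inspected with the standard library alone. The sketch below assumes the usual consolidated-metadata layout, in which a top-level "metadata" object maps keys such as "<variable>/.zarray" and "<variable>/.zattrs" to their contents. The function name `summarize_zmetadata` is hypothetical, and the shape, chunk, and attribute values in the sample are invented for illustration, not taken from the dataset.

```python
import json


def summarize_zmetadata(consolidated):
    """Map each array name to its shape, chunks, units, and long name."""
    entries = consolidated["metadata"]
    summary = {}
    for key, value in entries.items():
        if key.endswith("/.zarray"):
            name = key[: -len("/.zarray")]
            attrs = entries.get(f"{name}/.zattrs", {})
            summary[name] = {
                "shape": value["shape"],
                "chunks": value["chunks"],
                "units": attrs.get("units"),
                "long_name": attrs.get("long_name"),
            }
    return summary


# HYPOTHETICAL sample with made-up shapes and attributes, for illustration only.
sample = json.loads("""
{
  "zarr_consolidated_format": 1,
  "metadata": {
    ".zgroup": {"zarr_format": 2},
    "USWRFtoa_coarse/.zarray": {"shape": [960, 6, 48, 48], "chunks": [96, 6, 48, 48]},
    "USWRFtoa_coarse/.zattrs": {"units": "W/m**2", "long_name": "upward shortwave flux at TOA"}
  }
}
""")

summary = summarize_zmetadata(sample)
for name, info in summary.items():
    print(name, info["shape"], info["units"])
```

With the real store, the same function could be applied to the result of `json.load` on the “.zmetadata” file inside the extracted zarr directory. Opening the store with `xarray.open_zarr(path, consolidated=True)` is another option for working with the fields as labeled arrays.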
