
Data from: Training data from SPCAM for machine learning in moist physics

Cite this dataset

Zhang, Guang; Han, Yilun; Huang, Xiaomeng; Wang, Yong (2020). Data from: Training data from SPCAM for machine learning in moist physics [Dataset]. Dryad.

Abstract

Current moist physics parameterization schemes in general circulation models (GCMs) are the main source of biases in simulated precipitation and atmospheric circulation. Recent advances in machine learning make it possible to explore data-driven approaches to parameterizing moist physics processes such as convection and clouds. This study aims to develop a new moist physics parameterization scheme based on deep learning, using a residual convolutional neural network (ResNet). The network is trained on a one-year simulation from a superparameterized GCM, SPCAM, and evaluated on an independent year of SPCAM simulation. The design of the neural network, referred to as ResCu, takes into account moist static energy conservation during moist processes, as well as the past history of atmospheric states, convection, and clouds. ResCu predicts GCM grid-scale heating and drying rates due to convection and clouds, along with cloud liquid and ice water contents; precipitation is derived from the predicted moisture tendency. In the independent-data test, ResCu accurately reproduces the SPCAM simulation in both time mean and temporal variance, and comparison with other neural networks demonstrates the superior performance of the ResNet architecture. ResCu is further tested in a single column model for both continental midlatitude warm-season convection and tropical monsoonal convection; in both cases, it simulates the timing and intensity of convective events well. In the prognostic test of the tropical convection case, the temperature and moisture biases simulated with ResCu are smaller than those obtained with conventional convection and cloud parameterizations.
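
The abstract states that precipitation is derived from the predicted moisture tendency and that moist static energy conservation is considered, but it does not give the formulas. The minimal Python sketch below shows one standard formulation, not necessarily the one used in ResCu: precipitation as the negative mass-weighted column integral of the moisture tendency, P = -(1/g) Σ_k (∂q/∂t)_k Δp_k, and an energy check requiring the column integral of c_p ∂T/∂t + L_v ∂q/∂t to be close to zero. It assumes that all water removed from the column rains out (no condensate storage or re-evaporation), ignores the ice phase, radiation, and surface fluxes, and uses approximate constants; the function names are hypothetical.

```python
# Approximate physical constants (assumed; CAM's own values differ slightly)
G = 9.81               # gravitational acceleration, m s^-2
CP = 1004.0            # specific heat of dry air at constant pressure, J kg^-1 K^-1
LV = 2.501e6           # latent heat of vaporization, J kg^-1
SECONDS_PER_DAY = 86400.0


def column_integral(profile, dp):
    """Mass-weighted column integral: sum_k profile[k] * dp[k] / g.

    profile: per-level values per unit mass of air
    dp: pressure thickness of each layer in Pa (same length as profile)
    """
    if len(profile) != len(dp):
        raise ValueError("profile and dp must have the same number of levels")
    return sum(x * d for x, d in zip(profile, dp)) / G


def precip_from_drying(dqdt, dp):
    """Surface precipitation rate (mm/day) from a moisture tendency profile.

    dqdt: moist-physics moisture tendency in kg/kg/s (negative = drying).
    Assumes every kilogram of vapor removed falls out as rain.
    """
    # A water mass flux of 1 kg m^-2 s^-1 equals 1 mm s^-1 of liquid-water depth
    rate = -column_integral(dqdt, dp)
    return rate * SECONDS_PER_DAY


def mse_residual(dtdt, dqdt, dp):
    """Column-integrated moist static energy tendency in W m^-2.

    For moist processes alone (no radiation, surface fluxes, or ice phase),
    c_p*dT/dt + L_v*dq/dt should integrate to roughly zero over the column.
    """
    tendency = [CP * t + LV * q for t, q in zip(dtdt, dqdt)]
    return column_integral(tendency, dp)
```

For example, three 100 hPa layers each drying at 1e-8 kg/kg/s give `precip_from_drying([-1e-8] * 3, [10000.0] * 3)` of about 2.64 mm/day, and a heating profile equal to -(L_v/c_p) times that drying yields an MSE residual near zero.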

Methods

This dataset is extracted from a simulation using a superparameterized GCM, SPCAM. SPCAM embeds a 2-D cloud-resolving model (CRM) in each grid column of CAM5.2, replacing its conventional parameterizations for moist convection and large-scale condensation. The CAM5 dynamical core has a horizontal resolution of 1.9° × 2.5° and 30 vertical levels, which are shared with the embedded CRM. The SPCAM configuration used in this study is coupled to the Community Land Model 4.0 (CLM4.0) and uses the prescribed climatological sea surface temperature field distributed with CAM5. The model is run for three years and 4 months, from January 1, 1998 to March 31, 2001, with a time step of 20 minutes. The first year and 4 months are used for spin-up, the second year for training, and the third year for testing and evaluation. The SPCAM training data are output at every time step. This dataset contains one year of training data and one year of evaluation data.
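
The simulation setup above implies a simple count of how many samples one year of every-time-step output can hold. The sketch below assumes the standard 96 × 144 latitude-longitude finite-volume grid that CAM uses at the nominal 1.9° × 2.5° resolution, and that every grid column is written at every 20-minute step; both are assumptions, and the training set used in the study may subsample columns or times. The constant and function names are illustrative only.

```python
# Assumed grid: CAM finite-volume grid at nominal 1.9 x 2.5 degrees
N_LAT = 96
N_LON = 144
N_LEV = 30                 # vertical levels, shared with the embedded CRM
TIMESTEP_MINUTES = 20      # model time step; output is written every step
DAYS_PER_YEAR = 365        # one netCDF file per day in each archive


def steps_per_day(timestep_minutes=TIMESTEP_MINUTES):
    """Number of model time steps (and hence outputs) per day."""
    minutes_per_day = 24 * 60
    if minutes_per_day % timestep_minutes:
        raise ValueError("time step must divide one day evenly")
    return minutes_per_day // timestep_minutes


def column_samples_per_year():
    """Upper bound on (grid column, time step) samples in one year of output."""
    return N_LAT * N_LON * steps_per_day() * DAYS_PER_YEAR


def values_per_profile_field_per_year():
    """Number of values in one level-resolved (3-D) field over one year."""
    return column_samples_per_year() * N_LEV
```

Under these assumptions, a 20-minute step gives 72 outputs per day, so one year holds 363,294,720 column samples, and a single 30-level field holds about 1.09 × 10^10 values.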

Usage notes

The training samples for the entire second simulation year are compressed in SPCAM_ML_Han_et_al_0.tar.gz, and the testing samples for the entire third simulation year are compressed in SPCAM_ML_Han_et_al_1.tar.gz. Each archive contains a data documentation file and 365 netCDF data files, one per day, each labeled with its date. The variables include temperature and moisture tendencies, cloud water, and cloud ice from the CRM; vertical profiles of temperature and moisture; and large-scale temperature and moisture tendencies from the dynamical core of SPCAM's host model, CAM5, and from PBL diffusion. Surface sensible and latent heat fluxes are also included. For more details, please read the data documentation inside the tar.gz files.
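
Because each archive bundles 365 daily netCDF files with a documentation file, it can be convenient to inventory an archive without unpacking it. The sketch below uses Python's standard `tarfile` module to list the `.nc` members sorted by date. The usage notes only say that the files are labeled by date, so the `YYYY-MM-DD` pattern in `DATE_PATTERN` is an assumption to adjust after checking the actual file names; `list_daily_files` is a hypothetical helper.

```python
import re
import tarfile

# Assumed naming scheme: a YYYY-MM-DD date somewhere in each daily file name.
# Check the real member names and adjust this pattern if they differ.
DATE_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})")


def list_daily_files(archive_path):
    """Return (date_string, member_name) pairs for dated .nc files, sorted by date."""
    daily = []
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories and non-netCDF files such as the documentation file
            if not member.isfile() or not member.name.endswith(".nc"):
                continue
            match = DATE_PATTERN.search(member.name)
            if match:
                daily.append((match.group(1), member.name))
    return sorted(daily)
```

If the naming assumption holds, `list_daily_files("SPCAM_ML_Han_et_al_0.tar.gz")` should return one entry per day. A selected member can then be extracted with `tarfile.TarFile.extract` and opened with a netCDF reader such as xarray's `open_dataset`.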

Funding

US Department of Energy, Office of Science, Biological and Environmental Research Program (BER), Award: DE-SC0019373