GOES-16 ABI and collocated SNPP-VIIRS imagery for evaluating the emulation of daytime cloud products at night
Data files
Jun 24, 2025 version files (77.78 GB total)
- daytime_eval_data.tar (35.91 GB)
- ground_based_eval_data.tar (16.48 MB)
- models.tar (165.86 MB)
- nighttime_eval_data.tar (17.22 GB)
- README.md (13.55 KB)
- twilight_eval_data.tar (24.47 GB)
Abstract
This dataset contains the testing set for Emulating Daytime ABI Cloud Optical Properties at Night with Machine Learning, currently in preparation for peer review. These files contain the information needed to reproduce the analysis and all figures in the manuscript. They primarily contain Advanced Baseline Imager (ABI) brightness temperatures for 9 channels and cloud properties from the NOAA operational products (cloud optical depth, cloud effective radius, and derived cloud water path), along with accompanying estimates from the machine learning emulator. Daytime evaluation files are for testing the effectiveness of a machine learning emulator during the daytime. Twilight files are for illustrating the impact of the day/night terminator on cloud water path distributions. Nighttime files are for evaluating the machine learning emulator in the target domain by comparing to a physical retrieval that uses lunar reflectance from the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB).
https://doi.org/10.5061/dryad.gf1vhhmz6
Description of the data and file structure
There are four sets of .tar data archives within this repository: Daytime, Twilight, Nighttime, and ground-based evaluation data. A fifth .tar file contains files needed for running the four machine learning emulators.
The first three archives primarily contain GOES-16 Advanced Baseline Imager (ABI) brightness temperatures for 9 channels and cloud properties from the NOAA operational products, including cloud optical depth, cloud effective radius, and derived cloud water path. They also include accompanying estimates from a machine learning emulator. Daytime evaluation files are used for testing the effectiveness of the machine learning emulator during the daytime. Twilight files are used for illustrating the impact of the day/night terminator on cloud water path distributions. Nighttime files are used for evaluating the machine learning emulator in the target domain by comparing to a physical retrieval that uses lunar reflectance from the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB).
The ground-based file contains pixel matchups between ABI and the Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) observations. This file contains information to perform a more independent assessment of DCOMP, NCOMP, and the neural network model. Due to the license accompanying the ARM-SGP data, these products are not included here. Instead, we provide the exact times that these collocations occur at the SGP site and provide instructions on how to preprocess the data to be used in the accompanying software archive.
The models.tar file contains the weights, configuration, and preprocessing statistics required for running the models using TensorFlow 2.15. Example code of how to run these models is present in the accompanying Zenodo software archive.
Files and variables
For the daytime, twilight, and nighttime datasets, all data are contained in HDF5 files and can be read using the h5py Python library (https://www.h5py.org/) or any other HDF5 reader. All daytime files represent a 1024 x 1024 pixel cut-out from GOES-16 ABI full-disk imagery. All twilight and nighttime files are 2-km resolution full-disk images from ABI (5424 x 5424 pixels).
These files can be read in Python using the h5py library, like this example for reading channel 13 on ABI:
import h5py
# path_to_h5_file is the path to one HDF5 file from the extracted archive
file_obj = h5py.File(path_to_h5_file, 'r') # Open the file read-only
abi_channel_13 = file_obj['C13'][:] # Read C13 into a NumPy array
file_obj.close() # Close the file object
All files are named according to the timestamp in the corresponding ABI Level-1b file, following a YYYYDDDHHMMSSs convention: YYYY = 4-digit year, DDD = 3-digit day of year (Julian day), HH = 2-digit hour, MM = 2-digit minute, SSs = 3-digit second in tenths of a second (e.g., 205 = 20.5 seconds).
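To illustrate the naming convention, the following is a minimal sketch that converts a timestamp string into a Python datetime. It assumes the 14-digit stamp has already been separated from any file extension or prefix; the function name parse_abi_timestamp and the example stamp are hypothetical and not part of the dataset.

```python
from datetime import datetime, timedelta


def parse_abi_timestamp(stamp: str) -> datetime:
    """Convert a YYYYDDDHHMMSSs string into a naive (UTC) datetime."""
    if len(stamp) != 14 or not stamp.isdigit():
        raise ValueError(f"expected 14 digits (YYYYDDDHHMMSSs), got {stamp!r}")
    year = int(stamp[0:4])
    day_of_year = int(stamp[4:7])          # 1-based day of year
    hour = int(stamp[7:9])
    minute = int(stamp[9:11])
    tenths_of_second = int(stamp[11:14])   # e.g. 205 means 20.5 s
    return datetime(year, 1, 1) + timedelta(
        days=day_of_year - 1,
        hours=hour,
        minutes=minute,
        seconds=tenths_of_second / 10.0,
    )


# Hypothetical stamp: day 180 of 2021 at 15:30:20.5 UTC
print(parse_abi_timestamp("20211801530205"))
```

Building the date from January 1 plus an offset avoids relying on format-string parsing of fields that have no separators.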
In the Daytime Evaluation files:
C03: ABI Level-1b visible reflectance (%) for channel 3 (0.86 µm).
C08-16: ABI Level-1b brightness temperatures (K) corresponding to the numbered ABI bands (6.2 µm - 13.3 µm)
clavrx_cld_cwp_dcomp: Cloud water path from DCOMP (g/m2) implemented within CLAVR-x
clavrx_cld_opd_dcomp: Cloud optical depth from DCOMP (unitless) implemented within CLAVR-x
clavrx_cld_reff_dcomp: Cloud effective radius from DCOMP (µm) implemented within CLAVR-x
cld_opd_dcomp_unc: Uncertainty in estimated cloud optical depth from DCOMP (%).
model7.[...]_ftd[...]_dcomp: Estimates of DCOMP cloud optical properties obtained from four different neural networks. In the associated paper, MLP64=model7.1, MLP256=model7.2, U-NetCOMP24=model7.3, and U-NetCOMP48=model7.4. The MLP models use only spectral information within a given pixel, where 64 and 256 indicate the number of units in each layer of the neural network. The U-NetCOMP models are able to exploit spatial context, where 24 and 48 indicate the number of convolutional filters in the first layer of the U-Net model.
clavrx_cloud_mask: A binary indicator of cloud (1) or clear (0) from CLAVR-x
clavrx_cloud_phase: Cloud-top phase indicator where 0=clear, 1=water, 2=supercooled water, 3=mixed phase, 4=ice, 5=unknown
cld_height_acha, cld_opd_acha, cld_reff_acha: Cloud-top height (km), optical depth (unitless), and effective radius (µm) derived from the Algorithm Working Group (AWG) Cloud Height Algorithm (ACHA) implemented within CLAVR-x
cloud_type: Integer classification of the cloud type, including clear and aerosol types: 0=clear, 1=probably clear, 2=fog, 3=water, 4=supercooled water, 5=mixed, 6=opaque_ice, 7=cirrus, 8=overlapping, 9=overshooting, 10=unknown, 11=dust, 12=smoke, 13=fire.
custom_dcomp_quality_flag: A binary value indicating whether a given pixel satisfies the requirements for use as a label during model training. A value of 1 indicates a confident cloudy pixel free of sun glint with valid effective radius and optical depth estimates, solar zenith angle less than 55 degrees, and sensor zenith angle less than 65 degrees.
dcomp_quality: Quality flags for DCOMP products (see CLAVR-x documentation)
glint_mask: A binary indicator of sun glint
land_class: shallow ocean=0, land=1, coastline=2, shallow inland water=3, ephemeral water=4, deep inland water=5, moderate ocean=6, deep ocean=7
latitude, longitude: coordinates assigned to the centers of GOES-16 ABI pixels. Expressed in degrees N and degrees E.
sensor_zenith_angle: Angle in degrees between the path ABI saw through the atmosphere and the surface normal (0 degrees indicates the satellite is directly overhead).
snow_class: no snow/ice=1, sea_ice=2, snow=3
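As an illustration of how the fields above fit together, the following sketch compares an emulator estimate with the matching DCOMP field over pixels where custom_dcomp_quality_flag equals 1. It is not the analysis code from the paper. The arrays are assumed to have been read with h5py beforehand, invalid pixels are assumed to be stored as NaN (the files may use a different fill value), and the function name masked_error_stats is hypothetical.

```python
import numpy as np


def masked_error_stats(emulated, dcomp, quality_flag):
    """Return count, bias, and RMSE over pixels where quality_flag == 1."""
    emulated = np.asarray(emulated, dtype=float)
    dcomp = np.asarray(dcomp, dtype=float)
    # Keep only flagged pixels with finite values in both fields
    # (assumption: invalid pixels are NaN rather than a numeric fill value)
    valid = (np.asarray(quality_flag) == 1) & np.isfinite(emulated) & np.isfinite(dcomp)
    diff = emulated[valid] - dcomp[valid]
    if diff.size == 0:
        return {"n": 0, "bias": float("nan"), "rmse": float("nan")}
    return {
        "n": int(diff.size),
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
    }


# Synthetic 2 x 2 example; the values are made up, not taken from the dataset
dcomp_cod = np.array([[10.0, 20.0], [5.0, np.nan]])
emulated_cod = np.array([[12.0, 18.0], [9.0, 7.0]])
flag = np.array([[1, 1], [0, 1]])
print(masked_error_stats(emulated_cod, dcomp_cod, flag))
```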
In the Twilight Evaluation files:
L2_ACM: Four-level NOAA operational cloud mask for ABI. 0: confident clear, 1: probably clear, 2: probably cloudy, 3: confident cloudy.
L2_COD: NOAA operational cloud optical depth (unitless) for ABI.
L2_CPS: NOAA operational cloud particle size (effective radius; µm) for ABI
L2_Phase: NOAA operational cloud phase for ABI. 0: clear-sky, 1: liquid water, 2: supercooled liquid water, 3: mixed phase, 4: ice, 5: unknown
model7.4_ftd_cld_opd, model7.4_ftd_cld_reff: U-NetCOMP48 estimates of optical depth (unitless) and effective radius (µm).
The twilight evaluation files also include a small number of the fields mentioned above in the daytime evaluation files.
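The integer codes listed for L2_ACM and L2_Phase can be combined to select pixels of interest. The sketch below counts confident-cloudy pixels per phase class. It assumes both fields have already been read into arrays of the same shape; the function name confident_cloudy_by_phase is hypothetical.

```python
import numpy as np

CONFIDENT_CLOUDY = 3  # L2_ACM code for "confident cloudy"
PHASE_LABELS = {
    0: "clear-sky",
    1: "liquid water",
    2: "supercooled liquid water",
    3: "mixed phase",
    4: "ice",
    5: "unknown",
}


def confident_cloudy_by_phase(acm, phase):
    """Count confident-cloudy pixels (L2_ACM == 3) for each L2_Phase code."""
    cloudy = np.asarray(acm) == CONFIDENT_CLOUDY
    phase = np.asarray(phase)
    return {
        label: int(np.count_nonzero(cloudy & (phase == code)))
        for code, label in PHASE_LABELS.items()
    }


# Synthetic 2 x 2 example; the values are made up, not taken from the dataset
acm_example = np.array([[3, 3], [2, 0]])
phase_example = np.array([[1, 4], [1, 0]])
print(confident_cloudy_by_phase(acm_example, phase_example))
```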
In the Nighttime Evaluation files:
scan_line_time: UTC time of ABI observation in decimal hours (20.5 is 20:30 UTC)
viirs_scan_line_time: UTC time of VIIRS observation in decimal hours.
viirs_cld_opd_nlcomp: Cloud optical depth estimated from the Nighttime Lunar Cloud Optical and Microphysical Properties (NLCOMP) algorithm for SNPP-VIIRS
viirs_latitude, viirs_longitude: Latitude and longitude of SNPP VIIRS observations
viirs_dist: distance calculated in degrees between ABI and VIIRS observations. Used for nearest neighbor resampling.
viirs_refl_lunar_dnb_nom: Day/Night Band lunar reflectance (%) on VIIRS.
viirs_solar_azimuth_angle, viirs_solar_zenith_angle: Similar to solar geometry for ABI included in the daytime file, but for SNPP-VIIRS
viirs_lunar_azimuth_angle, viirs_lunar_zenith_angle: Lunar geometry used to calculate lunar glint angle. Similar to solar geometry used during the day.
The nighttime evaluation files also include several fields mentioned above in the daytime and twilight evaluation files with the same meaning.
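Because the ABI and VIIRS observation times are stored in decimal hours and the collocation distance in degrees, matchups can be screened with a few array operations. The following is a minimal sketch under stated assumptions: the thresholds max_minutes and max_dist_deg are illustrative placeholders rather than values used in the paper, and the function name collocation_mask is hypothetical.

```python
import numpy as np


def collocation_mask(abi_hours, viirs_hours, viirs_dist_deg,
                     max_minutes=10.0, max_dist_deg=0.02):
    """Boolean mask of matchups that are close in both time and space.

    The default thresholds are placeholders chosen for illustration only.
    """
    abi_hours = np.asarray(abi_hours, dtype=float)
    viirs_hours = np.asarray(viirs_hours, dtype=float)
    # Wrap the difference into [-12, 12) hours so that times on either side
    # of 00 UTC (e.g. 23.95 and 0.05) are treated as 0.1 h apart
    delta_hours = (viirs_hours - abi_hours + 12.0) % 24.0 - 12.0
    delta_minutes = np.abs(delta_hours) * 60.0
    close_in_space = np.asarray(viirs_dist_deg, dtype=float) <= max_dist_deg
    return (delta_minutes <= max_minutes) & close_in_space


# Synthetic example; the values are made up, not taken from the dataset
mask = collocation_mask(
    abi_hours=[23.95, 6.0, 6.0],
    viirs_hours=[0.05, 6.5, 6.1],
    viirs_dist_deg=[0.01, 0.01, 0.05],
)
print(mask)
```

The wrap-around step matters only for scenes that straddle midnight UTC, but omitting it would silently discard those matchups.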
For many of the above-mentioned fields, more details can be found in the CLAVR-x documentation (https://cimss.ssec.wisc.edu/clavrx/documentation/).
The ground-based evaluation file:
This is a .csv file with rows indicating individual matchups between ABI and the ARM-SGP instrumentation. Measurements from the ARM-SGP site are derived from the MICROBASE products (https://www.arm.gov/capabilities/science-data-products/vaps/microbase) and the ARSCL products (https://www.arm.gov/capabilities/science-data-products/vaps/arscl). This dataset is filtered for single-layer non-precipitating clouds with at least 1 g/m2 cloud water path. Remaining samples are further filtered to ensure that these requirements are met for all profiles within 20 seconds before and after a given profile.
Due to the license accompanying DOE ARM data, we are not able to include the MICROBASE and ARSCL products here. Instead, we leave the quantities derived from DOE ARM data blank in this .csv and provide the time of observation for all collocations. Thus, this file only includes estimates from DCOMP, NCOMP, and our neural network emulator. In order to reproduce the analysis, users will need to download the full year (2021) of products from the above links to the MICROBASE and ARSCL products for the SGP site and select profiles based on the times included in this file.
C08-C16: Collocated ABI brightness temperatures for channels 8 through 16
ABI-L2-CODC, ABI-L2-CPSC, ABI-L2-ACMC, ABI-L2-ACTPC, ABI-L2-ACHAC, ABI-L2-Derived-CWP: Cloud optical depth, cloud particle size, cloud mask, cloud-top phase, cloud-top height, and derived cloud water path from the NOAA operational products for the collocated pixel.
nn_cod, nn_cer, nn_cwp: Neural network (U-NetCOMP48) estimates of cloud optical depth, cloud effective radius, and derived cloud water path.
sensor_zenith_angle, sensor_azimuth_angle, solar_zenith_angle: Viewing and solar geometry similar to above.
latitude, longitude, latitude_pc, longitude_pc: Geographic coordinates and their parallax corrections using cloud-top height.
satellite_time: UTC time of satellite observation.
cloud_fraction_5x5, cloud_fraction_7x7, cloud_fraction13x13: Cloud fraction in the surrounding X by X pixel area derived from ABI
cth_std_13x13, cth_std_7x7, cth_std_5x5: Standard deviation of cloud-top heights (km) in the surrounding X by X pixel area derived from ABI
toa_dwsrf: Downwelling solar radiative forcing at the top of the atmosphere in watts per square meter.
time_diff_minutes: time difference in minutes between the satellite and ground-based measurements
ground_time: Time that the ground-based observation occurs. This is used to find the corresponding observation from the MICROBASE/ARSCL products.
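The cloud-fraction columns listed above are already provided in the .csv file. To make the "X by X pixel area" definition concrete, the sketch below computes a windowed cloud fraction from a binary cloud mask. It assumes a mask with 1 for cloud and 0 for clear, and it ignores positions beyond the image edge; the authors' exact mask and edge handling are not stated here, and the function name local_cloud_fraction is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_cloud_fraction(cloud_mask, window=5):
    """Fraction of cloudy pixels in a window x window box around each pixel.

    Positions outside the image are padded with NaN and excluded from the
    mean (an assumption about edge handling).
    """
    if window % 2 != 1:
        raise ValueError("window must be odd so the box is centered on a pixel")
    half = window // 2
    padded = np.pad(np.asarray(cloud_mask, dtype=float), half,
                    mode="constant", constant_values=np.nan)
    boxes = sliding_window_view(padded, (window, window))
    return np.nanmean(boxes, axis=(-2, -1))


# Synthetic 5 x 5 mask with a single cloudy pixel in the center
example_mask = np.zeros((5, 5))
example_mask[2, 2] = 1
print(local_cloud_fraction(example_mask, window=3))
```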
The following are left blank and need to be calculated from the MICROBASE/ARSCL products. An example script for preprocessing these data is available in the accompanying Zenodo repository.
kazr_cth, kazr_cbh, kazr_cgt: Cloud-top height, cloud-base height, and cloud geometric thickness from the ARSCL product
cwp_layer1, iwp_layer1, lwp_layer1: Cloud water path, ice water path, and liquid water path for the first and only cloud layer in a given profile
kazr_mwr_lwp: Liquid water path estimated from microwave radiometer
kazr_precip_mean: Precipitation mean from rain gauge
kazr_instrument_flag: Indicates availability of each instrument (see metadata at https://www.arm.gov/capabilities/science-data-products/vaps/arscl for a description of values)
kazr_all_instruments_present: Indicates whether all instruments are present for a given profile
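Filling the blank columns requires locating, for each row, the MICROBASE/ARSCL profile whose time matches ground_time. The sketch below shows one way to do that lookup with the standard library. It is not the preprocessing script from the Zenodo repository, which remains the reference. It assumes that both ground_time and the profile times have already been parsed into datetime objects, that the profile times are sorted, and that a tolerance of 30 seconds is acceptable; the tolerance and the function name nearest_profile_index are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_profile_index(profile_times, ground_time,
                          tolerance=timedelta(seconds=30)):
    """Index of the profile closest to ground_time, or None if too far away.

    profile_times must be sorted in ascending order.
    """
    if not profile_times:
        return None
    pos = bisect_left(profile_times, ground_time)
    # The nearest profile is either just before or just after the insert point
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(profile_times)]
    best = min(candidates, key=lambda i: abs(profile_times[i] - ground_time))
    if abs(profile_times[best] - ground_time) > tolerance:
        return None
    return best


# Synthetic profile times every 4 seconds (made up for illustration)
start = datetime(2021, 1, 1, 0, 0, 0)
profiles = [start + timedelta(seconds=4 * k) for k in range(10)]
print(nearest_profile_index(profiles, start + timedelta(seconds=5)))
```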
Code/software
All scripts used in this dataset are provided here in the form of Jupyter notebooks. We recommend viewing the code in Jupyter Lab (https://jupyter.org/install), but other notebook viewers should work just as well.
The model weights included in the models.tar were developed using TensorFlow 2.15.0. The model output is already saved to each file, so running these models is not necessary to reproduce the analysis performed in the associated paper. We include a Jupyter notebook 'Inference_Example_Final.ipynb' that includes an example of how to run these models if desired.
The analysis and figures in the paper can be reproduced using the 'Daytime Eval Final.ipynb', 'Twilight Eval Final.ipynb', 'Night Eval Final.ipynb', and 'Ground_Based_Final.ipynb' notebooks, which are contained in the scripts.tar file. Prior to running the code in 'Ground_Based_Final.ipynb', users will need to download the MICROBASE and ARSCL products for 2021 and preprocess them following the example in 'Ground_Based_Preprocess.ipynb'.
These notebooks are written in Python 3.11 and have dependencies on scipy, h5py, numpy, pandas, matplotlib, joblib, scikit-learn, and cartopy. Specific versions used are below.
scipy 1.13.0
h5py 3.11.0
numpy 1.26.4
pandas 2.2.1
matplotlib 3.8.4
joblib 1.4.0
scikit-learn 1.5.0
cartopy 0.23.0
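Before running the notebooks, it can be useful to check the local environment against the versions listed above. The following sketch uses only the standard library; the helper names installed_version and version_mismatches are hypothetical, and a mismatch does not necessarily mean the notebooks will fail.

```python
from importlib import metadata

# Versions copied from the list above
PINNED = {
    "scipy": "1.13.0",
    "h5py": "3.11.0",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "matplotlib": "3.8.4",
    "joblib": "1.4.0",
    "scikit-learn": "1.5.0",
    "cartopy": "0.23.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pinned=PINNED, lookup=installed_version):
    """Map package name -> (pinned, installed) for every package that differs."""
    found = {name: lookup(name) for name in pinned}
    return {name: (want, found[name])
            for name, want in pinned.items()
            if found[name] != want}


for name, (want, have) in version_mismatches().items():
    print(f"{name}: listed {want}, installed {have}")
```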
Access information
Data was derived from the following sources:
- ABI and VIIRS data obtained from NOAA CLASS (https://www.aev.class.noaa.gov/)
- ABI and VIIRS data were processed using CLAVR-x (https://cimss.ssec.wisc.edu/clavrx/documentation/)
- In order to reproduce the ground-based analysis, the required ARM-SGP data can be downloaded from the MICROBASE products (https://www.arm.gov/capabilities/science-data-products/vaps/microbase) and the ARSCL products (https://www.arm.gov/capabilities/science-data-products/vaps/arscl).
All ABI brightness temperatures are obtained from the NOAA Comprehensive Large Array Data Stewardship System (CLASS; www.aev.class.noaa.gov). For the daytime dataset, cloud properties are obtained using the Clouds from AVHRR Extended (CLAVR-x; https://cimss.ssec.wisc.edu/clavrx/documentation/) software. For the twilight dataset, the NOAA operational daytime and nighttime cloud properties are obtained from CLASS. For the nighttime dataset, Suomi-NPP VIIRS data are obtained from CLASS, processed through CLAVR-x, and collocated to ABI imagery using nearest-neighbor resampling.
In this dataset, we define daytime observations as having a solar zenith angle less than 82 degrees. Twilight observations have a solar zenith angle greater than 82 degrees and less than 90 degrees. Nighttime observations have a solar zenith angle greater than 90 degrees. Machine learning emulators are trained to match the Daytime Cloud Optical and Microphysical Properties (DCOMP) algorithm during the daytime and are applied in twilight and nighttime scenes to evaluate their effectiveness when visible and near-infrared reflectances are not available.
During the daytime, the machine learning emulators are compared to DCOMP cloud property estimates.
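The solar zenith angle thresholds that separate daytime, twilight, and nighttime observations can be written as a small classifier. This is a sketch only: the text does not say how angles of exactly 82 or 90 degrees are treated, so assigning 82 to twilight and 90 to night is an assumption, as is the function name illumination_regime.

```python
import numpy as np


def illumination_regime(solar_zenith_deg):
    """Label observations as 'day', 'twilight', or 'night' by solar zenith angle.

    Assumption: exactly 82 degrees counts as twilight and exactly 90 as night.
    Non-finite angles are labeled 'unknown'.
    """
    sza = np.asarray(solar_zenith_deg, dtype=float)
    regime = np.full(sza.shape, "unknown", dtype="<U8")
    regime[sza >= 90.0] = "night"
    regime[(sza >= 82.0) & (sza < 90.0)] = "twilight"
    regime[sza < 82.0] = "day"
    return regime


# Synthetic angles in degrees (made up for illustration)
print(illumination_regime([30.0, 82.0, 85.0, 90.0, 120.0]))
```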
