GREMLIN CONUS3 Dataset for 2020
Data files
Apr 04, 2023 version files 392.01 GB
- 202001.tar: 36.67 GB
- 202002.tar: 30.73 GB
- 202004.tar: 35.64 GB
- 202005.tar: 37.36 GB
- 202006.tar: 36.35 GB
- 202007.tar: 36.78 GB
- 202008.tar: 36.99 GB
- 202009.tar: 35.38 GB
- 202010.tar: 35.15 GB
- 202011.tar: 34.74 GB
- 202012.tar: 36.21 GB
- README.md: 11.96 KB
Abstract
Geostationary Operational Environmental Satellite (GOES) Radar Estimation via Machine Learning to Inform NWP (GREMLIN) is a machine learning model that produces composite radar reflectivity using data from the Advanced Baseline Imager (ABI) and Geostationary Lightning Mapper (GLM). GREMLIN is useful for observing severe weather and providing information during convective initialization, especially over regions without ground-based radars. Previous research found good skill compared to ground-based radar products; however, that analysis used a dataset with climatic and precipitation characteristics similar to those of the training dataset: warm season Eastern CONUS in 2019. This study expands the analysis to the entire contiguous United States, during all seasons, and covering the period 2020-2022. Several validation metrics, including root-mean-square difference (RMSD), probability of detection (POD), and false alarm ratio (FAR), are plotted over CONUS by season, day-of-year, and time-of-day, and the regional and temporal variations are examined. GREMLIN skill is highest in summer and spring, with lower skill in winter because cold surfaces are frequently mistaken for precipitating clouds. In summer, diurnal patterns of RMSD in different longitude regions follow diurnal patterns of precipitation occurrence. GREMLIN’s accuracy is best over the Central to Eastern United States, where it was trained. Over New England, GREMLIN POD is lower due to different brightness temperature distributions and a low frequency of lightning compared to the training data. Over Florida, GREMLIN FAR is higher due to the high frequency of lightning. Overall, GREMLIN has reliable skill over CONUS in spring, summer, and fall, while winter performance needs further improvement.
Methods
The methodology is described in detail by Hilburn et al. (2021). The ABI, GLM, and MRMS data sets were resampled to a common 3 km grid. A cloud height of 10 km was used for removing parallax displacements. Satellite and radar samples were matched in time with a maximum time difference of 2.5 minutes. GLM lightning groups were accumulated over 15-minute time periods.
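The time-matching and accumulation rules above can be illustrated with a minimal sketch. It is not the code used to build the dataset: the function names are hypothetical, and the choice to end the 15-minute GLM window at the satellite time is an assumption, since the window alignment is not stated here.

```python
from datetime import datetime, timedelta

MAX_DT = timedelta(minutes=2.5)      # maximum satellite/radar time difference
GLM_WINDOW = timedelta(minutes=15)   # GLM accumulation period


def match_times(sat_times, radar_times, max_dt=MAX_DT):
    """Pair each satellite time with its nearest radar time, if close enough."""
    pairs = []
    for ts in sat_times:
        nearest = min(radar_times, key=lambda tr: abs(tr - ts))
        if abs(nearest - ts) <= max_dt:
            pairs.append((ts, nearest))
    return pairs


def count_groups(group_times, end_time, window=GLM_WINDOW):
    """Count GLM groups in (end_time - window, end_time].

    Assumption: the window ends at the satellite time.
    """
    start = end_time - window
    return sum(1 for t in group_times if start < t <= end_time)


t0 = datetime(2020, 1, 1, 0, 0)
sat = [t0, t0 + timedelta(minutes=5)]
radar = [t0 + timedelta(minutes=2), t0 + timedelta(minutes=9)]
pairs = match_times(sat, radar)

groups = [t0 - timedelta(minutes=20), t0 - timedelta(minutes=10), t0]
n_groups = count_groups(groups, t0)
```

In this toy case only the first satellite time is kept, because the second is 3 minutes from its nearest radar time.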
Usage notes
Code for reading the data: The data can be read using the provided Python code ("read_conus3_file.py" and "test_read.py") or the provided Fortran code ("gzmodule.f90", "test_read.f90", "compile.sh"). Note that this code reads the data without having to unzip the files.
The “test_read.py” and “test_read.f90” programs use one sample file (ABI_C13_202001010000.bin.gz) to verify that the code is reading correctly, by checking the values at seven points within the image and the minimum and maximum over the full image. This file is included in the software.tar package.
To use the Python code, run the following at the command line:
python test_read.py
The function that actually reads a data file, in “read_conus3_file.py”, makes use of the gzip module, which is part of the Python Standard Library. If your code is reading correctly, you should see this output:
testfile= ABI_C13_202001010000.bin.gz
min data, expected, isclose= -1e+30 -1e+30 True
max data, expected, isclose= 296.77463 296.77463 True
data[ 0, 0], expected, isclose= -1e+30 -1e+30 True
data[ 529, 0], expected, isclose= -1e+30 -1e+30 True
data[1058, 0], expected, isclose= -1e+30 -1e+30 True
data[ 0, 899], expected, isclose= 254.55347 254.55347 True
data[ 0,1798], expected, isclose= 295.85233 295.85233 True
data[ 529, 899], expected, isclose= 267.37692 267.37692 True
data[1058,1798], expected, isclose= -1e+30 -1e+30 True
The important thing is that each line ends with “True”, which confirms that the values returned by the code match the expected values to a precision of 1.E-07.
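For readers curious how a gzipped binary file can be read without unzipping it on disk, here is a minimal sketch using the standard-library gzip module and NumPy. It is not the provided “read_conus3_file.py”. The file layout used here (headerless little-endian float32 values, 1059 rows by 1799 columns) is an assumption inferred from the index ranges and values in the test output above, and the function name is hypothetical. The demonstration writes and reads a synthetic in-memory file rather than a real data file.

```python
import gzip
import io

import numpy as np

# Assumed layout (not stated in the documentation): headerless float32,
# little-endian, 1059 rows by 1799 columns.
NROWS, NCOLS = 1059, 1799


def read_gz_float32(path_or_fileobj, shape=(NROWS, NCOLS), dtype="<f4"):
    """Decompress in memory and view the bytes as a 2-D array."""
    with gzip.open(path_or_fileobj, "rb") as f:
        raw = f.read()
    return np.frombuffer(raw, dtype=dtype).reshape(shape)


# Build a synthetic gzipped file in memory to exercise the reader.
demo = np.full((NROWS, NCOLS), -1e30, dtype="<f4")
demo[529, 899] = 267.37692
buf = io.BytesIO()
with gzip.open(buf, "wb") as f:
    f.write(demo.tobytes())
buf.seek(0)

data = read_gz_float32(buf)

# Same style of check as the test program: compare against expected values
# with a relative tolerance of 1e-7 (the tolerance handling is an assumption).
center_ok = bool(np.isclose(data[529, 899], 267.37692, rtol=1e-7, atol=0.0))
corner_ok = bool(np.isclose(data[0, 0], -1e30, rtol=1e-7, atol=0.0))
```

Always prefer the provided reader for real files; this sketch only illustrates the technique.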
To use the Fortran code, you must first compile it by running “compile.sh” at the command line:
./compile.sh
Note that this script specifies a typical path to the zlib library on a Linux machine:
/lib64/libz.so.1
If your zlib library is in a different location, you must modify this path.
Once compiled, run the Fortran test program at the command line:
./test_read.exe
If your code is reading correctly, you should see the output:
testfile=ABI_C13_202001010000.bin.gz
min gzdata, expected, isclose= -1.00000002E+30 -1.00000002E+30 T
max gzdata, expected, isclose= 296.774628 296.774628 T
gzdata( 1, 1), expected, isclose= -1.00000002E+30 -1.00000002E+30 T
gzdata( 1, 530), expected, isclose= -1.00000002E+30 -1.00000002E+30 T
gzdata( 1,1059), expected, isclose= -1.00000002E+30 -1.00000002E+30 T
gzdata( 900, 1), expected, isclose= 254.553467 254.553467 T
gzdata(1799, 1), expected, isclose= 295.852325 295.852325 T
gzdata( 900, 530), expected, isclose= 267.376923 267.376923 T
gzdata(1799,1059), expected, isclose= -1.00000002E+30 -1.00000002E+30 T
The important thing is that each line ends with “T”, which confirms that the values returned by the code match the expected values to a precision of 1.E-07.
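Comparing the two test outputs shows that the Python and Fortran arrays index the same points differently: Python uses 0-based [row, column] indices, while the Fortran array uses 1-based indices in the opposite order. For example, data[0, 899] and gzdata(900, 1) both hold 254.55347. The helper functions below are hypothetical and only illustrate the mapping.

```python
def python_to_fortran(row, col):
    """Map a 0-based Python index data[row, col] to 1-based gzdata(i, j)."""
    return (col + 1, row + 1)


def fortran_to_python(i, j):
    """Map a 1-based Fortran index gzdata(i, j) to 0-based data[row, col]."""
    return (j - 1, i - 1)


# Index pairs taken from the two test outputs above.
checks = {
    (0, 0): (1, 1),
    (529, 0): (1, 530),
    (0, 899): (900, 1),
    (529, 899): (900, 530),
    (1058, 1798): (1799, 1059),
}
all_match = all(python_to_fortran(r, c) == ij for (r, c), ij in checks.items())
```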
Once you have confirmed that the code is reading correctly for you, then you can use it in your own code.
For Python, add these lines to your code:
from read_conus3_file import read_conus3_file
testfile = "ABI_C13_202001010000.bin.gz"  # path to the file you want to read
data = read_conus3_file(testfile)
The data are accessible through the NumPy array “data”.
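The test output shows -1e+30 both as the image minimum and at several corner points, which suggests that it marks pixels without valid data. Under that assumption (it is not stated explicitly here), a minimal sketch of masking those pixels before computing statistics might look like the following. The threshold and function name are hypothetical, and a small synthetic array stands in for the array returned by the reader.

```python
import numpy as np

# Hypothetical cutoff: anything below this is treated as the -1e30 sentinel.
FILL_THRESHOLD = -1e29


def mask_fill(data, threshold=FILL_THRESHOLD):
    """Return a masked array hiding values below the threshold."""
    return np.ma.masked_less(data, threshold)


# Synthetic stand-in for the array returned by read_conus3_file.
demo = np.array([[-1e30, 254.55347],
                 [267.37692, -1e30]], dtype=np.float32)

valid = mask_fill(demo)
n_valid = valid.count()          # number of unmasked pixels
vmin = float(valid.min())        # minimum over unmasked pixels only
vmax = float(valid.max())
```

Masked-array reductions such as min, max, and mean skip the masked pixels, so the sentinel no longer dominates the statistics.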
For Fortran, add these lines to your code:
include 'gzmodule.f90'
use gzmodule, only: read_gzip, gzdata
logical :: success
call read_gzip(testfile,success)
if (.not.success) stop 'error reading testfile'
The data are accessible through the “gzdata” array. Please refer to the “test_read.f90” code for more information about where these statements belong in a Fortran program.