Synthetic urban gamma-ray spectra for training spectral detection and identification models
Data files
May 30, 2023 version files 2.32 GB
- README.md
- spectra_train_val_test.h5
Abstract
This dataset contains training, validation, and testing data that consist of individual gamma-ray spectra from a synthetic urban radiological dataset. The spectra were generated by the Radiological Detection and Identification (RADAI) project and are from a simulated 2x4x16 inch NaI(Tl) detector traveling down a street in a simulated urban area. The background consists of realistic benchmarked terrestrial (K-40, U-238 series, Th-232 series), fallout (Cs-137), rain (Pb-214 and Bi-214), and cosmic gamma-ray events. The 24 anomalous sources are simulated point-like sources of various types, including enhanced naturally occurring radioactive material (NORM), medical isotopes, industrial isotopes, and special nuclear material (SNM). All simulations are performed in 3-D, so the effects of scattering from nearby objects and shielding by clutter are included. The dataset is prepared so that every source is encountered at a number of different locations and at a wide variety of strengths.
Methods
The dataset consists of individual gamma-ray spectra generated from a synthetic urban model by the Radiological Detection and Identification (RADAI) project. The spectra are not continuous in time; some are background-only, while others contain a single anomalous source of one of 24 types. The data were generated by randomly selecting spectra from over 100 hours of data. The selection included some high signal-to-noise ratio (SNR) source encounters, whose strengths were randomly downsampled using binomial selection so that each source covers an SNR range spanning orders of magnitude. The tools used to generate this dataset from the larger dataset are contained in the RADAI code repository (https://gitlab.com/lbl-anp/radai/radai).
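The binomial downsampling mentioned above can be illustrated with a minimal sketch. This is not the RADAI implementation (see the repository for that); the function name `binomial_downsample`, the parameter `keep_fraction`, and the toy Poisson spectra are all assumptions made for illustration. The idea is that each source-tagged count is kept independently with a fixed probability, which preserves Poisson statistics while reducing the source strength.

```python
import numpy as np


def binomial_downsample(source_counts, keep_fraction, rng):
    """Keep each source count independently with probability keep_fraction.

    Hypothetical helper for illustration; not taken from the RADAI code.
    """
    source_counts = np.asarray(source_counts, dtype=np.int64)
    # One binomial draw per spectral bin: n = counts in the bin, p = keep_fraction.
    return rng.binomial(source_counts, keep_fraction)


rng = np.random.default_rng(0)

# Toy spectra (assumed values, not from the dataset): 128 energy bins.
background = rng.poisson(20.0, size=128)  # background-tagged counts
source = rng.poisson(50.0, size=128)      # source-tagged counts

# Reduce the source to roughly 10% of its original strength.
thinned = binomial_downsample(source, 0.1, rng)

# The weakened encounter is the untouched background plus the thinned source.
spectrum = background + thinned
```

Only the source component is thinned here; the background is left unchanged, so lowering `keep_fraction` lowers the SNR of the encounter.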
Usage notes
The dataset is stored in a single HDF5 file. Global attributes such as the spectral bin edges, integration time, and spectrum label names are provided as HDF5 datasets and attributes, and the training, validation, and testing data are stored in data groups. Each set contains the spectra (X), the same spectra with background-tagged events only (X_b), the same spectra with source-tagged events only (X_s), the true source labels (y_label), the fractional amount of gross counts from each source type (y_s_over_bs), and the gross-counts signal-to-noise ratio (SNR) of the sources if present (y_snr). The latter two datasets (y_s_over_bs and y_snr) are calculated from the other datasets and provided for convenience.
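As a starting point for working with the file, the sketch below uses `h5py` to list every group, dataset, and attribute, and then to read the arrays of one split. The helper names `describe_hdf5` and `load_split` are hypothetical. The group names are not stated here, so `group_name` must be taken from the listing or from README.md; only the dataset names (X, X_b, X_s, y_label, y_s_over_bs, y_snr) come from the description above.

```python
import h5py


def describe_hdf5(path):
    """Return one text line per root attribute, group, dataset, and attribute."""
    lines = []
    with h5py.File(path, "r") as f:
        # Attributes attached to the root of the file.
        for key, value in f.attrs.items():
            lines.append(f"/ @{key} = {value!r}")

        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                lines.append(f"/{name}  dataset  shape={obj.shape} dtype={obj.dtype}")
            else:
                lines.append(f"/{name}  group")
            for key in obj.attrs:
                lines.append(f"/{name} @{key}")

        # Walk all groups and datasets below the root.
        f.visititems(visit)
    return lines


def load_split(path, group_name,
               names=("X", "X_b", "X_s", "y_label", "y_s_over_bs", "y_snr")):
    """Read the named datasets of one group into memory as NumPy arrays.

    group_name is an assumption to be filled in from describe_hdf5 output;
    dataset names that are absent from the group are skipped.
    """
    with h5py.File(path, "r") as f:
        group = f[group_name]
        return {name: group[name][()] for name in names if name in group}
```

Calling `describe_hdf5("spectra_train_val_test.h5")` and printing the returned lines shows the actual layout. Note that `load_split` reads whole arrays into memory, which may be substantial for a 2.32 GB file; slicing the `h5py` datasets directly is an alternative when only part of a split is needed.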