Active region magnetograms for solar flare prediction: Extra images dataset
Data files
- May 18, 2023 version (336.48 KB): eventList.txt (328.89 KB), README.md (7.59 KB)
- Oct 15, 2023 version (336.46 KB): eventList.txt (328.89 KB), README.md (7.57 KB)
Abstract
In this dataset, we provide a comprehensive collection of magnetograms from the National Aeronautics and Space Administration's (NASA's) Solar Dynamics Observatory (SDO). The dataset incorporates data from three sources and provides SDO Helioseismic and Magnetic Imager (HMI) magnetograms of solar active regions as well as labels of corresponding flaring activity. This dataset will be useful for image analysis or solar physics research related to magnetic structure, its evolution over time, and its relation to solar flares. The dataset will be of interest to those researchers investigating automated solar flare prediction methods, including supervised and unsupervised machine learning (classical and deep), binary and multi-class classification, and regression. This dataset contains those images that were removed from the preconfigured datasets (see usage notes below).
This dataset consists of full-resolution images generated from magnetograms of National Oceanic and Atmospheric Administration (NOAA) active regions (ARs) from 01 May 2010 through 31 December 2018 that were excluded from the preconfigured full-resolution dataset (see Dryad repository https://doi.org/10.5061/dryad.dv41ns23n). In total, this image dataset contains 421,957 magnetogram images from 1655 ARs. These images are intended to be used in addition to the 950,047 images in the preconfigured full-resolution dataset (https://doi.org/10.5061/dryad.dv41ns23n). Researchers who want to configure a custom dataset according to criteria of latitude, longitude, NaNs, flare size, and flare window, and to generate the corresponding classification and regression labels, may be interested in the code described under “General Code” in the GitHub repository at https://github.com/DuckDuckPig/AR-flares/ (Zenodo DOI https://zenodo.org/badge/latestdoi/284776348).
Researchers interested in a dataset preconfigured for latitude, longitude, acceptable number of NaNs, flare size, and flare window may be interested in the preconfigured reduced-resolution dataset available in the Dryad repository at https://doi.org/10.5061/dryad.jq2bvq898 or the preconfigured full-resolution dataset available in the Dryad repository at https://doi.org/10.5061/dryad.dv41ns23n.
This dataset is described in detail in the paper at https://arxiv.org/abs/2305.09492 and is related to the code in the GitHub repository at https://github.com/DuckDuckPig/AR-flares/ (Zenodo DOI https://zenodo.org/badge/latestdoi/284776348). The image data and generated labels in this dataset can be used for classical or deep learning classification or regression problems. For examples of classification tasks, see both the GitHub repository and the paper.
Description of the data and file structure
The image dataset:
The following files comprise the image dataset itself.
- Image files: Images are provided in a directory structure of 1655 directories named `NNNN/`, where `NNNN` is the four-digit NOAA AR number. These files extract to the directory `active_regions_extra/`, where each subdirectory `NNNN/` contains a variable number of `.fits` files of AR `NNNN` that failed at least one of the criteria of latitude, longitude, and acceptable number of NaNs. Each `.fits` file name begins with the NOAA AR number for ease of correspondence. There are a total of 421,957 images in the dataset. These images are available on Zenodo at the following links:
  - ARs 1064 through 1527: https://doi.org/10.5281/zenodo.7893645
  - ARs 1528 through 1980: https://doi.org/10.5281/zenodo.7900350
  - ARs 1981 through 2469: https://doi.org/10.5281/zenodo.7908785
  - ARs 2470 through 2731: https://doi.org/10.5281/zenodo.7909013
Researchers wishing to work with the entire dataset must combine the files from the preconfigured full-resolution dataset (https://doi.org/10.5061/dryad.dv41ns23n) with this dataset by moving or copying the subdirectories to a common base directory, e.g., `active_regions/`.
Support files:
The following files provide additional information used in configuring a custom dataset according to latitude, longitude, acceptable number of NaNs, flare size, and flare window. See also the code in the GitHub repository at https://github.com/DuckDuckPig/AR-flares/ (Zenodo DOI https://zenodo.org/badge/latestdoi/284776348).
- `eventList.txt`: the list of events (flares) occurring within the timespan of the dataset. Each line has the format `YYYY MM DD,HHMM,NNNN,KX.X`, where `YYYY MM DD` is the date, `HHMM` is the time, `NNNN` is the four-digit NOAA AR number, and `KX.X` is the GOES class (e.g., `C1.0` or `X10.1`). This file was generated from the `Events/` directory structure for the timespan of the dataset. See the notes under “General Code” in the GitHub repository at https://github.com/DuckDuckPig/AR-flares/ (Zenodo DOI https://zenodo.org/badge/latestdoi/284776348) about using this `eventList.txt` file versus generating your own from the `Events/` directory structure.
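A minimal parser for `eventList.txt`, assuming each line splits into exactly the four comma-separated fields described above. The `label_image` function is a hypothetical illustration of a binary "flare of at least a given class within the next N hours" target; it is not the labeling code from the AR-flares repository, and its 24-hour window and M1.0 threshold are arbitrary defaults. GOES classes are compared through their nominal peak 1-8 Å X-ray flux (A = 1e-8, B = 1e-7, C = 1e-6, M = 1e-5, X = 1e-4 W/m², times the numeric part), so that, e.g., X10.1 ranks above X2.2. The image timestamp is supplied by the caller; extracting it from a file name or FITS header is not covered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Nominal peak 1-8 Angstrom X-ray flux (W/m^2) for each class letter at multiplier 1.0.
_GOES_BASE = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def goes_flux(goes_class):
    """Convert a GOES class such as 'M2.5' to its nominal peak flux in W/m^2."""
    return _GOES_BASE[goes_class[0].upper()] * float(goes_class[1:])


@dataclass(frozen=True)
class FlareEvent:
    time: datetime   # date + HHMM from the file
    ar: int          # NOAA AR number
    goes_class: str  # e.g. "C1.0" or "X10.1"


def parse_event_line(line):
    """Parse one 'YYYY MM DD,HHMM,NNNN,KX.X' line."""
    date, hhmm, ar, goes_class = (field.strip() for field in line.split(","))
    time = datetime.strptime(f"{date} {hhmm}", "%Y %m %d %H%M")
    return FlareEvent(time, int(ar), goes_class)


def load_events(path):
    """Read every non-blank line of an event list file."""
    with open(path) as fh:
        return [parse_event_line(line) for line in fh if line.strip()]


def label_image(events, ar, image_time, window_hours=24, min_class="M1.0"):
    """Hypothetical binary label: 1 if AR `ar` has a flare of at least
    `min_class` whose listed time falls in (image_time, image_time + window]."""
    end = image_time + timedelta(hours=window_hours)
    threshold = goes_flux(min_class)
    return int(any(
        e.ar == ar and image_time < e.time <= end and goes_flux(e.goes_class) >= threshold
        for e in events
    ))
```

Typical use would be `events = load_events("eventList.txt")` once, followed by one `label_image(events, ar, t)` call per image; for large image sets, grouping events by AR first avoids rescanning the whole list.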
Sharing/access information
This dataset incorporates data from three main sources.
- First, in order to focus the image collection on ARs, we used the NOAA Space Weather Prediction Center (SWPC) Solar Region Summaries (SRS) ftp://ftp.swpc.noaa.gov/pub/warehouse/ and parsed those text data to extract the date an AR appeared on disk and the number of days it was visible on disk. Additionally, the SRS provide latitude and longitude of ARs which we use to postprocess the dataset.
- Second, we downloaded magnetogram images from SDO/HMI using the Joint Science Operations Center (JSOC) interface http://jsoc.stanford.edu/ajax/lookdata.html at a cadence of 720 seconds, centered at the NOAA AR centroid (tracked according to the Carrington rate), and with a spatial extent of 600x600 pixels.
- Third, we used the SWPC Event Reports (ER) ftp://ftp.swpc.noaa.gov/pub/warehouse/ to extract the AR number, peak flare time, and flare size in order to provide labels for those researchers investigating a supervised classification or regression problem.
Code/Software
The code used to curate this dataset and to perform flare prediction is provided on GitHub at https://github.com/DuckDuckPig/AR-flares/ (Zenodo DOI https://zenodo.org/badge/latestdoi/284776348). The GitHub repository provides further details on how to run the code.
Image data are provided in supplementary files available on Zenodo (see links under Related works) as .fits files, which can be opened with the Python package astropy (https://www.astropy.org). All other files included here are text files that can be opened with any standard text editor. Note, however, that many text files are very large (~1M lines) and may take a while to load.
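A short sketch of loading one image with astropy and measuring its NaN content, which is relevant because the number of NaNs is one of the criteria that caused images to be excluded. `astropy.io.fits.getdata` returns the primary HDU's data, or the first extension's data when the primary HDU is empty (as in tile-compressed FITS files). The helper names are hypothetical, and the acceptable-NaN threshold used for the preconfigured datasets is not reproduced here.

```python
import numpy as np


def load_magnetogram(path):
    """Read the image array from one .fits file as float64."""
    # Imported here so nan_fraction() below works on plain arrays without astropy.
    from astropy.io import fits

    # getdata() returns the primary HDU's data, or the first extension's data
    # when the primary HDU has none (as in tile-compressed FITS files).
    return np.asarray(fits.getdata(path), dtype=np.float64)


def nan_fraction(image):
    """Fraction of pixels that are NaN (0.0 for a fully valid image)."""
    return float(np.isnan(np.asarray(image, dtype=np.float64)).mean())
```

Usage: `nan_fraction(load_magnetogram("active_regions_extra/1064/<file name>.fits"))`, where `<file name>` stands for any image in that AR's directory.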
This is one of three datasets related to the same study:
Reduced resolution dataset: Reduced resolution images (950,047 images, each of which is 224x224 pixels and 8-bit depth resolution), https://doi.org/10.5061/dryad.jq2bvq898.
Full resolution dataset: Full resolution images (950,047 images, each of which is 600x600 pixels and 17-bit depth resolution), https://doi.org/10.5061/dryad.dv41ns23n.
Extra images (this dataset): Images that were excluded from the main analyses in the first and second datasets (421,957 images that were excluded for latitude, longitude, and/or NaN pixels). Researchers wishing to work with the entire dataset (all 1,372,004 images, i.e., 950,047 + 421,957) must combine the files from the full-resolution preconfigured dataset (https://doi.org/10.5061/dryad.dv41ns23n) and this extra images dataset by moving or copying the subdirectories to a common base directory, e.g., `active_regions/`.