Data from: Application of a 1H brain MRS benchmark dataset to deep learning for out-of-voxel artifacts
Data files
Mar 05, 2024 version; files total 81.56 GB
Abstract
Neural networks are potentially valuable for many of the challenges associated with MRS data. The purpose of this manuscript is to describe the AGNOSTIC dataset, which contains 259,200 synthetic 1H MRS examples for training and testing neural networks. AGNOSTIC was created using 270 basis sets that were simulated across 18 field strengths and 15 echo times. The synthetic examples were produced to resemble in vivo brain data with combinations of metabolite, macromolecule, residual water signals, and noise. To demonstrate its utility, we apply AGNOSTIC to train two Convolutional Neural Networks (CNNs) to address out-of-voxel (OOV) echoes. A Detection Network was trained to identify the point-wise presence of OOV echoes, providing proof of concept for real-time detection. A Prediction Network was trained to reconstruct OOV echoes, allowing subtraction during post-processing. Complex OOV signals were mixed into 85% of the synthetic examples used to train the two CNNs. AGNOSTIC is available through Dryad, and all Python 3 code is available through GitHub. The Detection Network performed well, identifying 95% of OOV echoes. Traditional modeling of these detected OOV signals was evaluated and may prove to be an effective method during linear-combination modeling. The Prediction Network greatly reduced OOV echoes within FIDs and achieved a median log10 normed-MSE of -1.79, an improvement of almost two orders of magnitude.
README: AGNOSTIC: Adaptable Generalized Neural-Network Open-source Spectroscopy Training dataset of Individual Components
Published in Imaging Neuroscience: Application of a 1H brain MRS benchmark dataset to deep learning for out-of-voxel artifacts
- Aaron T. Gudmundson, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0001-5104-0959
- Christopher W. Davies-Jenkins, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-6015-762X
- İpek Özdemir, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0001-6807-9390
- Saipavitra Murali-Manohar, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-4978-0736
- Helge J. Zöllner, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-7148-292X
- Yulu Song, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-4416-7959
- Kathleen E. Hupfeld, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0001-5086-4841
- Alfons Schnitzler, Heinrich-Heine-University Düsseldorf, ORCID: 0000-0002-6414-7939
- Georg Oeltzschner, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0003-3083-9811
- Craig E. L. Stark, University of California, Irvine, ORCID: 0000-0002-9334-8502
- Richard A. E. Edden, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-0671-7374
Publication Information: Citation
Gudmundson, A. T., Davies-Jenkins, C. W., Özdemir, İ., Murali-Manohar, S., Zöllner, H. J., Song, Y., ... & Edden, R. A. (2023). Application of a 1H brain MRS benchmark dataset to deep learning for out-of-voxel artifacts. Imaging Neuroscience, 1, 1-15. https://doi.org/10.1162/imag_a_00025
Dataset Description
The Adaptable Generalized Neural-Network Open-source Spectroscopy Training dataset of Individual Components (AGNOSTIC) is a dataset of 259,200 synthetic MRS examples. The synthetic examples were produced to resemble in vivo brain data with metabolite, macromolecule (MM), and residual water signals, plus noise. The parameter space that AGNOSTIC spans is wide-reaching, comprising: 18 field strengths; 15 echo times; broad distributions of metabolite, MM, and water amplitudes; and a densely sampled time domain that allows down-sampling.
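Because the time domain is densely sampled, shorter acquisitions can be emulated by decimation. The sketch below assumes that down-sampling means keeping every `stride`-th point (as the `subsample` stride in the table suggests) and then truncating to the desired number of points (`nPoints`); the function names are hypothetical, not part of the AGNOSTIC code. Decimating by `stride` multiplies the dwell time by `stride`, so the spectral width is divided by `stride`.

```python
import numpy as np


def downsample_fid(fid, stride, n_points):
    """Decimate a densely sampled FID by `stride`, then keep the first `n_points`.

    Works on a single FID (shape [4096]) or a batch (shape [batch, 4096]);
    decimation acts on the last (time) axis. Hypothetical helper.
    """
    fid = np.asarray(fid)
    decimated = fid[..., ::stride]  # dwell time grows by a factor of `stride`
    if decimated.shape[-1] < n_points:
        raise ValueError(
            f"stride {stride} leaves {decimated.shape[-1]} points; "
            f"cannot keep {n_points}"
        )
    return decimated[..., :n_points]


def decimated_spectral_width(sw_hz, stride):
    """Spectral width after decimation: SW = 1 / dwell, and dwell scales by `stride`."""
    return sw_hz / stride
```

For example, `downsample_fid(fid, stride=2, n_points=1024)` halves the spectral width and keeps 1024 points. No anti-aliasing filter is applied in this sketch, so any signal lying outside the reduced spectral width folds back into the spectrum.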
Abstract
Neural networks are potentially valuable for many of the challenges associated with MRS data. The purpose of this manuscript is to describe the AGNOSTIC dataset, which contains 259,200 synthetic 1H MRS examples for training and testing neural networks. AGNOSTIC was created using 270 basis sets that were simulated across 18 field strengths and 15 echo times. The synthetic examples were produced to resemble in vivo brain data with combinations of metabolite, macromolecule, residual water signals, and noise. To demonstrate the utility, we apply AGNOSTIC to train two Convolutional Neural Networks (CNNs) to address out-of-voxel (OOV) echoes. A Detection Network was trained to identify the point-wise presence of OOV echoes, providing proof of concept for real-time detection. A Prediction Network was trained to reconstruct OOV echoes, allowing subtraction during post-processing. Complex OOV signals were mixed into 85% of synthetic examples to train two separate CNNs for the detection and prediction of OOV signals. AGNOSTIC is available through Dryad, and all Python 3 code is available through GitHub. The Detection network was shown to perform well, identifying 95% of OOV echoes. Traditional modeling of these detected OOV signals was evaluated and may prove to be an effective method during linear-combination modeling. The Prediction Network greatly reduces OOV echoes within FIDs and achieved a median log10 normed-MSE of—1.79, an improvement of almost two orders of magnitude.
Description of the Data and File Structure
The dataset is structured as a zipped NumPy archive file (.npz). The archive contains complex-valued NumPy arrays of time-domain data (4096 timepoints) for the metabolite, macromolecule, water, and noise components, which can be combined in different ways depending on the user's goal. The archive also specifies all acquisition parameters (field strength, echo time, spectral width, etc.), simulation parameters (signal-to-noise ratio, full-width at half-maximum, concentrations, T2 relaxation, etc.), and data augmentation options.
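As a minimal sketch of combining the components, the function below sums the `Metab`, `MM`, `water`, and `noise` arrays for one example, with optional per-component weights (for instance, a weight of 0 for water to build a water-free target). It assumes that the stored components are already scaled so that a plain sum yields a realistic FID; whether `noise_amp`, `w_mult`, or other scaling fields must be applied first is an assumption to verify against the generation scripts. `combine_components` and `COMPONENT_KEYS` are hypothetical names.

```python
import numpy as np

# Component keys as listed in the dataset table (assumed to match the .npz keys).
COMPONENT_KEYS = ("Metab", "MM", "water", "noise")


def combine_components(data, idx, weights=None):
    """Sum the complex time-domain components of example `idx` into one FID.

    `data` is any mapping from key to array, e.g. a dict of arrays loaded
    from the .npz file. `weights` optionally rescales or removes components,
    e.g. {"water": 0.0} for a water-free target. Hypothetical helper.
    """
    weights = {} if weights is None else weights
    fid = np.zeros_like(np.asarray(data["Metab"][idx]), dtype=complex)
    for key in COMPONENT_KEYS:
        w = weights.get(key, 1.0)
        if w != 0:  # skip components that are switched off
            fid = fid + w * np.asarray(data[key][idx])
    return fid
```

Indexing the object returned by `np.load` reads the whole array from disk on every access, so for repeated use it is cheaper to load the four component arrays into an ordinary dict once and pass that dict as `data`.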
Sharing/Access Information
Data was derived from the following sources:
Code/Software
The following Python 3 scripts, found at https://github.com/agudmundson/agnostic, were used to generate AGNOSTIC:
- 00_simulation.py (Density Matrix Simulation Functions)
- 01_deep_sim.py (Acquisition Setting and Metabolite Simulations)
- 02_metab_matrix.py (Restructuring and Normalizing Basis Set)
- 03_gen_data.py (Synthetic Dataset)
- 04_randomize.py (Randomizing Field Strengths and Echo Times)
These scripts primarily rely upon NumPy, SciPy, and standard built-in Python libraries (os, glob, subprocess, etc.).
Dataset Contains:
Key | Datatype | Shape | Description | Notes
---|---|---|---|---
Dataset | String | 1 | Dataset Used in Data Simulations for Concentrations | |
Field_Str | Array | Batch x 1 | Field Strength Used (Tesla) | |
Echo_Times | Array | Batch x 1 | Echo Times Used (ms) | |
sw | Array | Batch x 1 | Spectral Widths Available (Hz) (Dependent on Subsampling and Field Strength) | |
subsample | Array | Batch x 1 | Subsampling Stride Corresponding to Spectral Width | |
nPoints | Array | Batch x 1 | Number of Points (e.g., 512, 1024, 2048) | |
Metab | Array | Batch x 4096 | Full Metabolite Signal (w/ Concentration, Lorentzian LB, and Gaussian LB) | |
MM | Array | Batch x 4096 | Full Macromolecule Signal (w/ Concentration, Lorentzian LB, and Gaussian LB) | |
water | Array | Batch x 4096 | Full Water Signal (w/ up to 5 Components, Lorentzian LB, Gaussian LB, Scaled 5x-20x Relative to Metabolites) | |
noise | Array | Batch x 4096 | Normally Distributed Noise | |
time | Array | Batch x 4096 | Time Axis (seconds) | |
ppm | Array | Batch x 4096 | Frequency Axis (ppm) | |
Amplitude | Array | Batch x 182 | Concentration Used for Each Spin (Metabolite & MM) | |
water_pos | Array | Batch x 1 | Water Is Positive or Negative (0 = Positive; 1 = Negative) | |
water_comp | Array | Batch x 5 | PPM Value of Each Water Component | |
waterNcomp | Array | Batch x 5 | Water Components Included | |
water_amp | Array | Batch x 5 | Water Scaling | |
noise_amp | Array | Batch x 1 | Noise Scaling (Equivalent SNR Range Across Spectral Widths and nPoints) | |
freq_shift | Array | Batch x 1 | Frequency Shifts (Not Applied by Default) | |
phase0 | Array | Batch x 1 | 0th-Order Phase (Not Applied by Default) | |
phase1 | Array | Batch x 1 | 1st-Order Phase (Not Applied by Default) | |
phase1_piv | Array | Batch x 1 | 1st-Order Phase Pivot Point | |
SNR | Array | Batch x 1 | SNR (NAA_Amp / StdDev_Noise) | |
LBL | Array | Batch x 182 | Lorentzian Line Broadening (Metab/MM) | |
LBG | Array | Batch x 182 | Gaussian Line Broadening (Metab/MM) | |
m_mult | Array | Batch x 2 | Metabolite Normalization (Metab --> 1; Column 0 = Without Phase, Column 1 = With Phase) | |
w_mult | Array | Batch x 2 | Water Normalization (Water --> 5x-20x); Correctly Scales Water Relative to Metab | |
LBL_Water | Array | Batch x 5 | Lorentzian Line Broadening (Water) | |
LBG_Water | Array | Batch x 5 | Gaussian Line Broadening (Water) | |
FWHM_MM | Array | Batch x 14 | Target FWHM of Macromolecules (14 Macromolecules) | |
FWHM_Metab | Array | Batch x 182 | Target FWHM of Metabolites (FWHM of NAA) | |
Healthy | Array | Batch x 1 | Healthy = 0; Clinical = 1 | |
Clinical | Array | Batch x 1 | Healthy = 0; Clinical > 0 (See Clin_Names) | |
Clin_Names | List | 21 | Names of the Populations Numbered in 'Clinical' | Index 0 Is Healthy |
Drop_Sig | Array | Batch x 4096 | Some/All Metab/MM Signal to Be Subtracted (See Batch_Drop and dIdx_Drop) | |
Batch_Drop | Array | Batch x 1 | Randomly Leave Off Some/All Metabolites/Macromolecules; Indicates Which Index Was Selected | |
dIdx_Drop | Array | Batch x 1 | Randomly Leave Off Some/All Metabolites/Macromolecules; Index of the 182 Spins to Drop | |
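Two of the table entries describe relationships between keys: `Drop_Sig` holds metabolite/MM signal to be subtracted, and `Clinical` indexes into `Clin_Names`. The sketch below shows one reading of each. It assumes that `Drop_Sig` is exactly the portion of `Metab + MM` to remove and that `Clinical` values index `Clin_Names` directly (0 = healthy); both are interpretations of the table, not a documented API, and the function names are hypothetical.

```python
import numpy as np


def partial_signal(data, idx):
    """Metabolite + MM signal of example `idx` with the dropped spins removed.

    Assumes Drop_Sig is the exact part of Metab + MM to subtract
    (an interpretation of the table, not a documented API).
    """
    return (np.asarray(data["Metab"][idx]) + np.asarray(data["MM"][idx])
            - np.asarray(data["Drop_Sig"][idx]))


def population_name(data, idx):
    """Look up the population label of example `idx` via Clinical -> Clin_Names."""
    # Clinical has shape Batch x 1; flatten to read the single integer code.
    code = int(np.asarray(data["Clinical"][idx]).ravel()[0])  # 0 means healthy
    return str(data["Clin_Names"][code])
```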
Funding
This work has been supported by The Henry L. Guenther Foundation, Sonderforschungsbereich (SFB) 974 (TP B07) of the German Research Foundation, and the National Institutes of Health, grants T32 AG00096, R00 AG062230, R21 EB033516, R01 EB016089, R01 EB023963, K00 AG068440, P30 AG066519, R21 AG053040, R01 AG076942, and P41 EB031771.
Methods
AGNOSTIC was created using 270 basis sets (18 field strengths × 15 echo times) that were simulated across those field strengths and echo times. The synthetic examples were produced to resemble in vivo brain data with combinations of metabolite, macromolecule, and residual water signals, plus noise. All of the parameters (e.g., amplitudes, relaxation decays) are included in each NumPy zipped archive file.
Usage notes
NumPy archive files can be opened using Python and NumPy.
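A minimal sketch of opening the archive with `numpy.load`, listing its keys, and loading only the arrays you need. Because the files are large (81.56 GB in total), the sketch avoids touching every key: indexing the object returned by `np.load` reads and, if compressed, decompresses that whole array. The function names are hypothetical. If the `Dataset` string or the `Clin_Names` list was stored as an object array, `allow_pickle=True` may be required; enable it only for files you trust.

```python
import numpy as np


def list_keys(path_or_file, allow_pickle=False):
    """Return the array names stored in an .npz archive without loading them."""
    with np.load(path_or_file, allow_pickle=allow_pickle) as archive:
        return list(archive.files)


def load_arrays(path_or_file, keys, allow_pickle=False):
    """Load only the requested arrays from an .npz archive into a dict."""
    with np.load(path_or_file, allow_pickle=allow_pickle) as archive:
        missing = [k for k in keys if k not in archive.files]
        if missing:
            raise KeyError(f"not in archive: {missing}; available: {archive.files}")
        # Each archive[k] access materializes the full array in memory.
        return {k: archive[k] for k in keys}
```

For example, `load_arrays(path, ["Metab", "MM", "water", "noise", "SNR"])` returns a plain dict that can be indexed repeatedly without rereading the file.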