Data from: Application of a 1H brain MRS benchmark dataset to deep learning for out-of-voxel artifacts

Stark, Craig; Gudmundson, Aaron

Published Mar 05, 2024 on Dryad. https://doi.org/10.7280/D1RX1T

Abstract

Neural networks are potentially valuable for many of the challenges associated with MRS data. The purpose of this manuscript is to describe the AGNOSTIC dataset, which contains 259,200 synthetic 1H MRS examples for training and testing neural networks. AGNOSTIC was created using 270 basis sets that were simulated across 18 field strengths and 15 echo times. The synthetic examples were produced to resemble in vivo brain data with combinations of metabolite, macromolecule, residual water signals, and noise. To demonstrate its utility, we apply AGNOSTIC to train two Convolutional Neural Networks (CNNs) to address out-of-voxel (OOV) echoes. A Detection Network was trained to identify the point-wise presence of OOV echoes, providing proof of concept for real-time detection. A Prediction Network was trained to reconstruct OOV echoes, allowing subtraction during post-processing. Complex OOV signals were mixed into 85% of synthetic examples to train two separate CNNs for the detection and prediction of OOV signals. AGNOSTIC is available through Dryad, and all Python 3 code is available through GitHub. The Detection Network was shown to perform well, identifying 95% of OOV echoes. Traditional modeling of these detected OOV signals was evaluated and may prove to be an effective method during linear-combination modeling. The Prediction Network greatly reduces OOV echoes within FIDs and achieved a median log10 normed-MSE of -1.79, an improvement of almost two orders of magnitude.

Published in Imaging Neuroscience: Application of a 1H brain MRS benchmark dataset to deep learning for out-of-voxel artifacts

Aaron T. Gudmundson, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0001-5104-0959
Christopher W. Davies-Jenkins, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-6015-762X
İpek Özdemir, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0001-6807-9390
Saipavitra Murali-Manohar, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-4978-0736
Helge J. Zöllner, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-7148-292X
Yulu Song, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-4416-7959
Kathleen E. Hupfeld, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0001-5086-4841
Alfons Schnitzler, Heinrich-Heine-University Düsseldorf, ORCID: 0000-0002-6414-7939
Georg Oeltzschner, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0003-3083-9811
Craig E. L. Stark, University of California, Irvine, ORCID: 0000-0002-9334-8502
Richard A. E. Edden, Johns Hopkins School of Medicine, Kennedy Krieger Institute, ORCID: 0000-0002-0671-7374

Publication Information Citation

Gudmundson, A. T., Davies-Jenkins, C. W., Özdemir, İ., Murali-Manohar, S., Zöllner, H. J., Song, Y., ... & Edden, R. A. (2023). Application of a 1H brain MRS benchmark dataset to deep learning for out-of-voxel artifacts. Imaging Neuroscience, 1, 1-15. https://doi.org/10.1162/imag_a_00025

Dataset Description

The Adaptable Generalized Neural-Network Open-source Spectroscopy Training dataset of Individual Components (AGNOSTIC) is a dataset consisting of 259,200 synthetic MRS examples. The synthetic examples contained within the dataset were produced to resemble in vivo brain data with metabolite, macromolecule, residual water signals, and noise. The parameter space that AGNOSTIC spans is wide-reaching, comprising: 18 field strengths; 15 echo times; broad distributions of metabolite, macromolecule (MM), and water amplitudes; and a densely sampled time domain to allow down-sampling.

Description of the Data and File Structure

The dataset is structured as a zipped NumPy archive file (.npz). The archive contains complex-valued NumPy arrays of time-domain data (4096 time points) corresponding to the metabolite, macromolecule, water, and noise components, which can be combined in different ways depending on the user's goal. Within the file, all the acquisition parameters (field strength, echo time, spectral width, etc.), simulation parameters (signal-to-noise ratio, full width at half maximum, concentrations, T2 relaxation, etc.), and data augmentation options are specified.
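
As a minimal sketch of how the components might be combined, the example below builds a small stand-in archive that uses the documented array names (Metab, MM, water, noise) and sums them into full signals. The file name `agnostic_demo.npz`, the batch size of 4, and the random stand-in values are assumptions for illustration only; the real archive should be taken from the Dryad download. Whether additional scaling (for example via m_mult or w_mult) is needed before summation is not covered here.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
batch, n_points = 4, 4096  # 4096 time points per example, as documented


def fake_component(scale):
    """Random complex stand-in for one signal component (not real MRS data)."""
    real = rng.normal(size=(batch, n_points))
    imag = rng.normal(size=(batch, n_points))
    return scale * (real + 1j * imag)


# Hypothetical stand-in archive that reuses the documented array names.
path = os.path.join(tempfile.mkdtemp(), "agnostic_demo.npz")
np.savez_compressed(
    path,
    Metab=fake_component(1.0),
    MM=fake_component(0.5),
    water=fake_component(10.0),
    noise=fake_component(0.05),
)

# Load the individual components back from the archive.
with np.load(path) as data:
    metab, mm = data["Metab"], data["MM"]
    water, noise = data["water"], data["noise"]

# Combine the components according to the goal at hand.
fid_full = metab + mm + water + noise    # all components together
fid_water_removed = metab + mm + noise   # e.g. an input without residual water
target_clean = metab + mm                # e.g. a noise-free training target

print(fid_full.shape, fid_full.dtype)
```

Keeping the components separate in this way means the same archive could serve several tasks, since the network input and the training target can be assembled from different subsets of the arrays.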

Sharing/Access Information

Data was derived from the following sources:

https://github.com/agudmundson/agnostic

Code/Software

The following Python 3 scripts, found at https://github.com/agudmundson/agnostic, were used to generate AGNOSTIC:

00_simulation.py (Density Matrix Simulation Functions)
01_deep_sim.py (Acquisition Setting and Metabolite Simulations)
02_metab_matrix.py (Restructuring and Normalizing Basis Set)
03_gen_data.py (Synthetic Dataset)
04_randomize.py (Randomizing Field Strengths and Echo Times)

These scripts primarily rely upon NumPy, SciPy, and standard built-in Python libraries (os, glob, subprocess, etc.).

Dataset Contains:

Column Name	Datatype	Shape	Description
Dataset	String	1	Dataset used in Data Simulations for Concentrations
Field_Str	Array	Batch x 1	Field Strength Used (Tesla)
Echo_Times	Array	Batch x 1	Echo Times Used (ms)
sw	Array	Batch x 1	Spectral Widths Available (Hz) (Dependent on Subsampling and Field Strength)
subsample	Array	Batch x 1	Subsampling Stride Corresponding to SpecWidth
nPoints	Array	Batch x 1	Number of Points (i.e. 512,1024,2048)
Metab	Array	Batch x 4096	Full Metabolite Signal (w/ Concentration, Lorentzian LB, and Gaussian LB)
MM	Array	Batch x 4096	Full Macromolecule Signal (w/ Concentration, Lorentzian LB, and Gaussian LB)
water	Array	Batch x 4096	Full Water Signal (w/ <=5 Components, Lorentzian LB, Gaussian LB, Scaling 5x-20x Metabolites)
noise	Array	Batch x 4096	Normal Distributed Noise
time	Array	Batch x 4096	Time Axis (seconds)
ppm	Array	Batch x 4096	Frequency Axis (ppm)
Amplitude	Array	Batch x 182	Concentration Used for Each Spin (Metabolite & MM)
water_pos	Array	Batch x 1	Water is Positive or Negative (0=Pos; 1=Neg)
water_comp	Array	Batch x 5	PPM value of Each Component
waterNcomp	Array	Batch x 5	Components Included
water_amp	Array	Batch x 5	Water Scaling
noise_amp	Array	Batch x 1	Noise Scaling (Equivalent SNR Range Across SpecWidth and Npoints)
freq_shift	Array	Batch x 1	Frequency Shifts (default is not applied)
phase0	Array	Batch x 1	0th Order Phase (default is not applied)
phase1	Array	Batch x 1	1st Order Phase (default is not applied)
phase1_piv	Array	Batch x 1	1st Order Phase Pivot Point
SNR	Array	Batch x 1	SNR (NAA_Amp / StdDev_Noise)
LBL	Array	Batch x 182	Lorentzian Line Broadening Metab/MM
LBG	Array	Batch x 182	Gaussian Line Broadening Metab/MM
m_mult	Array	Batch x 2	Norm Metab (Metab --> 1; Column0 = No Phase & Column1 = with Phase)
w_mult	Array	Batch x 2	Norm Water (Water --> 5x-20x) Correctly Scales Water Relative to Metab
LBL_Water	Array	Batch x 5	Lorentzian Line Broadening Water
LBG_Water	Array	Batch x 5	Gaussian Line Broadening Water
FWHM_MM	Array	Batch x 14	Target FWHM of Macromolecules (14 Macromolecules)
FWHM_Metab	Array	Batch x 182	Target FWHM of Metabolites (FWHM of NAA)
Healthy	Array	Batch x 1	Healthy = 0; Clinical = 1
Clinical	Array	Batch x 1	Healthy = 0; Clinical > 0 (See Clin_Names)
Clin_Names	List	21	Corresponding Names of Population Number from 'Clinical' (Note: 0 is Healthy)
Drop_Sig	Array	Batch x 4096	Some/All Metab/MM Signal to be Subtracted (See Batch_Drop and dIdx_Drop)
Batch_Drop	Array	Batch x 1	Randomly Leave Off Some/All Metabolites/Macromolecules - Indicates Which Index was Selected
dIdx_Drop	Array	Batch x 1	Randomly Leave Off Some/All Metabolites/Macromolecules - Index of the 182 Spins to Drop
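
The `subsample`, `sw`, and `nPoints` entries suggest that the 4096-point signals can be down-sampled by striding and truncating. The sketch below illustrates that idea on a synthetic decaying complex exponential rather than on AGNOSTIC itself. The function name `downsample_fid`, the dwell time, the resonance frequency, and the assumption that the stride is applied as plain slicing are all hypothetical and are not taken from the dataset's code.

```python
import numpy as np


def downsample_fid(fid, time, stride, n_points):
    """Keep every `stride`-th sample, then truncate to `n_points` (assumed scheme)."""
    fid_ds = fid[..., ::stride][..., :n_points]
    time_ds = time[..., ::stride][..., :n_points]
    return fid_ds, time_ds


# Synthetic single-resonance FID standing in for one row of the dataset.
dwell = 1.0 / 8000.0                      # assumed dwell time (s), i.e. 8000 Hz spectral width
time = np.arange(4096) * dwell            # time axis in seconds
fid = np.exp(2j * np.pi * 150.0 * time) * np.exp(-time / 0.1)

fid_ds, time_ds = downsample_fid(fid, time, stride=2, n_points=1024)

# A stride of 2 doubles the dwell time and therefore halves the spectral width.
dwell_ds = time_ds[1] - time_ds[0]
sw_ds = 1.0 / dwell_ds

# Frequency-domain view of the down-sampled signal.
spec = np.fft.fftshift(np.fft.fft(fid_ds))
freq = np.fft.fftshift(np.fft.fftfreq(fid_ds.size, d=dwell_ds))
peak_hz = freq[np.argmax(np.abs(spec))]

print(fid_ds.shape, sw_ds, peak_hz)
```

Under this assumed scheme, each stride trades spectral width for a shorter array while the resonance stays at the same frequency, which is why a single densely sampled simulation could support several acquisition settings.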

Funding

This work has been supported by The Henry L. Guenther Foundation, Sonderforschungsbereich (SFB) 974 (TP B07) of the German Research Foundation, and the National Institutes of Health, grants T32 AG00096, R00 AG062230, R21 EB033516, R01 EB016089, R01 EB023963, K00 AG068440, P30 AG066519, R21 AG053040, R01 AG076942, and P41 EB031771.
