Data from: SLICE-MSI: A machine learning interface for system suitability testing of mass spectrometry imaging platforms
Data files
Jan 15, 2025 version files, 766.31 KB
- QC_Testing.csv (6.17 KB)
- QC_Training.csv (757.10 KB)
- README.md (3.03 KB)
Abstract
The field of mass spectrometry imaging (MSI) currently lacks standardized protocols or commercially available products designed for system suitability testing of MSI platforms. Machine learning can quickly and effectively identify complex patterns in data and use them to make informed classifications, but there is a technical barrier to implementing these algorithms. Here we package machine learning algorithms into a user-friendly interface to make community-wide implementation of this protocol possible. The software package is built entirely in Python, using the PySimpleGUI library to construct the interface, the pandas and NumPy libraries for data formatting and manipulation, and the scikit-learn library to implement the machine learning algorithms. Training data is collected on a clean and a compromised instrument and can then be used to evaluate model performance and to train models prior to interrogating unknown samples before, during, or after experiments.
README: SLICE-MSI Executable and Example Data
https://doi.org/10.5061/dryad.msbcc2g7c
Description of the data and file structure
The collected data comes from a novel QC mix detected on a clean and a compromised IR-MALDESI-MSI platform. The corresponding software package is a graphical user interface that incorporates machine learning algorithms for efficient and effective classification of instrument condition. This work was completed to fill a current void in the MSI community by providing an easy-to-use, easily implementable quality control and system suitability testing protocol for MSI.
Files and variables
File: QC_Testing.csv
Description: CSV containing one replicate from the complete dataset, intended as a testing set to be used alongside the user manual. Any missing values are due to the analyte not being detected in that scan. For example, if the analyte is not detected in the region of interest (ROI), the abundance cell will be blank; if the isotopic peak is not detected, the isotope count cannot be calculated, so both cells will be blank. Missing values are handled within the SLICE-MSI software.
File: QC_Training.csv
Description: Complete training dataset containing all clean and compromised replicates with the corresponding QC data collected. This dataset can be used alongside the user manual for instruction. Any missing values are due to the analyte not being detected in that scan. For example, if the analyte is not detected in the region of interest (ROI), the abundance cell will be blank; if the isotopic peak is not detected, the isotope count cannot be calculated, so both cells will be blank. Missing values are handled within the SLICE-MSI software.
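The blank cells described for both files can be inspected before the data is loaded into SLICE-MSI. The following is a minimal sketch, assuming pandas is installed and that blank cells are read as NaN (the pandas default for empty CSV fields). The helper name `summarize_missing` is hypothetical, and the small inline table is made up for illustration; its column names only follow the "[Analyte] Metric" pattern and are not confirmed headers from the dataset.

```python
import io

import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.Series:
    """Count blank (NaN) cells per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Made-up stand-in for a QC file: the values and exact headers are invented.
example_csv = io.StringIO(
    "Condition,Caffeine Abundance,Caffeine Isotope Count\n"
    "Clean,1200.5,0.3\n"
    "Compromised,,\n"
)
example = pd.read_csv(example_csv)

# Columns with no blank cells (here, Condition) are dropped from the summary.
missing = summarize_missing(example)
print(missing)
```

For the real files, the same helper could be called as `summarize_missing(pd.read_csv("QC_Training.csv"))`, assuming the CSV is in the working directory.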
Variables
Analytes: Caffeine, Sulfamethazine, L-Thyroxine, PEG 3-21, and isotope peaks for each (44 in total)
- Condition: Class labels of the data, either "Clean" or "Compromised"
- [Analyte] Abundance: Absolute abundance for the corresponding analyte, included for M+1 isotope also (ions/second)
- [Analyte] MMA: Mass measurement accuracy of the corresponding analyte, included for M+1 isotope also (ppm)
- [Analyte] RSD: Relative standard deviation of the abundance values for the corresponding analyte, included for M+1 isotope also (unitless)
- [Analyte] Detection Frequency: Detection frequency of the analyte within the region of interest, included for M+1 isotope also (unitless proportion)
- [Analyte] Isotope Count: Difference between the number of carbons predicted from the spectra and the true (expected) number of carbons, used as a spectral accuracy measurement for the corresponding analyte (carbons)
- PDI: The polydispersity index calculated per ROI using the PEG peaks (unitless)
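The list above gives units for each metric but not the formulas used to compute them. As a rough sketch under conventional definitions, the calculations might look like the following: MMA as the signed error in ppm relative to the theoretical m/z, RSD as the sample standard deviation divided by the mean, detection frequency as the fraction of scans with a detection, and PDI as the weight-average over number-average mass, with ion abundance used as the weight. The function names, the choice of sample rather than population standard deviation, and the abundance weighting for PDI are all assumptions for illustration, not the SLICE-MSI implementation.

```python
from statistics import mean, stdev


def mma_ppm(measured_mz, theoretical_mz):
    """Signed mass measurement accuracy in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def rsd(abundances):
    """Relative standard deviation as a unitless ratio (sample std / mean)."""
    return stdev(abundances) / mean(abundances)


def detection_frequency(detected_flags):
    """Proportion of scans in which the analyte was detected."""
    return sum(1 for flag in detected_flags if flag) / len(detected_flags)


def polydispersity_index(masses, abundances):
    """Conventional Mw / Mn, treating each abundance as the weight of its mass."""
    weighted_sum = sum(n * m for m, n in zip(masses, abundances))
    number_average = weighted_sum / sum(abundances)
    weight_average = sum(n * m * m for m, n in zip(masses, abundances)) / weighted_sum
    return weight_average / number_average


# Illustrative inputs only; none of these numbers come from the dataset.
example_mma = mma_ppm(500.001, 500.0)
example_rsd = rsd([90.0, 100.0, 110.0])
example_frequency = detection_frequency([True, True, False, True])
example_pdi = polydispersity_index([100.0, 200.0], [1.0, 1.0])
```

A PDI of exactly 1 would correspond to every detected PEG peak having the same mass; broader distributions give values above 1.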
Code/software
SLICE-MSI is distributed as a fully compiled executable and does not need any outside packages or libraries installed to run. It is a GUI that takes QC data collected from MSI platforms and uses machine learning algorithms to classify instrument condition.
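The executable hides the underlying workflow, and the specific algorithms and preprocessing it uses are not described here. As a non-authoritative sketch of the general idea (train on labeled QC data, then classify unseen data), the following uses scikit-learn with a median imputer and a random forest. Both choices are assumptions made for illustration, as are the function name `build_qc_classifier` and the synthetic demo table with placeholder metric names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline


def build_qc_classifier(random_state=0):
    """Pipeline that fills blank cells with column medians, then classifies."""
    return make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestClassifier(n_estimators=100, random_state=random_state),
    )


# Synthetic, well-separated demo data; not drawn from QC_Training.csv.
rng = np.random.default_rng(0)
clean = pd.DataFrame(
    {"Metric A": rng.normal(10.0, 0.5, 20), "Metric B": rng.normal(1.0, 0.1, 20)}
)
compromised = pd.DataFrame(
    {"Metric A": rng.normal(2.0, 0.5, 20), "Metric B": rng.normal(3.0, 0.1, 20)}
)
features = pd.concat([clean, compromised], ignore_index=True)
labels = ["Clean"] * 20 + ["Compromised"] * 20
features.loc[0, "Metric A"] = np.nan  # mimic a blank cell in the training data

model = build_qc_classifier()
model.fit(features, labels)

# Two hypothetical unknown measurements, one resembling each condition.
unknown = pd.DataFrame({"Metric A": [9.8, 2.1], "Metric B": [1.05, 2.9]})
predictions = model.predict(unknown)
print(predictions)
```

With the real files, the features would presumably be every column except `Condition`, for example `train = pd.read_csv("QC_Training.csv")` followed by `model.fit(train.drop(columns="Condition"), train["Condition"])`, assuming all remaining columns are numeric.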