
MALDI-MS dataset for use with open-source untargeted metabolomic workflow for complex biological samples


Walker, Heather (2023), MALDI-MS dataset for use with open-source untargeted metabolomic workflow for complex biological samples, Dryad, Dataset,


Untargeted metabolomics is a powerful tool for measuring and understanding complex biological chemistries. However, the use, bioinformatic processing and downstream analysis of mass spectrometry (MS) data can be daunting for inexperienced users. Numerous open-source and free-to-use data processing and analysis tools exist for various untargeted MS approaches, but choosing the ‘correct’ pipeline isn’t straightforward. This dataset can be used in conjunction with a user-friendly online guide that presents a workflow for connecting these tools to process, analyse and annotate various untargeted MS datasets. The workflow is intended to guide exploratory analysis in order to inform decision-making regarding costly and time-consuming downstream targeted MS approaches. The workflow provides practical advice concerning experimental design, organisation of data and downstream analysis, and offers details on sharing and storing valuable MS data for posterity. The workflow is editable and modular, allowing flexibility for updated or changing methodologies. As user participation becomes more common, contributions and improvements made via the online repository are intended to increase the clarity and detail of the workflow.


The dataset was collected from soil analysis by MALDI-TOF-MS. Three types of soil were collected from the field: arable, orchard and forest soil. The samples were extracted into chloroform, methanol and water, and the aqueous fraction (methanol:water) was mixed 1:1 with 5 mg/mL CHCA, and 1 µL was spotted onto a MALDI target plate. Each sample was analysed using MALDI-TOF-MS over a scan range of m/z 50-800 with a 1-minute scan time, and data were collected for 1 min. Three biological replicates were run per soil type, and each sample was run three times for technical replication.
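
The replication scheme described above implies 3 soil types x 3 biological replicates x 3 technical replicates = 27 acquisitions. A minimal Python sketch of that layout is given here as an aid to organising the files; the `label` naming scheme is a hypothetical illustration and does not reflect the actual file names in the dataset.

```python
from itertools import product

SOIL_TYPES = ["arable", "orchard", "forest"]
BIOLOGICAL_REPLICATES = 3  # per soil type
TECHNICAL_REPLICATES = 3   # repeat runs of each sample


def build_run_sheet():
    """Return one record per acquisition in the design described above."""
    runs = []
    for soil, bio, tech in product(
        SOIL_TYPES,
        range(1, BIOLOGICAL_REPLICATES + 1),
        range(1, TECHNICAL_REPLICATES + 1),
    ):
        runs.append(
            {
                "soil": soil,
                "bio_rep": bio,
                "tech_rep": tech,
                # Hypothetical label, not the dataset's real file naming.
                "label": f"{soil}_B{bio}_T{tech}",
            }
        )
    return runs


runs = build_run_sheet()
print(len(runs))  # 27
```

Keeping the biological and technical replicate indices as separate fields makes it straightforward to average technical replicates before comparing soil types.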

Usage notes

The raw data files are available to download but must be converted to the universal mzML format using ProteoWizard. ProteoWizard is an open-source tool for Windows users.
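
As a rough illustration of the conversion step, the sketch below builds a call to ProteoWizard's command-line converter, `msconvert`, from Python. It assumes that `msconvert` is installed and on the system path; the input file name and output folder in the usage comment are hypothetical, and the function names are illustrative rather than part of any published workflow.

```python
import subprocess


def msconvert_command(raw_path, out_dir, executable="msconvert"):
    """Build the argument list for converting one raw file to mzML.

    --mzML selects mzML output and -o sets the output directory.
    """
    return [executable, str(raw_path), "--mzML", "-o", str(out_dir)]


def convert(raw_path, out_dir):
    """Run the conversion; raises CalledProcessError if msconvert fails."""
    subprocess.run(msconvert_command(raw_path, out_dir), check=True)


# Example (hypothetical file and folder names):
# convert("arable_B1_T1.raw", "mzml_out")
```

The same conversion can be carried out through the ProteoWizard graphical interface (MSConvertGUI) if a scripted approach is not needed.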

The mzML files are also available to download and can be used directly with the processing workflow.
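
For a quick look inside an mzML file without installing extra software, the following Python sketch reads the m/z and intensity arrays of each spectrum using only the standard library. It is not part of the workflow and makes simplifying assumptions: arrays are stored as 32- or 64-bit little-endian floats, either uncompressed or zlib-compressed, and other encodings (for example Numpress) are not handled. Dedicated mzML readers are a better choice for real analysis.

```python
import base64
import struct
import zlib
import xml.etree.ElementTree as ET

NS = "{http://psi.hupo.org/ms/mzml}"  # mzML XML namespace

# PSI-MS controlled vocabulary accessions used in binaryDataArray elements
CV_FLOAT32 = "MS:1000521"
CV_FLOAT64 = "MS:1000523"
CV_ZLIB = "MS:1000574"
CV_MZ_ARRAY = "MS:1000514"
CV_INTENSITY_ARRAY = "MS:1000515"


def decode_binary_array(array_elem):
    """Decode one <binaryDataArray> into (kind, list of floats)."""
    accessions = {cv.get("accession") for cv in array_elem.iter(NS + "cvParam")}
    raw = base64.b64decode(array_elem.findtext(NS + "binary") or "")
    if CV_ZLIB in accessions:
        raw = zlib.decompress(raw)
    if CV_FLOAT64 in accessions:
        fmt, width = "d", 8
    elif CV_FLOAT32 in accessions:
        fmt, width = "f", 4
    else:
        raise ValueError("unsupported binary data type in this sketch")
    values = list(struct.unpack(f"<{len(raw) // width}{fmt}", raw))
    if CV_MZ_ARRAY in accessions:
        kind = "mz"
    elif CV_INTENSITY_ARRAY in accessions:
        kind = "intensity"
    else:
        kind = "other"
    return kind, values


def iter_spectra(source):
    """Yield (spectrum id, m/z values, intensities) from a path or file object."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "spectrum":
            arrays = dict(
                decode_binary_array(a) for a in elem.iter(NS + "binaryDataArray")
            )
            yield elem.get("id"), arrays.get("mz", []), arrays.get("intensity", [])
            elem.clear()  # release memory held by the parsed spectrum


# Example (hypothetical file name):
# for spectrum_id, mz, intensity in iter_spectra("arable_B1_T1.mzML"):
#     print(spectrum_id, len(mz))
```

Streaming with `iterparse` avoids loading the whole file into memory, which helps when files contain many spectra.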

The workflow for processing the untargeted metabolomics data can be accessed from the following link. The workflow is based on open-source tools.



Biotechnology and Biological Sciences Research Council, Award: BB/M011151/1

Biotechnology and Biological Sciences Research Council, Award: BB/T010789/1