Data from: Comparison of rapid biodiversity assessment of meiobenthos using MALDI-TOF MS and metabarcoding

Cite this dataset

Rossel, Sven; Khodami, Sahar; Martínez Arbizu, Pedro (2019). Data from: Comparison of rapid biodiversity assessment of meiobenthos using MALDI-TOF MS and metabarcoding [Dataset]. Dryad.


Nowadays, most biodiversity assessments involving meiofauna are carried out using very time-consuming, specimen-wise morphological identifications, which demand comprehensive taxonomic knowledge. Animals have to be examined for minor differences in setae composition, mouthpart morphology or the number of segments of various extremities. DNA-based methods such as metabarcoding, as well as recently emerged rapid analyses using MALDI-TOF mass spectrometry to identify specimens based on a proteome fingerprint, could vastly accelerate specimen identification in biodiversity assessments. However, these techniques depend on reference libraries to connect collected data to morphologically described species. In this study, the success rates of both approaches were tested using reference libraries constructed from part of the samples from a new study area to identify unknown samples. Using MALDI-TOF MS, we found that species missing from an incomplete mass spectra reference library have only a minor impact on the results when a post hoc test for Random Forest classifications is employed. This test reveals specimens that demand morphological re-examination for the final species assignment. Metabarcoding, however, strongly demands a rich reference library to provide correct MOTU assessments in congruence with morphological determination. Nevertheless, with a complete library and a suitable data transformation [herein log(x + 1)], the number of reads per MOTU reflects relative species abundances in metabarcoding inference. The results of this study facilitate specimen identification using MALDI-TOF MS, which is incomparably cheap for specimen-by-specimen identification; when it comes to sample-wise analyses, however, metabarcoding outperforms other techniques by far.


Samples were taken by hand with a syringe (Ø 3.1 cm, 5 cm depth) during low tide at a tidal flat (53°38'40.2"N 8°04'57.6"E) in front of the village of Hooksiel in the littoral zone of the German North Sea coast on 19th April 2017. Twelve sandy sediment samples were fixed in absolute ethanol and stored overnight at -25°C.

Samples were sieved through a 40 µm sieve and density-gravity centrifuged according to McIntyre and Warwick (1984) employing Kaolin and Levasil® (Kurt Obermeier GmbH & Co. KG, Bad Berleburg, Germany). Until further processing, samples were stored at -25°C in absolute ethanol.

Animals were incubated for 5 minutes in 2 µl of a matrix solution containing α-Cyano-4-hydroxycinnamic acid (HCCA) as a saturated solution in 50% acetonitrile, 47.5% molecular grade water and 2.5% trifluoroacetic acid. The entire solution was applied to target plates for measurements in a Microflex LT/SH System (Bruker Daltonics). Masses were measured from 2k to 20k Dalton. For peak evaluation, the mass peak range from 2k to 10k Dalton was analyzed using a centroid peak detection algorithm, a signal to noise threshold of 2 and a minimum intensity threshold of 600, with a peak resolution higher than 400. The Proteins/Oligonucleotide method was employed for fuzzy control with a maximal resolution ten times above the threshold. For a sum spectrum, 240 satisfactory shots were summed up. One mass spectrum was measured for each specimen.

Mass spectrometry data was processed together in R (version 3.2.3; R Core Team, 2018) using the packages ‘MALDIquant’ (Gibb and Strimmer, 2012) and ‘MALDIquantForeign’ (Gibb, 2015). Protein mass spectra were trimmed to an identical range from 2,000 to 20,000 m/z and smoothed with the Savitzky-Golay method (Savitzky and Golay, 1964). The baseline was removed based on the SNIP baseline estimation method (Ryan et al., 1988) and spectra were normalized using the TIC method implemented in MALDIquant. Noise estimation was carried out with a signal to noise ratio (SNR) of 7. Peaks were repeatedly binned using the command ‘binPeaks’ with a tolerance of 0.002 in a strict approach, so that the number of peaks for the whole data set was reduced from 9344 to 652. The resulting intensity matrix was Hellinger transformed (Legendre and Gallagher, 2001).
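
For readers who want to see what the last two numerical steps amount to, the following is a minimal Python sketch, not the R code used for this dataset. It assumes a simplified total ion current (TIC) normalization that divides each spectrum by the plain sum of its intensities (MALDIquant's implementation may differ in detail) and the Hellinger transformation as described by Legendre and Gallagher (2001): the square root of each value divided by its row total. The function names and the toy matrix are hypothetical.

```python
import math


def tic_normalize(intensities):
    """Simplified TIC normalization (assumption): divide by the summed intensity."""
    total = sum(intensities)
    if total == 0:
        # An empty spectrum stays all zeros instead of raising a division error.
        return [0.0 for _ in intensities]
    return [value / total for value in intensities]


def hellinger_transform(matrix):
    """Hellinger transformation: square root of row-wise relative intensities.

    Each row is one specimen, each column one binned peak.
    """
    return [[math.sqrt(value) for value in tic_normalize(row)] for row in matrix]


# Toy intensity matrix (made-up numbers): 2 specimens x 3 binned peaks.
toy_matrix = [
    [4.0, 0.0, 12.0],
    [1.0, 1.0, 2.0],
]

transformed = hellinger_transform(toy_matrix)

for row in transformed:
    # After the transformation, the squared values of every row sum to 1.
    print([round(value, 3) for value in row], round(sum(v * v for v in row), 3))
```

Because every transformed row has unit length, Euclidean distances between rows correspond to Hellinger distances between the original specimens, which is a common reason for applying this transformation before multivariate analyses.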

Both the processed data (as a Hellinger-transformed matrix) and the raw measurements are stored here at Dryad.



Niedersächsisches Ministerium für Wissenschaft und Kultur, Award: IBR7