
MALDI-TOF MS data: Species delimitation of Hexacorallia and Octocorallia around Iceland using nuclear and mitochondrial DNA and proteome fingerprinting

Cite this dataset

Korfhage, Severin A. et al. (2022). MALDI-TOF MS data: Species delimitation of Hexacorallia and Octocorallia around Iceland using nuclear and mitochondrial DNA and proteome fingerprinting [Dataset]. Dryad.


Cold-water corals build up reef structures or coral gardens and play an important role for many organisms in the deep sea. Climate change, deep-sea mining, and bottom trawling are severely compromising these ecosystems, making it all the more important to document the diversity, distribution, and impacts on corals. This goes hand in hand with species identification, which is morphologically and genetically challenging for Hexa- and Octocorallia. Morphological variation and slowly evolving molecular markers both contribute to the difficulty of species identification. In this study, a fast and cheap species delimitation tool for Octocorallia and Scleractinia of the Northeast Atlantic was tested based on 49 specimens. Two nuclear markers (ITS2 and 28S rDNA) and two mitochondrial markers (COI and mtMutS) were sequenced. The sequences formed the basis of a reference library for comparison to the results of species delimitation based on proteomic analysis using the MALDI-TOF MS method. The genetic methods were able to distinguish 17 of 18 presumed species. The MALDI-TOF MS method was able to distinguish 7 species. Species that could not be distinguished from one another still achieved good signals but were not represented by enough specimens for comparison. Therefore, it is predicted that with an extensive reference library of proteome spectra for Scleractinia and Octocorallia, MALDI-TOF MS may provide a rapid and cost-effective alternative for species discrimination in corals.



The specimens were collected with the ROV Kiel 6000 between 19 June and 27 July 2020 in Icelandic waters during the IceAGE 3 cruise SO276/MerMet17-6 (RV SONNE) and with the ROV PHOCA between 29 June and 8 August 2018 during the IceAGE RR cruise MSM75 (RV MS MERIAN). Stations were located around Iceland, in the Norwegian Basin, the Norwegian Sea, and on the Reykjanes Ridge at depths of 207.1 m to 2,040.8 m. The collected corals were photographed in their habitat with an HD camera system of the ROV Kiel 6000 and on board with a DSLR camera system (Canon EOS 5D Mark IV with Canon MP-E 65mm f/2.8 1-5x Macro Photo lens and Canon Compact-Macro Lens EF 50mm 1:2.5). Samples were preserved in 96% undenatured ethanol, which was changed after 24 h on board. Larger samples were preserved in 3% formol solution, with subsamples taken and preserved in 96% undenatured ethanol. Samples were stored at -20 °C at the German Centre for Marine Biodiversity Research (DZMB) in Hamburg, Germany.


MALDI-TOF MS analysis

One square millimeter of tissue from each of 49 ethanol-preserved individuals was transferred into a 1.5 ml microcentrifuge tube. After ethanol evaporation, 1.5 µl of a matrix solution containing α-Cyano-4-hydroxycinnamic acid (HCCA) as a saturated solution in 50% acetonitrile, 47.5% molecular grade water, and 2.5% trifluoroacetic acid was added. The solution was incubated for 5 to 90 min and applied to one spot on the target plate for crystallization. The samples were measured on the Microflex LT/SH System (Bruker Daltonics) using the flexControl 3.4 software (Bruker Daltonics). Masses were measured from 2 to 20 kDa. Peaks were evaluated with a centroid peak detection algorithm over the same mass range, using a signal-to-noise threshold of two, a minimum intensity threshold of 600, and a peak resolution higher than 400. To validate fuzzy control, the proteins/oligonucleotide method was employed with a maximal resolution of ten times above the threshold.

The obtained dataset was analyzed as described by Rossel and Martínez Arbizu (2018a) in R, version 1.4.1106 (R Core Team, 2020), using the packages MALDIquant (Gibb, 2012) and MALDIquantForeign (Gibb, 2019). Protein mass spectra were trimmed to an identical range from 2,000 to 20,000 m/z and smoothed using the Savitzky-Golay method (Savitzky and Golay, 1964) with a half window size (HWS) of 10. The SNIP baseline estimation method (Ryan et al., 1988) was applied to remove the baseline, and the TIC method in MALDIquant was used to normalize the spectra. A signal-to-noise ratio (SNR) of 5 was applied to reduce the noise of the spectra, and a half window size of 10 was used for peak detection. The peaks of the spectra were binned repeatedly using the function binPeaks in MALDIquant with a tolerance of 0.002 in a strict approach.
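The trimming, normalization, and peak detection steps described above can be illustrated with a minimal sketch. It is written in plain Python, not with MALDIquant, and it rests on several assumptions: the total ion current is simplified to the plain sum of intensities, the noise level is a single global estimate (scaled median absolute deviation), and the spectrum is synthetic with the baseline already removed. Smoothing, SNIP baseline removal, and peak binning are omitted, and all function names are hypothetical.

```python
from statistics import median


def trim(mz, intensity, low=2000.0, high=20000.0):
    """Keep only the points whose m/z lies inside [low, high]."""
    kept = [(m, i) for m, i in zip(mz, intensity) if low <= m <= high]
    return [m for m, _ in kept], [i for _, i in kept]


def tic_normalize(intensity):
    """Divide by the total ion current (assumption: the plain intensity sum)."""
    total = sum(intensity)
    if total <= 0:
        raise ValueError("spectrum has no signal to normalize")
    return [i / total for i in intensity]


def mad_noise(intensity):
    """Global noise estimate: scaled median absolute deviation."""
    centre = median(intensity)
    return 1.4826 * median([abs(i - centre) for i in intensity])


def detect_peaks(mz, intensity, half_window=10, snr=5.0):
    """Return (m/z, intensity) for local maxima that exceed snr * noise."""
    threshold = snr * mad_noise(intensity)
    peaks = []
    for k, value in enumerate(intensity):
        window = intensity[max(0, k - half_window):k + half_window + 1]
        if value == max(window) and value > threshold:
            peaks.append((mz[k], value))
    return peaks


# Synthetic, baseline-free spectrum: low-level noise plus two signals.
noise_pattern = [0.00, 0.02, 0.01, 0.03, 0.01]
mz = [1000 + 100 * k for k in range(200)]          # 1,000 to 20,900 m/z
intensity = [noise_pattern[k % 5] for k in range(200)]
intensity[60] = 50.0                                # signal at 7,000 m/z
intensity[150] = 30.0                               # signal at 16,000 m/z

trimmed_mz, trimmed_intensity = trim(mz, intensity)
spectrum = tic_normalize(trimmed_intensity)
for peak_mz, peak_intensity in detect_peaks(trimmed_mz, spectrum):
    print(peak_mz, round(peak_intensity, 4))
```

The half window size of 10 and the SNR of 5 mirror the values stated in the text. The real workflow operates on measured spectra and on MALDIquant's own implementations, which may differ in detail from this simplification.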
For further analysis, a Hellinger transformation (Legendre and Gallagher, 2001) was applied to the resulting intensity matrix. A dendrogram was generated by hierarchical cluster analysis with Ward’s D clustering algorithm (Ward and Joe, 1963), Euclidean distances, and 1,000 bootstrap replicates.
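As an illustration of the transformation step, the following sketch applies a row-wise Hellinger transformation (square root of each value divided by its row total) and computes the pairwise Euclidean distances that a hierarchical clustering would take as input. The intensity matrix, the specimen order, and the function names are hypothetical. Ward clustering and bootstrapping are not reproduced here.

```python
import math


def hellinger_transform(matrix):
    """Row-wise Hellinger transformation: sqrt(value / row total)."""
    transformed = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            # An empty spectrum stays empty instead of dividing by zero.
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(v / total) for v in row])
    return transformed


def euclidean_distances(rows):
    """Symmetric matrix of pairwise Euclidean distances between rows."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return [[dist(a, b) for b in rows] for a in rows]


# Hypothetical intensity matrix: rows are specimens, columns are mass bins.
matrix = [
    [1.0, 3.0, 0.0],   # specimen A
    [2.0, 6.0, 0.0],   # specimen B: same proportions as A, higher intensity
    [0.0, 0.0, 5.0],   # specimen C: no bins shared with A or B
]

transformed = hellinger_transform(matrix)
distances = euclidean_distances(transformed)
for row in distances:
    print([round(d, 4) for d in row])
```

After the transformation every non-empty row has unit length, so specimens with identical relative peak intensities end up at distance zero, and specimens that share no bins end up at the maximum distance of the square root of 2.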

Furthermore, a random forest (RF) model (Breiman, 2001) was generated with the R package randomForest (Liaw and Wiener, 2002) to investigate the applicability of mass spectra in classification approaches. The RF analysis is based on the intensity matrix, using bins as predictors and species names as the multi-level target factor. It was carried out on Hellinger-transformed data (Legendre and Gallagher, 2001) using 35 predictors (mtry) and 2,000 trees. A t-SNE plot (Van der Maaten and Hinton, 2008) was generated based on the raw data matrix probability of each specimen. Here, the R package Rtsne (Krijthe and Van der Maaten, 2015) was used. The t-SNE plot was constructed with a perplexity of 10 and 4,000 iterations.


Bundesministerium für Bildung und Forschung, Deutsche Forschungsgemeinschaft, Award: MerMet17-15