Data from: Proteomic fingerprinting enables quantitative biodiversity assessments of species and ontogenetic stages in Calanus congeners (Copepoda, Crustacea) from the Arctic Ocean

Cite this dataset

Rossel, Sven et al. (2022). Data from: Proteomic fingerprinting enables quantitative biodiversity assessments of species and ontogenetic stages in Calanus congeners (Copepoda, Crustacea) from the Arctic Ocean [Dataset]. Dryad.


Species identification is pivotal in biodiversity assessments, and proteomic fingerprinting by MALDI-TOF mass spectrometry has already been shown to reliably identify calanoid copepods to species level. However, MALDI-TOF data may contain information beyond mere species identification. In this study, we investigated different ontogenetic stages (copepodids C1-C6 females) of three co-occurring Calanus species from the Arctic Fram Strait, which cannot be identified to species level based on morphological characters alone. The three species were differentiated without error based on mass spectrometry data. In addition, a clear stage-specific signal was detected in all species, supported by clustering approaches as well as machine learning using Random Forest. More complex mass spectra in later ontogenetic stages as well as relative intensities of certain mass peaks were identified as the main drivers of stage distinction in these species. Through a dilution series, we were able to show that this did not result from the higher amount of biomass that was used in tissue processing of the larger stages. Finally, the data were tested in a simulation for application in a real biodiversity assessment by using Random Forest for stage classification of specimens absent from the training data. This resulted in a successful stage-identification rate of almost 90%, making proteomic fingerprinting a promising tool to investigate poleward shifts of Atlantic Calanus species and, in general, to assess stage compositions in biodiversity assessments of Calanoida, which can be notoriously difficult using conventional identification methods.


The anterior prosome body parts of 179 specimens were used for MALDI-TOF MS measurements. Depending on size, these were incubated in 3 to 30 µl of a matrix solution (in adults: 5 µl for C. finmarchicus, 10 µl for C. glacialis and 30 µl for C. hyperboreus), covering the entire specimen with some supernatant. The matrix contained α-Cyano-4-hydroxycinnamic acid (HCCA) as a saturated solution in 50% acetonitrile, 47.5% molecular grade water, and 2.5% trifluoroacetic acid. After 5 min of incubation, 1.5 µl was transferred to a target plate for co-crystallization of matrix and molecules. Subsequently, measurements were carried out using a Microflex LT/SH System (Bruker Daltonics). Using the flexControl 3.4 software (Bruker Daltonics), molecule masses were measured from 2 to 20 kilodaltons (kDa). Peaks were evaluated with a centroid peak detection algorithm over the same mass range of 2 to 20 kDa, using a signal-to-noise threshold of two, a minimum intensity threshold of 600 and a peak resolution higher than 400. To validate fuzzy control, the proteins/oligonucleotide method was employed by maximal resolution of ten times above the threshold. To create a sum spectrum, a total of 160 laser shots were applied to a spot. Each spot was measured three times.
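The peak evaluation criteria stated above can be illustrated with a minimal Python sketch. It is not the flexControl implementation, whose internals are not described here. The `Peak` class, its fields and the function `passes_acquisition_criteria` are hypothetical names introduced for illustration, and it is an assumption that resolution is computed as m/z divided by the peak width at half maximum and that signal-to-noise is intensity divided by a local noise estimate.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """Hypothetical container for one detected mass peak."""
    mz: float         # mass-to-charge ratio in Da
    intensity: float  # peak height in arbitrary units
    noise: float      # local noise estimate, assumed > 0
    fwhm: float       # full width at half maximum in Da, assumed > 0

    @property
    def snr(self) -> float:
        # Assumption: signal-to-noise is intensity over local noise.
        return self.intensity / self.noise

    @property
    def resolution(self) -> float:
        # Assumption: resolution is m / delta-m at half maximum.
        return self.mz / self.fwhm


def passes_acquisition_criteria(
    peak: Peak,
    min_snr: float = 2.0,             # signal-to-noise threshold of two
    min_intensity: float = 600.0,     # minimum intensity threshold of 600
    min_resolution: float = 400.0,    # resolution must be higher than 400
    mass_range: tuple = (2000.0, 20000.0),  # 2 to 20 kDa
) -> bool:
    """Return True if the peak meets all thresholds given in the text."""
    low, high = mass_range
    return (
        low <= peak.mz <= high
        and peak.snr >= min_snr
        and peak.intensity >= min_intensity
        and peak.resolution > min_resolution
    )


good_peak = Peak(mz=5000.0, intensity=1200.0, noise=100.0, fwhm=10.0)
weak_peak = Peak(mz=5000.0, intensity=500.0, noise=100.0, fwhm=10.0)
print(passes_acquisition_criteria(good_peak))
print(passes_acquisition_criteria(weak_peak))
```

Whether each threshold is inclusive or exclusive is not stated in the text, so the comparison operators used here are a judgment call.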

MALDI-TOF data processing

MALDI-TOF raw data were imported into R, Version 4.1.0 (R-Core-Team, 2022), and processed using the R packages MALDIquantForeign, Version 0.12 (Gibb, 2015), and MALDIquant, Version 1.20 (Gibb & Strimmer, 2012). Spectra were square-root transformed, smoothed using the Savitzky-Golay method (Savitzky & Golay, 1964), baseline corrected using the SNIP method (Ryan et al., 1988) and normalized using the total ion current (TIC) method. Repeated measurements were averaged using mean intensities. Peak picking was carried out using a signal-to-noise ratio (SNR) of 12 and a half window size of 13. Mass peaks with an SNR below 12 were nevertheless retained if they occurred in other mass spectra, as long as their SNR exceeded 1.75, which is assumed as a lower detection limit. Repeated peak binning was carried out to align homologous mass peaks. The resulting data were Hellinger transformed (Legendre & Gallagher, 2001) and used for further analyses.
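The arithmetic behind several of these steps can be sketched in plain Python. The original analysis was carried out in R with MALDIquant, so this is only an illustration of the transformations and not the authors' code. Smoothing, baseline correction and peak binning are omitted. All function names are hypothetical, TIC is approximated here as the plain sum of intensities, and the two-threshold retention rule reflects one reading of the text: a peak bin is kept in a spectrum if it reaches an SNR of 12 in at least one spectrum and exceeds 1.75 in the spectrum at hand.

```python
import math


def sqrt_transform(intensities):
    """Square-root transform to stabilise the variance of intensities."""
    return [math.sqrt(v) for v in intensities]


def tic_normalize(intensities):
    """Divide by the total ion current (assumed: plain sum of intensities)."""
    total = sum(intensities)
    return [v / total for v in intensities] if total > 0 else list(intensities)


def average_replicates(replicates):
    """Average repeated measurements of one spot by mean intensity per bin."""
    return [sum(column) / len(replicates) for column in zip(*replicates)]


def retain_peaks(snr_matrix, detect_snr=12.0, floor_snr=1.75):
    """snr_matrix[i][j] is the SNR of peak bin j in spectrum i.

    A bin counts as detected if any spectrum reaches detect_snr. Within a
    detected bin, a spectrum keeps its peak if its SNR exceeds floor_snr.
    """
    n_bins = len(snr_matrix[0])
    detected = [
        any(row[j] >= detect_snr for row in snr_matrix) for j in range(n_bins)
    ]
    return [
        [detected[j] and row[j] > floor_snr for j in range(n_bins)]
        for row in snr_matrix
    ]


def hellinger(row):
    """Hellinger transformation: square root of each row-relative value."""
    total = sum(row)
    return [math.sqrt(v / total) for v in row] if total > 0 else [0.0] * len(row)


# Three replicate measurements of one spot (made-up intensities).
replicates = [[4.0, 9.0, 16.0], [4.0, 9.0, 16.0], [4.0, 9.0, 16.0]]
processed = [tic_normalize(sqrt_transform(r)) for r in replicates]
mean_spectrum = average_replicates(processed)
print(mean_spectrum)
print(hellinger([1.0, 3.0]))
```

After the Hellinger transformation the squared values of each row sum to one, which is why Euclidean distances computed on the transformed matrix correspond to Hellinger distances between specimens.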

Usage notes

Data were processed in R. Mass spectrometry data can be imported into R using the package MALDIquantForeign and processed further using the package MALDIquant.

Gibb, S. (2015). MALDIquantForeign: Import/Export routines for MALDIquant. A package for R.

Gibb, S., and Strimmer, K. (2012). MALDIquant: Quantitative Analysis of Mass Spectrometry Data. Bioinformatics 28, 2270-2271. DOI: 10.1093/bioinformatics/bts447.


Alfred Wegener Institute for Polar and Marine Research, Award: AWI_PS121_05

Niedersächsisches Ministerium für Wissenschaft und Kultur, Award: ZN3285

Deutsche Forschungsgemeinschaft, Award: RE2808/3-1

Deutsche Forschungsgemeinschaft, Award: RE2808/3-2