Data from: Potential of MALDI−TOF MS-based proteomic fingerprinting for species identification of Cnidaria across classes, species, regions and developmental stages
Data files
Jun 30, 2023 version files 246.89 MB
01_Species_clustering.R (6.73 KB)
02_Ontogeny.R (9.18 KB)
03_Stage_classification.R (15.57 KB)
Mass_Spectra.zip (246.77 MB)
Meta_Data_for_submission.csv (39.07 KB)
README.md (5.66 KB)
RefTable.csv (50.20 KB)
Abstract
Morphological identification of cnidarian species can be difficult throughout all life stages due to the lack of distinct morphological characters. Moreover, in some cnidarian taxa genetic markers are not fully informative, and in these cases combinations of different markers or additional morphological verifications may be required. Proteomic fingerprinting based on MALDI-TOF mass spectra was previously shown to provide reliable species identification in different metazoans including some cnidarian taxa. For the first time, we tested the method across four cnidarian classes (Staurozoa, Scyphozoa, Anthozoa, Hydrozoa) and included different scyphozoan life-history stages (polyp, ephyra, medusa) in our dataset. Our results revealed reliable species identification based on MALDI-TOF mass spectra across all taxa, with species-specific clusters for all 23 analyzed species. In addition, proteomic fingerprinting successfully distinguished developmental stages while still retaining a species-specific signal. Furthermore, we found the impact of different salinities in different regions (North Sea and Baltic Sea) on proteomic fingerprints to be negligible. In conclusion, the effects of environmental factors and developmental stages on proteomic fingerprints seem to be low in cnidarians. This would allow using reference libraries built up entirely of adult or cultured cnidarian specimens for the identification of their juvenile stages or specimens from different geographic regions in future biodiversity assessment studies.
README
Readme file for the data uploaded to the Dryad project: Data from: Potential of MALDI-TOF MS-based proteomic
fingerprinting for species identification of Cnidaria across classes, species, regions and developmental stages
Associated publication: Potential of MALDI-TOF MS-based proteomic fingerprinting for species identification of Cnidaria across classes,
species, regions and developmental stages, in Molecular Ecology Resources
DOI:ADD UPON ACCEPTANCE
This dataset contains a .csv file called RefTable.csv and a .zip file containing raw
Bruker mass spectrometry files for 562 measurements from a variety of Cnidaria species.
The measurement folders in the .zip file are built up according to the default Bruker flex container system
and contain one folder for each spot that was measured for a specimen. Each "spot folder" consists of folders for each
technical replicate measurement on the respective spot. Within these folders, the actual data produced by the instrument can
be found. For each specimen, one to three folders can be found here.
The RefTable.csv file contains information on which folders belong to which specimen and species.
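As a rough illustration of how the reference table links measurements to specimens and species, the sketch below builds a tiny stand-in table with the `id`, `species` and `specimen` columns documented in this README and counts the measurements per specimen. The rows and the helper name `count_measurements` are made up for illustration; with the real data, `ref` would instead come from `read.csv("RefTable.csv")`, assuming a comma-separated file with a header row.

```r
# Hypothetical stand-in for RefTable.csv; the real file would be loaded with
# ref <- read.csv("RefTable.csv")  # assumes comma separator and header row
ref <- data.frame(
  id       = c("m01", "m02", "m03", "m04", "m05"),
  species  = c("Species A", "Species A", "Species A",
               "Species B", "Species B"),
  specimen = c(1, 1, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Count how many measurements (rows) belong to each specimen.
count_measurements <- function(ref) {
  counts <- aggregate(id ~ specimen + species, data = ref, FUN = length)
  names(counts)[names(counts) == "id"] <- "n_measurements"
  counts[order(counts$specimen), ]
}

per_specimen <- count_measurements(ref)
print(per_specimen)
```

A table like `per_specimen` is a quick way to check that every specimen has the expected number of measurement folders before any spectra are averaged.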
Data in this submission:
|- README.md #Readme file containing information on included data and how to process these
|- 01_Species_clustering.R #R script to replicate results from the publication (Species clustering)
|- 02_Ontogeny.R #R script to replicate results from the publication (t-SNE plots, important peaks for differentiation)
|- 03_Stage_classification.R #R script to replicate results from the publication (Random Forest model for classification of specimens to species level based on a reference library not including the respective ontogenetic stage)
|- Meta_Data_for_submission.csv #Meta information on specimens processed in this work, including sampling information as well as taxonomic information
-Specimen: The name with which the specimen was measured
-Specimen number: A specimen number to combine measurements from different specimens
-Class: Class of the measured specimen
-Species: Species name of that measurement
-Stage: Developmental stage of the specimen the measurement was obtained from
-Region: NS = North Sea; BS = Baltic Sea
-Locality: A locality name where a specimen was obtained. Cruise: Specimen was taken during a cruise, with more information in the subsequent columns
-Cruise: Cruise number during which the specimen was sampled. WH III = Walther Herwig III, Senckenberg = Cruise with research vessel Senckenberg
-Station: Station during that cruise
-Coordinates: Coordinates of the respective sampling site
-Sample or culture: Whether the specimen was obtained from a culture or sampled. S = sampled; C = cultured
-Sampling date: Date of the sampling occasion
-Fixation period: duration of fixation period in months
-Fixation period2: duration of fixation period in months grouped in 3 groups by duration
-Replicates: Number of measurement replicates for the respective specimen
-Number of peaks: Number of peaks for the respective specimen
|- RefTable.csv #Reference table for processing the data in R. Importing and application is included in all R scripts
-id: Name of that measurement
-species: Species name of that measurement
-specimen: A number to combine different measurements of a single specimen
-region: NS = North Sea; BS = Baltic Sea
-stage: Developmental stage of the specimen the measurement was obtained from
-origin: S = sampled; C = cultured
-fixper: duration of fixation period in months
-phylum: Phylum of the measured specimen
-class: Class of the measured specimen
-fixper2: duration of fixation period in months grouped in 3 groups by duration
|- Mass_Spectra.zip #Contains raw data in the Bruker file container format. Each folder within this is a measurement. Measurements from the same specimen are combined using the reference table in R
-Mass spectra were obtained on different measurement occasions and thus have different names, without a pattern applying to all of them.
All relevant information about the measurements can be found in the file Meta_Data_for_submission.csv.
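The evaluation described for 03_Stage_classification.R (a reference library that does not include the ontogenetic stage being classified) can be pictured as a split of the reference table by stage. The following is only a sketch of that idea using the `stage` and `species` columns documented above, not the script's actual code; the table rows and the function name `split_by_stage` are hypothetical.

```r
# Hypothetical miniature reference table with the documented columns.
ref <- data.frame(
  id      = paste0("m", 1:6),
  species = rep(c("Species A", "Species B"), each = 3),
  stage   = rep(c("polyp", "ephyra", "medusa"), times = 2),
  stringsAsFactors = FALSE
)

# Split measurements so that one stage is held out as the query set and the
# remaining stages form the reference library used for training.
split_by_stage <- function(ref, held_out_stage) {
  is_query <- ref$stage == held_out_stage
  list(library = ref[!is_query, ], query = ref[is_query, ])
}

parts <- split_by_stage(ref, "ephyra")

# Every species in the query set should still be present in the library,
# otherwise it could not be recognised at all.
all(parts$query$species %in% parts$library$species)
```

A classifier (the README names a Random Forest model) would then be trained on the rows in `parts$library` and evaluated on `parts$query`.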
Data can be analyzed using Bruker proprietary software such as Bruker flexAnalysis or Bruker Biotyper.
Alternatively, data can be analyzed in R using the R packages MALDIquant and MALDIquantForeign, following
the vignette that can be found via:
https://cran.r-project.org/web/packages/MALDIquant/index.html
To get an initial impression of the data in R, data can be imported following these commands:
install.packages("MALDIquantForeign")
install.packages("MALDIquant")
library("MALDIquantForeign")
library("MALDIquant")
setwd("set working directory to Mass spectra location") #set the working directory to the folder that contains the folder with the mass spectra
rawSpectra <- importBrukerFlex("Mass_Spectra", removeEmptySpectra=TRUE) #import raw spectra
This imports all data into R. If you want to look at them, you can either do:
plot(rawSpectra[[1]]) #view the first mass spectrum. Replace the 1 by any number you would like to view.
or you can plot all mass spectra into a pdf:
spectraIds <- sapply(rawSpectra, function(x) metaData(x)$id) #get measurement IDs from the metaData
#Export raw spectra to PDF, one page per spectrum
pdf(file = "01_raw_spectra.pdf", height = 14, width = 24)
for (i in seq_along(rawSpectra)) {
  plot(rawSpectra[[i]], main = spectraIds[i])
}
dev.off()
If you want to inspect the meta information of each mass spectrum you can use:
metaData(rawSpectra[[1]]) #view meta data of the first mass spectrum. Replace the 1 by any number you would like to view.
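To combine measurements by specimen, the IDs read from the spectra have to be lined up with the rows of RefTable.csv. A minimal sketch of that lookup follows. It assumes that the `id` column of the reference table corresponds to `metaData(x)$id` of each spectrum, which this README does not state explicitly; the IDs and table rows below are placeholders.

```r
# Placeholder IDs; with real data these would come from
# spectra_ids <- sapply(rawSpectra, function(x) metaData(x)$id)
spectra_ids <- c("m03", "m01", "m02", "m99")

# Hypothetical stand-in for read.csv("RefTable.csv").
ref <- data.frame(
  id       = c("m01", "m02", "m03"),
  specimen = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Row of the reference table for each spectrum, in spectrum order
# (NA if the spectrum has no entry in the table).
idx <- match(spectra_ids, ref$id)

# Spectra without a reference entry should be checked before averaging.
unmatched <- spectra_ids[is.na(idx)]

# Specimen number per spectrum, usable as a grouping vector.
specimen_labels <- ref$specimen[idx]
```

Using `match()` keeps the labels in the same order as the imported spectra, so the resulting vector can be passed directly as a grouping argument once any unmatched spectra have been resolved.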
Methods
In total, 278 specimens of Cnidaria belonging to 23 different species from four classes were analyzed. Field specimens were morphologically identified to species level by taxonomic experts immediately after collection, before complete specimens or subsamples were preserved in undenatured ethanol (80-96%).
From each specimen, a small tissue fragment (max. 1 mm³) was incubated for 5 minutes with 5 µl of alpha-cyano-4-hydroxycinnamic acid (HCCA) matrix. Of this incubated solution, 1 to 1.5 µl were transferred to a target plate on one to nine spots for co-crystallization of matrix and analytes. Each spot was measured one to three times using a Microflex LT/SH System (Bruker Daltonics). Using the flexControl 3.4 software (Bruker Daltonics), molecular masses were measured from 2 to 20 kilodaltons (kDa). Peaks were evaluated with a centroid peak detection algorithm over the mass range from 2 to 20 kDa, using a signal-to-noise threshold of two, a minimum intensity threshold of 600 and a peak resolution higher than 400. To validate fuzzy control, the proteins/oligonucleotide method was employed with a maximal resolution of ten times above the threshold. To create a sum spectrum, a total of at least 120 laser shots were applied to a spot. Measurements were carried out using the same instrument on different occasions between 2013 and 2019.
MALDI-TOF data processing
MALDI-TOF raw data were imported to R, Version 4.1.0 (R-Core-Team, 2022) and processed using the R packages MALDIquantForeign, Version 0.12 (Gibb, 2015) and MALDIquant, Version 1.20 (Gibb and Strimmer, 2012). Spectra were square-root transformed, smoothed using the Savitzky-Golay method (Savitzky and Golay, 1964), baseline corrected using the SNIP method (Ryan et al., 1988) and normalized using the TIC method. Repeated measurements were averaged using mean intensities. Peak picking was carried out using a signal-to-noise ratio (SNR) of 12 and a half window size of 13. Mass peaks with an SNR below 12 were nevertheless retained if they occurred in other mass spectra, provided their SNR was larger than 1.75, which was assumed as the lower detection limit. Repeated peak binning was carried out to align homologous mass peaks. The resulting data were Hellinger transformed (Legendre and Gallagher, 2001) and used for further analyses.
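The processing chain above can be sketched with MALDIquant roughly as follows. This is an approximation under stated assumptions, not the code used for the publication: the smoothing window, the number of SNIP iterations, the noise estimator and the binning tolerance are not given in the text and are set to assumed values, only a single binning pass is shown, and the rule for retaining peaks between SNR 1.75 and 12 is not implemented. The function names `hellinger` and `preprocess_spectra` are hypothetical, and the MALDIquant package must be installed for the function to run.

```r
# Hellinger transformation: square root of row-wise relative intensities.
hellinger <- function(m) sqrt(m / rowSums(m))

# Sketch of the preprocessing chain for a list of MassSpectrum objects.
# `labels` gives the specimen of each spectrum (one entry per spectrum).
preprocess_spectra <- function(spectra, labels) {
  s <- MALDIquant::transformIntensity(spectra, method = "sqrt")
  s <- MALDIquant::smoothIntensity(s, method = "SavitzkyGolay",
                                   halfWindowSize = 10)    # assumed window
  s <- MALDIquant::removeBaseline(s, method = "SNIP",
                                  iterations = 100)        # assumed iterations
  s <- MALDIquant::calibrateIntensity(s, method = "TIC")
  # Average repeated measurements of the same specimen (mean intensities).
  avg <- MALDIquant::averageMassSpectra(s, labels = labels, method = "mean")
  # Peak picking with the SNR and half window size given in the text;
  # the MAD noise estimator is an assumption.
  peaks <- MALDIquant::detectPeaks(avg, method = "MAD",
                                   halfWindowSize = 13, SNR = 12)
  # Single binning pass with an assumed tolerance.
  peaks <- MALDIquant::binPeaks(peaks, tolerance = 0.002)
  # Specimen-by-peak matrix; gaps are filled from the averaged spectra.
  m <- MALDIquant::intensityMatrix(peaks, avg)
  hellinger(m)
}

# Usage with imported spectra (not run here); `labels` would hold the
# specimen number of each spectrum, e.g. taken from RefTable.csv:
# features <- preprocess_spectra(rawSpectra, labels = labels)
```

Each row of the returned matrix corresponds to one specimen and each column to one aligned mass peak, which is the shape expected by typical clustering or classification functions.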
Usage notes
Data can be analyzed using Bruker proprietary software such as Bruker flexAnalysis or Bruker Biotyper. Alternatively, data can be analyzed in R using the R packages MALDIquant and MALDIquantForeign, following the vignette that can be found via: https://cran.r-project.org/web/packages/MALDIquant/index.html