Mass spectrometry-based proteomics of the Aurantiochytrium limacinum ATCC MYA-1381 zoospore-to-vegetative cell transition (MaxQuant processed data)
Data files
Version of Apr 16, 2025 (86.48 MB total)
- JGI_maxquant.xlsx (40.95 MB)
- MMETSP_maxquant.xlsx (45.52 MB)
- README.md (8.24 KB)
Abstract
This dataset provides standard MaxQuant proteomic output files generated from an analysis of the Aurantiochytrium limacinum ATCC MYA-1381 zoospore-to-vegetative cell transition. Mass spectrometry data from three biological time-course replicates were analyzed using two separate protein prediction sets: one from a JGI genome annotation and the other from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) annotation of the A. limacinum proteome. Each biological replicate in the dataset includes protein abundance measurements across five time points, capturing dynamic proteomic changes associated with cellular remodeling, metabolism, and ectoplasmic network formation. Data are provided as output by MaxQuant, including protein-group identification and quantification values and peptide-level measurements.
Dataset Overview
This dataset contains proteomic data from an analysis of the Aurantiochytrium limacinum ATCC MYA-1381 zoospore-to-vegetative cell transition. Mass spectrometry data from three biological replicates were processed using MaxQuant and analyzed separately against two different protein prediction sets: the JGI genome annotation (MycoCosm) and the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) annotation.
Each dataset includes protein abundance measurements across five time points (T0, T2, T4, T6, T8), capturing proteomic changes associated with cellular remodeling, metabolism, and ectoplasmic network formation.
Contents of the Dataset
The dataset consists of two main spreadsheet files corresponding to the two protein prediction sets used in MaxQuant analysis:
- JGI_maxquant.xlsx
- MMETSP_maxquant.xlsx
Each file contains multiple tabs corresponding to standard MaxQuant output:
- proteinGroups_G1, G2, G3: Protein-level identification and quantification tables for each biological replicate.
- peptides_G1, G2, G3: Peptide-level identification and quantification tables for each replicate.
- Reporter_intensity_raw: Unnormalized reporter ion intensities across timepoints and replicates, aggregated per protein group.
- Reporter_intensity_corrected: Normalized reporter ion intensities, rescaled per channel to account for variation in labeling or loading.
ProteinGroups Tabs (proteinGroups_G1, proteinGroups_G2, proteinGroups_G3)
Each row represents a protein group detected in the sample. Selected columns include:
- Protein IDs: Accession identifiers for the proteins in this group.
- Majority protein IDs: Proteins in the group that contain at least half of the peptides of the leading protein (used for annotation).
- Peptide counts: Various counts of peptides detected (all / razor+unique / unique).
- Fasta headers: Original FASTA header from the input proteome.
- Sequence coverage: Percentage of protein sequence covered by identified peptides.
- Mol. weight [kDa]: Molecular weight of the leading protein in the group.
- Q-value: False discovery rate-adjusted confidence score.
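- Reporter intensity 113–117: Raw reporter ion intensities for each iTRAQ label. (Note: MaxQuant's Reporter intensity corrected columns are also present in these tabs but are identical to the raw values and unmodified; they are distinct from the normalized Reporter_intensity_corrected tab.)
- Intensity: Summed intensity across channels.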
- MS/MS count: Number of fragment spectra supporting the identification.
- Reverse / Potential contaminant: Flags for decoy hits or common contaminants (+ if true).
- Oxidation (M) site positions: Position(s) of methionine oxidation modifications, if detected.
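A minimal Python sketch of how the Reverse and Potential contaminant flags are typically used to remove decoy and contaminant protein groups before analysis. It assumes a proteinGroups tab has already been read into a list of dictionaries keyed by the column names above, with flagged rows carrying "+" and unflagged rows left empty. The helper names and the optional Q-value cutoff are illustrative assumptions, not part of this dataset.

```python
from typing import Dict, Iterable, List, Optional

# MaxQuant flag columns that mark rows to exclude ("+" = flagged, empty = not flagged).
FLAG_COLUMNS = ("Reverse", "Potential contaminant")


def is_flagged(row: Dict[str, object], column: str) -> bool:
    """Return True only if the flag column holds '+'; empty or missing cells count as unflagged."""
    value = row.get(column)
    return isinstance(value, str) and value.strip() == "+"


def filter_protein_groups(rows: Iterable[Dict[str, object]],
                          max_q: Optional[float] = None) -> List[Dict[str, object]]:
    """Keep protein groups that are neither reverse (decoy) hits nor contaminants.

    If max_q is given (a hypothetical, user-chosen cutoff), rows whose Q-value is
    missing or above max_q are also dropped.
    """
    kept = []
    for row in rows:
        if any(is_flagged(row, col) for col in FLAG_COLUMNS):
            continue
        if max_q is not None:
            q = row.get("Q-value")
            if q in (None, "") or float(q) > max_q:
                continue
        kept.append(row)
    return kept
```

Treating anything other than "+" as unflagged matches how empty cells are described under Missing or Empty Values.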
Peptides Tabs (peptides_G1, peptides_G2, peptides_G3)
Each row represents a unique peptide identified in the sample. Selected columns include:
- Sequence: Amino acid sequence of the peptide.
- [Amino acid] Count (e.g., A Count, R Count): Number of each residue type in the peptide.
- Missed cleavages: Missed tryptic cleavage sites (integer).
- Mass: Monoisotopic mass of the peptide.
- PEP: Posterior error probability of the identification.
- Reporter intensity 113–117: Raw reporter ion intensities for each iTRAQ label. (Note: MaxQuant's Reporter intensity corrected columns are also present but are identical to the raw values and unmodified.)
- Mod. peptide IDs: ID for the modified form of the peptide.
- Reverse / Potential contaminant: Identification flags.
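To make the Missed cleavages column concrete, the sketch below counts internal lysine (K) or arginine (R) residues at which trypsin did not cut. Whether the search applied the classic rule that trypsin does not cleave before proline (P) is not stated in this README, so it is exposed as a parameter; MaxQuant's default "Trypsin/P" setting ignores that rule. The function name is hypothetical and is meant only to illustrate the counting logic.

```python
def count_missed_cleavages(sequence: str, proline_rule: bool = True) -> int:
    """Count internal K/R residues where trypsin did not cleave.

    The C-terminal residue is skipped because cleavage there produced the peptide's end.
    With proline_rule=True, a K/R followed by P is not counted as a cleavage site.
    With proline_rule=False ("Trypsin/P"-style), every internal K/R counts.
    """
    seq = sequence.upper()
    missed = 0
    for i, residue in enumerate(seq[:-1]):
        if residue in "KR":
            if proline_rule and seq[i + 1] == "P":
                continue  # not a cleavage site under the proline rule
            missed += 1
    return missed


print(count_missed_cleavages("PEPKTIDER"))  # one internal K, not followed by P
```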
Reporter_intensity_raw and Reporter_intensity_corrected Tabs
These two tabs summarize reporter ion intensities for timepoints and replicates, aggregated per protein group.
- Reporter_intensity_raw: Contains the unnormalized reporter ion intensities (the Reporter intensity values from MaxQuant), organized by protein group and experimental condition.
- Reporter_intensity_corrected: Contains normalized reporter ion intensities, in which each iTRAQ channel was rescaled by a single multiplicative factor to correct for variation in sample loading or labeling efficiency. One channel served as the normalization reference and was not scaled.
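This README does not specify how the per-channel factors for Reporter_intensity_corrected were chosen. One common approach, shown here purely as an assumption, scales each channel so that its summed intensity across protein groups equals that of the reference channel. The sketch uses plain Python lists; the function name and the column-sum criterion are illustrative, not the authors' documented procedure.

```python
from typing import Dict, List, Optional

# channel name -> intensities, one entry per protein group (None = missing)
Matrix = Dict[str, List[Optional[float]]]


def scale_to_reference(data: Matrix, reference: str) -> Matrix:
    """Return a new matrix in which each channel is multiplied by one factor.

    Assumed factor: reference total / channel total, ignoring missing values.
    The reference channel (and any all-zero channel) keeps a factor of 1.0.
    The input matrix is not modified.
    """
    def total(values: List[Optional[float]]) -> float:
        return sum(v for v in values if v is not None)

    ref_total = total(data[reference])
    scaled: Matrix = {}
    for channel, values in data.items():
        channel_total = total(values)
        if channel == reference or channel_total == 0:
            factor = 1.0
        else:
            factor = ref_total / channel_total
        scaled[channel] = [None if v is None else v * factor for v in values]
    return scaled
```

A single factor per channel preserves the relative differences between protein groups within that channel, which is consistent with the description of one multiplicative factor per channel.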
In both tabs, columns follow the format:
Reporter intensity [Channel]_[Replicate]_T[Timepoint]
For example:
Reporter intensity 113_G1_T0-1
= iTRAQ channel 113, biological replicate G1, time point T0; the trailing -1 also denotes biological replicate 1.
Each row represents a protein group, with intensities reflecting the summed reporter signal across all peptides assigned to that group.
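Column names in this format can be split programmatically, for example to regroup intensities by replicate and time point. The sketch below assumes names exactly like the example above, with the trailing "-N" suffix treated as optional; the regular expression, the ReporterColumn type, and the function name are hypothetical helpers. The time point number is read as hours, following the Abbreviations and Codes section.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: "Reporter intensity <channel>_G<replicate>_T<hours>" plus an optional "-<n>" suffix.
COLUMN_PATTERN = re.compile(
    r"^Reporter intensity (?P<channel>\d{3})_G(?P<replicate>\d+)_T(?P<hours>\d+)(?:-(?P<suffix>\d+))?$"
)


class ReporterColumn(NamedTuple):
    channel: int            # iTRAQ reporter ion, e.g. 113
    replicate: int          # biological replicate from the G prefix, e.g. 1 for G1
    hours: int              # time point in hours, e.g. 0 for T0
    suffix: Optional[int]   # trailing "-N" number, if present


def parse_reporter_column(name: str) -> Optional[ReporterColumn]:
    """Parse a summary-tab column name; return None for columns that do not match."""
    match = COLUMN_PATTERN.match(name.strip())
    if match is None:
        return None
    suffix = match.group("suffix")
    return ReporterColumn(
        channel=int(match.group("channel")),
        replicate=int(match.group("replicate")),
        hours=int(match.group("hours")),
        suffix=int(suffix) if suffix is not None else None,
    )


print(parse_reporter_column("Reporter intensity 113_G1_T0-1"))
```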
Missing or Empty Values
Some columns in the dataset contain empty cells. These are retained in their original MaxQuant format and are not errors or omissions. Instead, they reflect values that are either not applicable to a particular entry or were not identified in the experiment. Below is a summary of common cases:
- Only identified by site (proteinGroups): Empty unless a protein group was identified solely by modified peptides rather than by unmodified peptide sequences. Most entries are identified by unmodified peptides, so this field is usually blank.
- Reverse / Potential contaminant (proteinGroups and peptides): These fields are part of MaxQuant's internal false discovery rate and contaminant checking system. Most valid protein or peptide entries are neither reverse hits nor contaminants, so these cells are usually empty.
- Oxidation (M) site IDs / Oxidation (M) site positions: Populated only if methionine oxidation was observed for a particular peptide or protein group; otherwise the field is left blank.
- Taxonomy IDs: This column is part of MaxQuant's support for multi-species analyses. Because this dataset includes only Aurantiochytrium limacinum, taxonomy IDs are not populated and the field remains empty.
We chose to retain the original MaxQuant output format to ensure compatibility with standard proteomics pipelines. An empty cell therefore indicates either that the field does not apply to that entry or that the feature was not detected in the sample.
Units and Values
- Intensity values: Arbitrary units derived from MS signal; useful for relative quantification.
- Q-values / PEP: Range from 0 to 1; lower values reflect higher confidence.
- Sequence coverage [%]: Range from 0 to 100.
- Mass / Molecular weight: Peptide Mass is given in daltons (Da); protein Mol. weight is given in kilodaltons (kDa).
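Because intensities are in arbitrary units, they are most meaningful as ratios between channels of the same protein group within one replicate. The sketch below computes log2 ratios of each time point relative to T0 for a single protein group. Treating zero or missing intensities as "not quantified" is an assumption, and the function name is hypothetical.

```python
import math
from typing import Dict, Optional


def log2_ratios_vs_baseline(intensities: Dict[str, Optional[float]],
                            baseline: str = "T0") -> Dict[str, Optional[float]]:
    """Return log2(intensity / baseline intensity) for each time point.

    Zero or missing values (in either the time point or the baseline) yield None,
    since a ratio cannot be formed from them.
    """
    base = intensities.get(baseline)
    result: Dict[str, Optional[float]] = {}
    for timepoint, value in intensities.items():
        if not base or not value:
            result[timepoint] = None
        else:
            result[timepoint] = math.log2(value / base)
    return result


# Hypothetical intensities for one protein group in one replicate.
print(log2_ratios_vs_baseline({"T0": 100.0, "T2": 200.0, "T4": 50.0}))
# {'T0': 0.0, 'T2': 1.0, 'T4': -1.0}
```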
Additional Notes
- All quantification values are channel-based. Five reporter channels of the iTRAQ 8-plex reagent set were used, one per time point:
- 113 = T0
- 114 = T2
- 115 = T4
- 116 = T6
- 117 = T8
Each replicate has its own set of these channels.
Abbreviations and Codes
- T0, T2, T4, T6, T8 – Time points representing 0, 2, 4, 6, and 8 hours post-settlement.
- JGI – Joint Genome Institute annotation.
- MMETSP – Marine Microbial Eukaryote Transcriptome Sequencing Project annotation.
References
- For full descriptions of columns and analysis parameters, please refer to:
- MaxQuant documentation: https://www.maxquant.org/
- Cox & Mann, 2008. Nature Biotechnology, DOI: 10.1038/nbt.1511
- Tyanova et al., 2016. Nature Protocols, DOI: 10.1038/nprot.2016.136
Software Requirements
To open and analyze the dataset, the following software is recommended:
- DEP v1.20.0 (Bioconductor package) for processing MaxQuant output, normalization, and statistical analysis
- Perseus (optional) for further statistical analysis of MaxQuant output
- Available at: https://maxquant.net/perseus
- Spreadsheet software (e.g., Microsoft Excel, Google Sheets, or LibreOffice) for viewing .xlsx files
Contact Information
For any questions regarding this dataset, please contact: Jackie Collier; jackie.collier@stonybrook.edu or Joshua Rest; joshua.rest@stonybrook.edu
Methods
Aurantiochytrium limacinum ATCC MYA-1381 was cultured at 25°C in GPY medium containing 3% D-glucose, 1.5% peptone, 0.5% yeast extract, and 1.8% Instant Ocean sea salt. After 24 hours, cells were transferred to GPY agar plates and grown for an additional 24 hours. Zoospores were released by flooding plates with artificial seawater (1.8% Instant Ocean) and collected after 2 hours. Zoospores were inoculated into Petri dishes containing A1 medium, and samples were collected at five time points: zoospores at 0 hours (T0) and settled cells at 2, 4, 6, and 8 hours post-settlement (T2, T4, T6, T8). Cells were harvested by centrifugation, and pellets were stored at -80°C until protein extraction.
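For anyone preparing the GPY medium, the percentages above can be converted to masses per volume. The sketch assumes the percentages are weight/volume (grams per 100 mL), which the text does not state explicitly; the dictionary and function names are hypothetical.

```python
# Assumption: percentages are % (w/v), i.e. grams per 100 mL of final medium.
GPY_PERCENT_W_V = {
    "D-glucose": 3.0,
    "peptone": 1.5,
    "yeast extract": 0.5,
    "Instant Ocean sea salt": 1.8,
}


def grams_for_volume(percent_w_v: dict, volume_ml: float) -> dict:
    """Convert % (w/v) values into grams needed for a given final volume in mL."""
    return {name: pct * volume_ml / 100.0 for name, pct in percent_w_v.items()}


# For 1 L this gives about 30 g glucose, 15 g peptone, 5 g yeast extract, and 18 g sea salt.
print(grams_for_volume(GPY_PERCENT_W_V, 1000.0))
```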
Protein extraction was performed using a lysis buffer containing KCl, MgCl₂, Tris, NP-40, Tween, SDS, and deionized water. Samples were vortexed and centrifuged, and the supernatants were collected. Protein concentration was determined using a bicinchoninic acid (BCA) assay. Proteins were digested with trypsin and labeled using iTRAQ-8plex reagents (113–117). Peptides were fractionated by high-performance liquid chromatography (HPLC) and analyzed using an Ultimate 3000 nano UHPLC system coupled to a Q Exactive mass spectrometer (Thermo Fisher Scientific).
Mass spectrometry data were analyzed separately against two predicted A. limacinum proteomes: one from the JGI genome annotation (MycoCosm) and the other from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) annotation. Searches were conducted using MaxQuant v1.6.2.14 with trypsin enzyme specificity allowing up to two missed cleavages. Carbamidomethylation of cysteine was set as a fixed modification, while oxidation of methionine and iTRAQ-8plex labeling were included as variable modifications. The precursor mass tolerance was set to 10 ppm, and the MS/MS tolerance was 0.6 Da. MaxQuant output included standard protein identification and quantification metrics, with results organized into per-replicate proteinGroups and peptides, as well as summary tables of total reporter intensities and normalized (corrected) values.
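The 10 ppm precursor tolerance is relative, so its absolute width grows with mass, whereas the 0.6 Da MS/MS tolerance is fixed. As a worked example, at a precursor mass of 1,000 Da, 10 ppm corresponds to 1000 × 10 × 10⁻⁶ = 0.01 Da. The helpers below (hypothetical names) express only this arithmetic; they are not MaxQuant's internal matching logic, which also involves mass recalibration.

```python
def ppm_to_da(mass: float, ppm: float) -> float:
    """Absolute tolerance in Da for a relative tolerance given in parts per million."""
    return mass * ppm * 1e-6


def within_tolerance(observed: float, theoretical: float, ppm: float = 10.0) -> bool:
    """True if the observed mass lies within +/- ppm of the theoretical mass."""
    return abs(observed - theoretical) <= ppm_to_da(theoretical, ppm)


print(ppm_to_da(1000.0, 10.0))  # about 0.01 Da
```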