Data from: Synthetic torpor in the rat protects the heart from ischaemia-reperfusion injury
Data files
Mar 17, 2026 version files 34.45 MB
-
Guide_CSV.csv
8.33 KB
-
PTM_mass_spec_data_CSV.csv
18.21 MB
-
PTM_mass_spec_data.xlsx
16.17 MB
-
README.md
14.39 KB
-
STEP_1_AND_2_-_preprocessing_and_QC.R
13.82 KB
-
STEP_3_-_limma.R
6.85 KB
-
STEP_4_GSEA_and_figures.R
10.93 KB
-
STEP_5_rat_to_human_mapping.R
11.97 KB
-
STEP_6_run_PTM-SEA.R
3.77 KB
-
STEP_7_plot_PTM_SEA_outputs.R
4.30 KB
Abstract
During times of environmental stress, many animals enter torpor: a reversible protective physiological state typically characterised by reductions in core temperature, heart rate, and oxygen consumption. Species that naturally enter this hypothermic and hypometabolic state are tolerant of ischaemia-reperfusion injury. Consequently, there is a growing interest in utilizing aspects of torpor for clinical applications, such as protection from stroke or myocardial infarction. It is currently unknown, however, whether a torpor-like state is protective in animals that do not naturally enter torpor. Using viral vector-mediated chemogenetic activation of the medial preoptic area of the hypothalamus, we induced synthetic torpor in the rat, a species that does not naturally enter torpor. We demonstrate this state is cardioprotective in an ex vivo ischaemia-reperfusion injury model with an ~40% reduction in infarct size. Synthetic torpor-induced cardioprotection of the normothermic, isolated heart is not dependent on prior hypothermia in vivo. Phosphoproteomic analysis of cardiac tissue indicates the protective effects of synthetic torpor may be mediated by parallel activation of cell survival and stress tolerance pathways and inhibition of cell death pathways. These findings provide important insights into the mechanisms of organ protective effects of synthetic torpor states with implications for future clinical translation in humans.
This dataset contains output from tandem mass tag phosphoproteomic mass spectroscopy analysis from 12 rats, 5 of which entered synthetic torpor 90 minutes prior to sacrifice (7 controls).
https://doi.org/10.5061/dryad.v6wwpzh9r
Data files
PTM_mass_spec_data.xlsx
The Guide Tab of the Excel report explains the columns in the Data tab. In each column of the Data tab, it should be clear whether the data shown is from the Total proteome or the Phospho proteome analysis. The report can be roughly split into 3 parts:
- Protein/peptide information – information about the identified protein/phosphopeptide.
- Stats – The fold change comparisons between your conditions, split firstly by comparison, then by Total or Phospho.
- Quan – The information used to generate your fold change data (as described in the Guide).
This file is also provided in .csv, for accessibility (PTM_mass_spec_data_CSV.csv and Guide_CSV.csv)
Analysis code (R files)
STEP_1_AND_2_-_preprocessing_and_QC.R
Description
This R script implements Steps 1 and 2 of a mass spectrometry-based phosphoproteomics data processing pipeline. It performs preprocessing, phosphosite parsing, data restructuring, and quality control (QC) on peptide-level phosphorylation abundance data, preparing the dataset for downstream statistical analysis.
Input
A single Microsoft Excel file (Dataset for reconstruction of protein.xlsx, "Data" sheet) containing peptide-level phosphoproteomics data exported from a proteomics data processing platform (e.g. Proteome Discoverer). The file is expected to include phosphopeptide filter flags, protein accession and gene name annotations, modification site strings (with localisation probabilities), and log2-adjusted, normalised phosphorylation abundance values per sample.
Processing steps
- Data ingestion and filtering: Reads the raw dataset, retains only rows flagged as phosphopeptide-present, and filters for entries containing parseable phosphosite modification strings.
- Phosphosite parsing: Extracts individual phosphorylated residues (S, T, Y), their sequence positions, and localisation probabilities from the modification annotation strings. Retains only sites with a localisation probability ≥ 75%.
- Site-level collapse: Where multiple peptides map to the same phosphosite, values are collapsed to a single entry per unique site (identified as proteinID_residuePosition) by taking the mean intensity across peptides.
- Sample name standardisation: Renames verbose instrument-derived column headers to concise identifiers encoding animal ID and experimental condition (Control or Synthetic Torpor), and saves a mapping table.
- Output object generation: Produces a site × sample expression matrix, a long-format data table, and a site annotation table for use in downstream pipeline steps.
Quality control
Site-level QC metrics are computed for each sample and saved as tabular outputs and plots:
- Missingness: Fraction of missing values per sample.
- Inter-sample correlation: Pairwise Spearman correlations across all samples and within batch (freeze-thaw vs. standard processing).
- Principal component analysis (PCA): Performed on complete-case sites; PC1/PC2 scores saved and plotted, coloured by condition and batch.
- Intensity distribution: Median and interquartile range (IQR) per sample.
- Composite outlier score: A weighted z-score combining all QC metrics to rank samples by their deviation from the cohort, facilitating objective outlier identification.
- Heatmap: Hierarchically clustered heatmap of the top 600 most variable phosphosites (row-scaled), with condition and batch annotations.
STEP_3_-_limma.R
Purpose
This script performs differential phosphorylation analysis at the individual phosphosite level, comparing samples in a Synthetic Torpor condition against Controls. It uses the limma (Linear Models for Microarray Data) framework, which is widely applied to proteomics and phosphoproteomics data. Batch effects arising from sample preparation differences (freeze-thaw vs. standard processing) are statistically accounted for within the model rather than by correcting the data directly.
Inputs (from the sense check/ folder produced by Steps 1 & 2)
- expr_mat_sites_x_samples.rds — the site × sample intensity matrix
- site_annot.rds — phosphosite annotations (protein, gene, residue, position)
- QC_sitelevel_sample_metadata.csv — sample condition and batch assignments
Key steps
- Model design: A linear model is constructed with condition (Control vs. Synthetic Torpor) and batch as covariates, so that differential abundance estimates are batch-adjusted.
- limma fit: A linear model is fitted to each phosphosite using lmFit, followed by empirical Bayes moderation of variance estimates (eBayes), which improves statistical power with smaller sample sizes.
- Results extraction: Site-level statistics (log2 fold change, p-value, Benjamini-Hochberg FDR-adjusted p-value) are extracted for the Synthetic Torpor vs. Control contrast and merged with site annotations.
- Volcano plot: Sites are plotted by log2 fold change vs. −log10(p-value), colour-coded by significance tier (FDR < 0.1, p < 0.05, or non-significant).
- DE heatmap: The top 300 sites by absolute log2 fold change are row-scaled and plotted as a hierarchically clustered heatmap, annotated by condition and batch.
- Batch-adjusted PCA: For visualisation purposes only, removeBatchEffect is used to produce a batch-corrected version of the matrix, on which PCA is run to show sample separation by condition.
STEP_4_GSEA_and_figures.R
Purpose
This script extends the differential phosphorylation analysis from Step 3 by performing gene-level Gene Set Enrichment Analysis (GSEA) against the Reactome pathway database. Phosphosite-level statistics from the limma model are collapsed to a single score per gene, which is then used to rank genes and test for coordinated enrichment of Reactome pathways in the Synthetic Torpor vs. Control comparison. The script also regenerates the volcano plot, DE heatmap, and batch-adjusted PCA from Step 3, making it a self-contained end-to-end figure-generation script.
Inputs (from the sense check/ folder)
- expr_mat_sites_x_samples.rds — site × sample intensity matrix
- site_annot.rds — phosphosite annotations
- QC_sitelevel_sample_metadata.csv — sample condition and batch assignments
Key steps
- limma differential analysis (repeated from Step 3): Fits a batch-covariate linear model and extracts moderated t-statistics and fold changes for the Synthetic Torpor vs. Control contrast.
- Gene-level rank construction: Collapses site-level results to one representative score per gene by selecting the phosphosite with the largest absolute moderated t-statistic. This signed t-statistic becomes the gene-ranking metric for GSEA.
- Gene symbol → Entrez ID mapping: Uses biomaRt to query Ensembl and convert rat gene symbols to Entrez IDs, which are required by the ReactomePA package.
- Reactome GSEA: Runs gsePathway (from ReactomePA) on the Entrez-ranked gene list against the rat Reactome pathway database, testing pathways containing 10–500 genes and returning results at all p-value thresholds for inspection.
- GSEA dotplot: Visualises the top 30 enriched/depleted pathways ranked by absolute Normalised Enrichment Score (NES), with dot size encoding gene set size and colour encoding −log10(raw p-value).
STEP_5_rat_to_human_mapping.R
Purpose
This script bridges the rat phosphoproteomics data to the human proteome, enabling downstream analysis tools (specifically PTM-SEA) that rely on human reference databases. It has two sequential components: first it re-runs and cleans the limma differential analysis to produce a standardised output, then it performs cross-species phosphosite mapping by aligning rat protein sequences to their human orthologues and transferring phosphosite positions.
Inputs
From the sense check/tables/ folder:
- site_only object (in-memory from earlier steps) — site-collapsed phosphorylation intensities
From the data/ folder (user-supplied reference files):
- rat_human_orthologues.tsv — rat-to-human gene orthologue table (e.g. from Ensembl BioMart)
- human_ensembl_uniprot.txt — mapping of human Ensembl protein IDs (ENSP) to UniProt accessions
- Rattus_norvegicus.GRCr8.pep.all.fa — rat proteome FASTA (Ensembl)
- Homo_sapiens.GRCh38.pep.all.fa — human proteome FASTA (Ensembl)
Key steps
Part A — Clean limma output for PTM-SEA
- Rebuilds the site × sample expression matrix from site_only, applying a coverage filter (minimum 2 observed values per group per site) to ensure model stability.
- Re-fits the limma batch-covariate model and extracts annotated DE results, saving a clean standardised output (res_annot_clean) for use by downstream PTM-SEA steps.
Part B — Rat-to-human phosphosite mapping
- Gene symbol → Ensembl ID: Converts rat gene symbols in the DE results to Ensembl gene IDs using the org.Rn.eg.db annotation package.
- Rat gene → human protein: Joins the orthologue table to map rat Ensembl gene IDs to human Ensembl protein IDs (ENSP), then maps these to human UniProt accessions.
- FASTA loading and isoform selection: Loads rat and human proteome FASTAs using Biostrings; for each gene/protein, selects the longest isoform as the canonical sequence.
- Pairwise sequence alignment: For each rat gene–human protein pair, performs global pairwise alignment of the protein sequences using the BLOSUM62 substitution matrix (pwalign). Iterates through the alignment to map each rat phosphosite position to its equivalent human position.
- Residue identity check: Only accepts a mapped position if the human sequence carries the same amino acid residue (S, T, or Y) as the rat site, ensuring biological validity.
- Human site ID construction: Formats confirmed mapped sites as UniProtID;ResiduePosition-p (e.g. P12345;S473-p), the standard notation required by PTM-SEA signature databases.
STEP_6_run_PTM-SEA.R
Purpose
This script implements PTM-SEA (Post-Translational Modification Signature Enrichment Analysis) using a fast GSEA algorithm (fgsea). It tests whether curated sets of known regulatory phosphosites from the PTMsigDB database are coordinately enriched among the differentially phosphorylated sites identified in the rat-to-human mapped results from Step 5. By working with human-mapped site identifiers, it leverages the full depth of human PTM knowledge bases to infer kinase activities, signalling pathway states, and perturbation signatures relevant to the Synthetic Torpor condition.
Inputs
From sense check/tables/:
- rat_human_mapped_aligned_clean.rds — DE results with human-mapped phosphosite identifiers (from Step 5)
From the data/ folder (user-supplied):
- ptm.sig.db.all.uniprot.human.v2.0.0.gmt — PTMsigDB v2.0.0 GMT file containing curated human phosphosite signatures (kinase substrates, signalling pathways, perturbation sets), in UniProt site format
Key steps
- Ranking vector construction: Filters to sites with both a valid human site identifier and a limma moderated t-statistic. Where multiple rat sites map to the same human site, the one with the largest absolute t-statistic is retained. Sites are ranked in descending order of their signed t-statistic, so that strongly upregulated sites appear at the top and strongly downregulated sites at the bottom.
- PTMsigDB parsing: Reads the GMT file and parses each signature, preserving directionality tags encoded in PTMsigDB (_u for up/activated, _d for down/inhibited). Trailing directional suffixes are stripped from site identifiers so they match the UniProtID;ResiduePosition-p format generated in Step 5.
- Overlap filtering: Intersects each PTM signature with the ranked site list. Signatures with fewer than 5 overlapping sites are excluded, retaining only those with sufficient coverage for reliable enrichment testing.
- fgsea enrichment: Runs fgsea with 10,000 permutations on all retained signatures, producing normalised enrichment scores (NES), p-values, and BH-adjusted p-values for each PTM signature.
- Results export: Results are sorted by adjusted p-value and saved for downstream interpretation and visualisation.
STEP_7_plot_PTM_SEA_outputs.R
Purpose
This final script visualises the PTM-SEA results from Step 6, focusing specifically on kinase activity signatures. It produces a publication-ready combined figure pairing a volcano plot of all kinase signatures with a ranked dot plot of the top 15, using shared colour and size scales throughout for direct visual comparability.
Inputs
From sense check/tables/:
- PTMSEA_directional_fgsea_results.rds — full PTM-SEA fgsea results table (from Step 6)
Key steps
- Kinase signature filtering: Retains only signatures prefixed with KINASE from the PTM-SEA results, extracting kinase labels and directional tags (_u for upregulated/activated substrates, _d for downregulated/inhibited substrates).
- Signature collapsing: Where multiple database sources (e.g. PhosphoSitePlus and iKiP) provide signatures for the same kinase, retains the single best-scoring entry per kinase ranked by absolute NES, then p-value, then signature size.
- Top 15 selection: Identifies the 15 kinases with the largest absolute NES for labelling and dot plot display.
- Shared scale computation: Calculates global axis and colour limits across all kinase signatures to ensure the two plot panels use identical scales.
- Volcano plot: Plots all kinases as NES vs. −log10(raw p-value), with point size encoding signature size, colour encoding significance, and shape encoding directionality. The top 15 kinases by |NES| are labelled using non-overlapping text repulsion (ggrepel).
- Dot plot: Displays the top 15 kinases ranked vertically by NES, using the same colour, size, and shape encodings as the volcano plot.
- Combined figure: The two panels are assembled side-by-side using patchwork with a shared legend, and saved as both PNG and SVG.
Experimental Design
The sample is protein from lysed rat heart tissues. Each sample is from a different heart, collected in two batches (R and S) with two biological conditions: Control and Synthetic Torpor.
Analysis method
For this, we performed two separate analyses: one to compare the total protein levels across the samples and a second in which we included a phospho-peptide enrichment step to allow us to compare phospho-peptide levels across the same samples. Data analysis was then performed using Proteome Discoverer v2.4 software to run a Sequest search against the Rat protein databases and against a 'Common Contaminants' database which we include routinely in all our searches. The Phospho analysis was set up as for the Total, but with phosphorylation at S, T, and Y included as possible peptide modifications. For the Total proteome analysis, normalisation on the 'Total Peptide Amount' in each sample was included in the data analysis set up (since we started with an equal amount of each), but this was not included for the Phospho analysis, since after phospho-enrichment, we would not necessarily expect the samples to contain the same amount of peptides. Instead, the normalisation factor calculated for the total proteome was applied to the phosphopeptide data. All the data was filtered using a 5%FDR cut-off.
Statistical analysis
The MS data were searched against the Rat databases retrieved 2025-07-04, and updated with additional annotation information on 2025-09-29. Protein groupings were determined by PD2.4; however, master protein selection improvement, and all following statistical analyses were performed in the R statistical environment version 4.4.0. The Master protein improvement script first searches Uniprot for the current annotation status of all protein accessions and updates redirected or obsolete accessions. The script further takes the candidate master proteins for each group and uses current Uniprot review and annotation status to select the best annotated protein as master protein without loss of identification or quantification quality.
The abundances for each sample were normalised to the total peptide amount in the protein samples, with total protein normalisation factors also applied to the phospho dataset. These values were then Log2 transformed to bring them closer to a normal distribution. Further, the identified list of phosphopeptides was matched with the corresponding proteins in the total dataset, for ease of comparison. The phosphopeptide abundances were then adjusted to changes in total protein abundances, facilitating the identification of changes in the relative phosphorylation of a protein as well as changes in the overall abundance of a protein whose phosphorylation level remains unchanged. Data is then exported to the Excel document in the main folder. Phospho columns in the Excel output which integrate changes in total protein abundance in this way are identified by the word ‘adjusted’ in the column header.
The statistical analysis performed assumes no additional sources of variance beyond those specified. We use the limma package for our statistical test, as this was identified in a recent Nature Comms paper as one of the highest scoring methods for identification of differentially expressed proteins. If you’d like a more in-depth analysis, or to discuss alternative statistical approaches, you can arrange a meeting with Phil to discuss your experimental design and he’ll advise you whether there may be more appropriate statistical methods which may give you more powerful insights into your data. A single LIMMA model was constructed for all conditions and moderated t-tests performed for each comparison of interest, then the p-value was adjusted using the Benjamini-Hochberg FDR method.
