A proteome-wide map of chaperone-assisted protein refolding in a cytosol-like milieu

Cite this dataset

To, Philip et al. (2022). A proteome-wide map of chaperone-assisted protein refolding in a cytosol-like milieu [Dataset]. Dryad.


The journey by which proteins navigate their energy landscapes to their native structures is complex, involving (and sometimes requiring) many cellular factors and processes operating in partnership with a given polypeptide chain’s intrinsic energy landscape. The cytosolic environment and its complement of chaperones play critical roles in granting many proteins safe passage to their native states; however, it is challenging to interrogate the folding process for large numbers of proteins in a complex background with most biophysical techniques. Hence, most chaperone-assisted protein refolding studies are conducted in defined buffers on single purified clients. Here, we develop a limited-proteolysis mass spectrometry approach paired with an isotope-labeling strategy to globally monitor the structures of refolding E. coli proteins in the cytosolic medium and with the chaperones GroEL/ES (Hsp60) and DnaK/DnaJ/GrpE (Hsp70/40). GroEL can refold the majority (85%) of the E. coli proteins for which we have data, and is particularly important for restoring acidic proteins and proteins with high molecular weight, trends that come to light because our assay measures the structural outcome of the refolding process itself, rather than binding or aggregation. For the most part, DnaK and GroEL refold a similar set of proteins, supporting the view that despite their vastly different structures, these two chaperones share one mechanism in common: both unfold misfolded states. Finally, we identify a cohort of proteins that are intransigent to being refolded with either chaperone. The data support a model in which chaperone-nonrefolders may fold most efficiently cotranslationally, and then remain kinetically trapped in their native conformations.


Please see the attached manuscript for detailed methods on data collection and processing. In brief, the Proteome Discoverer (PD) Software Suite (v2.4, Thermo Fisher) and the Minora algorithm were used to analyze mass spectra and perform label-free quantification (LFQ) of detected peptides. The data were searched against the Escherichia coli reference proteome database (UP000000625, UniProt). For peptide identification, either the PD Sequest HT node (for non-pseudo-SILAC samples) or the PD MSFragger node (for pseudo-SILAC samples) was used, each with a semi-tryptic search allowing up to 2 missed cleavages. PD LFQ data were exported in a three-tiered hierarchy (Protein > Peptide Group > Consensus Feature) and analyzed using our in-house script (available on GitHub). The script compiles each sequenced peptide along with its associated metadata: the identity of the peptide as tryptic or half-tryptic (and, if half-tryptic, the location of the proteinase K cleavage site), abundance ratio, normalized abundance ratio, P-value, and coefficient of variation. It writes the result to an _out.txt file.
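To illustrate the tryptic versus half-tryptic distinction described above, the following is a minimal sketch and not the in-house script itself. It assumes that trypsin cleaves after K or R (ignoring the proline exception), that the protein termini count as tryptic ends, and that the peptide occurs once in the protein. The names `classify_peptide` and `TRYPSIN_RESIDUES` are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: trypsin cleaves C-terminal to lysine (K) or arginine (R).
TRYPSIN_RESIDUES = {"K", "R"}


def classify_peptide(protein: str, peptide: str) -> Tuple[str, Optional[int]]:
    """Classify a peptide and locate the inferred proteinase K cut, if any.

    Returns (label, site). For a half-tryptic peptide, `site` is the 1-based
    number of the residue after which the non-tryptic cut occurred.
    Only the first occurrence of the peptide in the protein is considered.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)

    # An end is tryptic if it follows K/R or coincides with a protein terminus.
    n_tryptic = start == 0 or protein[start - 1] in TRYPSIN_RESIDUES
    c_tryptic = end == len(protein) or protein[end - 1] in TRYPSIN_RESIDUES

    if n_tryptic and c_tryptic:
        return "tryptic", None
    if n_tryptic:
        return "half-tryptic", end    # non-tryptic cut at the C-terminal end
    if c_tryptic:
        return "half-tryptic", start  # non-tryptic cut at the N-terminal end
    return "non-tryptic", None


# Made-up sequence used purely for illustration.
example_protein = "MKTAYIAKQRQISFVK"
full = classify_peptide(example_protein, "TAYIAK")
half = classify_peptide(example_protein, "YIAK")
```

In this sketch, a peptide with exactly one non-tryptic end is attributed to proteinase K, and the reported site is the residue immediately preceding that cut.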

Usage notes

Peptide output files are text (.txt) files and can be opened and visualized using Microsoft Excel.
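As an alternative to Excel, the files can also be read programmatically. The sketch below assumes the files are tab-delimited with a single header row, which is not stated on this page and should be checked against an actual file. The function name `read_peptide_table` and the column names in the in-memory example are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_peptide_table(handle: TextIO, delimiter: str = "\t") -> List[Dict[str, str]]:
    """Read a delimited text table into a list of row dictionaries.

    Keys come from the header row; all values are kept as strings.
    Assumption: the file is tab-delimited with one header line.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [dict(row) for row in reader]


# In-memory stand-in for a real file; these column names are made up.
example = io.StringIO("col_a\tcol_b\npeptide1\t0.5\n")
rows = read_peptide_table(example)

# With a real file (hypothetical filename):
# with open("sample_out.txt", newline="") as fh:
#     rows = read_peptide_table(fh)
```

Values are left as strings so that no assumption is made about which columns are numeric; convert them after inspecting the actual header.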


Office of the Director, Award: DP2GM140926

National Science Foundation, Award: Division of Molecular and Cellular Biology MCB2045844

National Institute of General Medical Sciences, Award: R01GM079440

National Institute of General Medical Sciences, Award: Training Grant T32GM008403