The midnolin-proteasome pathway catches proteins for ubiquitination-independent degradation
Data files
Aug 25, 2023 version files 20.07 GB
Abstract
Cells use ubiquitin to mark proteins for proteasomal degradation. While the proteasome also eliminates proteins that are not modified by ubiquitin, how this occurs mechanistically is unclear. We show here that midnolin promotes the destruction of many nuclear proteins including transcription factors encoded by the immediate-early-genes. Diverse environmental cues induce midnolin and its overexpression is sufficient to cause the degradation of its targets by a mechanism, which, remarkably, does not require ubiquitination. Instead, midnolin associates with the proteasome via an alpha-helix, employs its Catch-domain to bind a region within substrates that adopts a beta-strand conformation, and uses a ubiquitin-like-domain to promote substrate destruction. Thus, midnolin contains three regions that function in concert to target a large set of nuclear proteins to the proteasome for degradation.
Methods
DataS1: EGR1 and FosB genome-wide CRISPR-Cas9 screens
Genome-wide CRISPR-Cas9 screens were performed to uncover regulators of EGR1 and FosB protein stability. Specifically, the plasmid library was packaged into lentivirus by transfecting HEK-293T cells using PolyJet as described earlier, and the lentivirus was titered to obtain a multiplicity of infection around 0.3. HEK-293T cells were generated to express the GPS 3.0 FosB or GPS 3.2 EGR1 reporters by selecting using hygromycin (200 µg/mL). These cells were then transduced with the titered CRISPR-Cas9 genome-wide Root library lentivirus at an MOI ~0.3 to maintain a 500x representation throughout. Cells were selected 48 hours post-transduction for 7 days using puromycin (2 µg/mL) to remove uninfected cells. On the ninth day of puromycin selection, the 95th percentile most stable cell population was collected based on the GFP/DsRed ratio by FACS using a MoFlo Astrios instrument (Beckman Coulter). Additionally, the unsorted input cells were collected based on the number of cells collected in the enriched population. Collected cells were rinsed once with PBS, pelleted, and stored at -80°C.
DataS2: Midnolin GPS ORFeome screen
Sufficient cell numbers were used to maintain at least a 300-fold coverage of the library throughout. The library was packaged into lentivirus, which was used to transduce MIDN KO HEK-293T cells at a multiplicity of infection of 0.2. Two days post-transduction, the HEK-293T cells were treated with 2 µg/mL of puromycin for 6 days to remove uninfected cells, passaging the library once during the selection period. The library-expressing cells were plated at 4 million cells/plate in 15 cm dishes and were transfected two days later using PolyJet with EF1a-Midnolin co-expressing BFP, or BFP alone as a negative control. The cells were harvested two days post-transfection and were sorted into six stability bins based on the GFP/DsRed ratio by FACS using a MoFlo Astrios instrument (Beckman Coulter). The sorting gates were established using the BFP control to ensure 1/6th of the population was collected per bin. Once the control populations were collected, the cells overexpressing midnolin were partitioned using the exact same sorting and gating settings as the control. The collected cells from each stability bin were rinsed once with PBS, pelleted, and frozen at -80°C for at least 12 hours.
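The gating logic described above (six equal-sized bins defined on the BFP control, then reused unchanged for the midnolin sample) can be illustrated with a minimal sketch. The actual gates were drawn on the sorter; the function names and the control ratios below are hypothetical and serve only to show the idea.

```python
import bisect
import statistics
from collections import Counter

def sextile_gates(control_ratios):
    """Return the 5 cut points that split the control into six equal bins."""
    return statistics.quantiles(control_ratios, n=6, method="inclusive")

def assign_bin(ratio, gates):
    """Map a GFP/DsRed ratio to a stability bin, 1 (lowest) to 6 (highest)."""
    return bisect.bisect_right(gates, ratio) + 1

# Hypothetical control ratios, evenly spread from 0.01 to 6.00.
control = [k / 100 for k in range(1, 601)]
gates = sextile_gates(control)

# Each control bin holds one sixth of the population by construction.
control_counts = Counter(assign_bin(r, gates) for r in control)

# The same gates are then applied, unchanged, to the midnolin sample.
midnolin_bin = assign_bin(0.5, gates)
```

Because the gates are fixed on the control, a shift of the midnolin-expressing cells toward lower bins reflects destabilization of the reporter rather than a change in gating.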
Deconvolution of the pooled screens for DataS1 and DataS2
Cell pellets were thawed, and genomic DNA was harvested using a Gentra Puregene Core Kit (Qiagen). The sgRNAs or barcodes were then amplified by PCR with Q5 Hot Start High-Fidelity DNA Polymerase (NEB), using all the genomic DNA as template (4 µg DNA per reaction) and adding stagger sequences. A second round of PCR was performed using the cleaned PCR1 product to add the Illumina P5 and P7 adaptor sequences. PCR2 samples were cleaned, pooled in the correct ratio, and sequenced on a NextSeq 500 instrument. The abundance of sgRNAs or barcodes was extracted from the raw sequencing data using Cutadapt and was mapped onto the reference library using Bowtie2.
For DataS1, MAGeCK was used to determine the enrichment of sgRNAs in the 95th percentile relative to the input population. The MAGeCK score plotted on the Y-axis represents the negative log10 of the “pos|score” value generated by MAGeCK.
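As an illustration of the plotted score, the sketch below converts the "pos|score" column of a MAGeCK gene summary into its negative log10. The two-column table and the gene names are hypothetical placeholders, not values from DataS1; a real gene summary contains additional columns.

```python
import csv
import io
import math

# Hypothetical two-gene excerpt of a tab-delimited MAGeCK gene summary.
SUMMARY = "id\tpos|score\nGENE_A\t1e-06\nGENE_B\t0.5\n"

def plot_scores(tsv_text):
    """Return {gene: -log10(pos|score)} for each row of the summary."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["id"]: -math.log10(float(row["pos|score"])) for row in reader}

scores = plot_scores(SUMMARY)
```

Smaller "pos|score" values therefore appear higher on the Y-axis.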
For the GPS ORFeome analysis in DataS2, the abundance of each ORF was corrected to account for sequencing depth and a protein stability index (PSI) score between 1 (most unstable) and 6 (most stable) was calculated for each extracted ORF using the following formula: $\mathrm{PSI} = \sum_{i=1}^{6} R_i \times i$, where $i$ is the number of the stability bin, denoted as an integer, and $R_i$ is the Illumina read proportion extracted from bin $i$. The change in protein stability between midnolin and BFP is denoted as the difference in PSI (deltaPSI).
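A minimal sketch of the PSI calculation follows. It assumes that PSI is the read-proportion-weighted mean of the bin number, which is consistent with the stated range of 1 to 6, and that the depth correction divides each bin's read count by that bin's total reads. Function names and read counts are hypothetical.

```python
def psi(orf_reads, bin_totals):
    """Protein stability index for one ORF.

    orf_reads:  reads for this ORF in bins 1..6
    bin_totals: total reads sequenced in bins 1..6 (depth correction)
    """
    normalized = [r / t for r, t in zip(orf_reads, bin_totals)]
    total = sum(normalized)
    if total == 0:
        return None  # ORF not detected in any bin
    # R_i is the share of this ORF's depth-corrected reads found in bin i.
    return sum(i * n / total for i, n in enumerate(normalized, start=1))

def delta_psi(psi_midnolin, psi_bfp):
    """Change in stability upon midnolin expression; negative = destabilized."""
    return psi_midnolin - psi_bfp

# Hypothetical ORF: evenly spread with BFP, shifted to low bins with midnolin.
depth = [1000] * 6
psi_bfp = psi([10, 10, 10, 10, 10, 10], depth)
psi_midn = psi([40, 20, 0, 0, 0, 0], depth)
change = delta_psi(psi_midn, psi_bfp)
```

Under these assumptions an ORF found only in bin 6 scores 6, one found only in bin 1 scores 1, and a destabilized substrate yields a negative deltaPSI.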
DataS3: Endogenous Midnolin Immunoprecipitation and Mass Spectrometry
HEK-293T cells expressing endogenous 3xHA-tagged midnolin were cultured to 90% confluency in five 15 cm plates per condition, and unedited wild-type HEK-293T cells were cultured in five 15 cm plates. The knock-in cells were treated with DMSO or 10 µM MG132 for 6 hours, while the unedited wild-type HEK-293T cells served as background and were treated with 10 µM MG132 for 6 hours. An anti-HA immunoprecipitation was performed using the same lysis conditions and protocol as described in the immunoprecipitation section of the methods. After the final wash, the beads were resuspended in 100 µL of 50 mM Tris pH 8.5 containing 5% SDS, and the samples were heated at 95°C for 5 minutes to elute the proteins.
Eluted proteins were then digested using trypsin on S-Trap Micro columns (Protifi, C02-micro-10) following the manufacturer’s protocol. Specifically, proteins were first reduced using 5 mM TCEP for 15 minutes at 55°C and were subsequently alkylated with 20 mM iodoacetamide for 30 minutes in the dark at room temperature. After alkylation, the samples were acidified using phosphoric acid to a final concentration of 2.5% (v/v), and 10 volumes of 100 mM Tris, pH 7.55 in 90% methanol/10% water were added to the samples to dilute the protein. This solution was then passed through the S-Trap column by centrifuging for 30 seconds at 4,000 × g. Multiple rounds of centrifugation were needed to load the entirety of one sample onto one column. Once the protein was trapped, the column was rinsed three times using 100 mM Tris, pH 7.55 in 90% methanol/10% water, followed by a dry spin, before adding 2 µg of trypsin suspended in 20 µL of 50 mM ammonium bicarbonate, pH 8. Columns were kept overnight at 37°C in a humid environment. After digestion, the peptides on the column were eluted by centrifuging three times for 1 minute at 4,000 × g using three buffers applied sequentially: first 40 µL ammonium bicarbonate pH 8, second 40 µL 0.2% formic acid in water, and third 40 µL 50% acetonitrile in water. The pooled peptides were dried under reduced pressure using a SpeedVac and were resuspended in 30 µL of 0.1% formic acid in water. LC-MS/MS data were acquired by injecting 10 µL of resuspended peptide sample.
A protein database consisting of the Human UniProt SwissProt proteome (downloaded on November 13th, 2022) was used to identify proteins that co-immunoprecipitated with endogenous 3xHA-midnolin. Specifically, the FragPipe graphical user interface (v18.0) was used to search the data using the MSFragger search engine and to perform post-processing of the search results. The following parameters were used in the search. Tryptic peptides with a maximum of two missed cleavages were considered. Additionally, carbamidomethylation of cysteine was set as a fixed modification, and oxidation of methionine was allowed as a variable modification, with a maximum of four variable modifications per peptide. The allowed mass tolerances were 10 ppm for precursor ions and 0.04 Da for product ions. Peptide hits were filtered to a false discovery rate of 1% using PeptideProphet as implemented in FragPipe.
DataS4: AlphaFold PDB Models and summary of predicted beta strand degrons
AlphaFold multimer predictions
To identify beta strands within hits identified in the ORFeome GPS screen, genes with deltaPSI < -0.5 were taken (n=508) and the longest sequence across corresponding protein accession IDs (either NCBI Reference Sequence or Ensembl ID) was used as the input sequence for downstream steps (as barcodes from the screen were grouped at the gene level but could represent multiple isoforms). These sequences were individually paired with the MIDN sequence (UniProtKB: Q504T8) as a two-sequence FASTA file input into AlphaFold (v2.2.0) for multimer prediction with default reference databases and max_template_date=2022-01-01. Any selenocysteines were recoded as cysteines and three substrates (ACSBG2, ACSS2, and RIMBP3) that failed MSA using the default settings were rerun successfully by replacing the UniClust30_2018_08 database with UniRef30_2022_02.
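The input preparation described above (choose the longest sequence per gene, recode selenocysteine, pair with MIDN) can be sketched as follows. This is an illustration only: the helper names, accession keys, header labels, and sequences are hypothetical placeholders, and the MIDN string shown is not the real Q504T8 sequence.

```python
def longest_sequence(isoforms):
    """Pick the longest protein sequence among a gene's accession IDs."""
    return max(isoforms.values(), key=len)

def recode_selenocysteine(seq):
    """Replace selenocysteine (U) with cysteine (C)."""
    return seq.replace("U", "C")

def paired_fasta(midn_seq, gene, substrate_seq):
    """Build a two-sequence FASTA record pairing MIDN with one substrate."""
    return f">MIDN\n{midn_seq}\n>{gene}\n{substrate_seq}\n"

MIDN_PLACEHOLDER = "MAAAA"  # placeholder, NOT the real MIDN sequence
isoforms = {"ACC_1": "MKU", "ACC_2": "MKUAAL"}  # hypothetical isoforms

substrate = recode_selenocysteine(longest_sequence(isoforms))
fasta = paired_fasta(MIDN_PLACEHOLDER, "GENE_X", substrate)
```

One such file would be written per substrate and passed to the multimer prediction run.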
Identification of substrate beta strands within Midnolin beta sheet
The 25 ranked PDB models from each AlphaFold run with MIDN and one of the substrates were then processed by a custom Python script to identify PDB models that folded a linear stretch of the substrate into beta strand conformation placed between beta strands of the corresponding MIDN domain. In more detail, a pairwise distance matrix was first computed between each α-carbon atom in MIDN and each α-carbon atom in the substrate as:
$$D_{i,j} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$
where $x_i$, $y_i$, and $z_i$ are the coordinates of the $i$th substrate α-carbon atom and $x_j$, $y_j$, and $z_j$ are the coordinates of the $j$th MIDN α-carbon atom. As most beta sheets have inter-strand distances < 5 Å, the distance matrix was scanned to identify sequential substrate residues < 5.5 Å from corresponding linear stretches within each adjacent MIDN beta strand (i.e., $D_{i,j}$ < 5.5 for both some sequential set of $i$ with some sequential set of $j$, where 148 ≤ $j$ ≤ 157, as well as the same set of $i$ with another sequential set of $j$, where 279 ≤ $j$ ≤ 286).
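The distance scan can be sketched as below. This is not the custom script used for DataS4 but a simplified per-residue variant: it flags substrate residues that lie within 5.5 Å of at least one α-carbon in each of the two MIDN strands, without enforcing that the matching MIDN residues are themselves sequential. The function names and the toy coordinates are hypothetical.

```python
import math

CUTOFF = 5.5                 # Å
STRAND_A = range(148, 158)   # MIDN residues 148..157
STRAND_B = range(279, 287)   # MIDN residues 279..286

def distance_matrix(sub_ca, midn_ca):
    """D[i][j]: distance from substrate CA i (0-based list index)
    to MIDN CA j (keyed by MIDN residue number)."""
    return [{j: math.dist(p, q) for j, q in midn_ca.items()} for p in sub_ca]

def near(row, window):
    """True if any MIDN residue in the window is within the cutoff."""
    return any(row[j] < CUTOFF for j in window if j in row)

def sandwiched_residues(D):
    """Substrate indices close to both MIDN strands."""
    return [i for i, row in enumerate(D)
            if near(row, STRAND_A) and near(row, STRAND_B)]

# Toy geometry: two MIDN strands 9.6 Å apart, substrate strand in between.
midn = {r: (3.3 * k, 0.0, 0.0) for k, r in enumerate(STRAND_A)}
midn.update({r: (3.3 * k, 9.6, 0.0) for k, r in enumerate(STRAND_B)})
substrate = [(3.3 * k, 4.8, 0.0) for k in range(5)] + [(100.0, 100.0, 100.0)]

hits = sandwiched_residues(distance_matrix(substrate, midn))
```

In the toy example the five residues placed between the strands are flagged and the distant sixth residue is not.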
Secondary structure assignment for the PDB model was done with the DSSP algorithm. Substrate residues satisfying the distance requirements specified above were then retained if they were assigned the extended beta strand secondary structure (i.e., “E” coding). As DSSP relies on flanking residues to call secondary structure, the most N- and C-terminal residues are not assigned secondary structure. To avoid excluding them from beta strand assignments, they were assigned “E” coding if the adjacent residue had been assigned “E” coding. To catch residues that are part of a beta strand but lie slightly further from one or both of the MIDN beta strands, this set of residues was then expanded by 7 residues in each direction, and again only those with extended beta strand secondary structure were kept. Finally, the longest contiguous stretch of beta strand secondary structure was kept (if any) for final reporting in DataS4.
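The filtering steps above can be sketched on a DSSP-style string, one character per substrate residue. This is an illustration under stated assumptions rather than the script used for DataS4: the terminal residues are taken to be the chain termini, the expansion is applied around every retained residue, and the function names and example string are hypothetical.

```python
def fix_termini(ss):
    """Give the first/last residue 'E' if its only neighbor is 'E'."""
    chars = list(ss)
    if len(chars) > 1:
        if chars[1] == "E":
            chars[0] = "E"
        if chars[-2] == "E":
            chars[-1] = "E"
    return "".join(chars)

def longest_strand(ss, seeds, pad=7):
    """Return (start, end), 0-based inclusive, of the longest contiguous
    'E' stretch reachable from the distance-selected seed residues."""
    ss = fix_termini(ss)
    kept = {i for i in seeds if ss[i] == "E"}
    expanded = set()
    for i in kept:
        # Expand by `pad` residues in each direction, keeping only 'E'.
        for k in range(max(0, i - pad), min(len(ss), i + pad + 1)):
            if ss[k] == "E":
                expanded.add(k)
    best, run = [], []
    for i in sorted(expanded):
        run = run + [i] if run and i == run[-1] + 1 else [i]
        if len(run) > len(best):
            best = run
    return (best[0], best[-1]) if best else None

# Hypothetical assignment: an 8-residue strand and a distant 3-residue strand.
ss = "--EEEEEEEE---EEE--"
strand = longest_strand(ss, seeds=[4, 5])
```

Here the two seed residues recover the full first strand, while the second strand lies outside the 7-residue expansion window and is ignored.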
Usage notes
PDB files for AlphaFold multimer predictions can be opened with software such as PyMOL and ChimeraX.