Data from: O-GlcNAc modification differentially regulates microtubule binding and pathological conformations of tau isoforms in vitro
Data files
Mar 20, 2025 version files 3.06 GB
- 3R_MetaMorph_Output.xlsx (12.43 MB)
- 4R_MetaMorph_Output.xlsx (8.93 MB)
- Glc_Modified_3R_Tau.raw (740.37 MB)
- Glc_Modified_4R_Tau.raw (747.86 MB)
- README.md (59.74 KB)
- Unmodified_3R_Tau.raw (746.28 MB)
- Unmodified_4R_Tau.raw (801.32 MB)
Abstract
Tau proteins undergo several post-translational modifications (PTMs) in physiological and disease conditions. In Alzheimer’s disease, O-linked β-d-N-acetylglucosamine (O-GlcNAcylation) modification of serine/threonine (S/T) residues in tau is reduced. In mouse models of tauopathy, O-GlcNAcase inhibitors lead to increased O-GlcNAcylation and decreased filamentous aggregates of tau. However, various non-filamentous tau conformations, linked to toxicity and neurodegeneration in tauopathies, involve processes like oligomerization, misfolding, and greater exposure of the phosphatase-activating domain in the amino-terminus of tau. Additionally, it is becoming clearer that PTMs may differently regulate tau pathobiology in an isoform-dependent manner. Therefore, it is crucial to investigate the effects of O-GlcNAcylation on non-filamentous conformations of both the 4-repeat (4R, e.g. hT40) and 3-repeat (3R, e.g. hT39) tau isoforms. In this study, we assessed how O-GlcNAcylation impacts pathological tau conformations of the longest 4R and 3R tau isoforms (hT40 and hT39, respectively) using recombinant proteins. Mass spectrometry showed that tau is modified with O-GlcNAc at multiple S/T residues, primarily in the proline-rich domain and the C-terminal region. O-GlcNAcylation of hT40 and hT39 does not affect microtubule polymerization but has opposite effects on hT40 (increases) and hT39 (decreases) binding to pre-formed microtubules. Although O-GlcNAcylation interferes with forming filamentous hT40 aggregates, it does not alter the formation of pathological non-filamentous tau conformations. On the other hand, O-GlcNAcylation increases the formation of pathological non-filamentous hT39 conformations. These findings suggest that O-GlcNAcylation differentially modulates microtubule binding and the adoption of pathological tau conformations in the longest 4R and 3R tau isoforms.
https://doi.org/10.5061/dryad.m0cfxppdc
Description of the data and file structure
These data are mass spectrometry RAW data files and the full MetaMorpheus analysis output for recombinant human tau proteins expressed in E. coli that were transformed either with or without the O-GlcNAc transferase plasmid to produce O-GlcNAc-modified or unmodified proteins, respectively. The recombinant proteins were purified using a series of chromatography approaches (heavy metal affinity, size exclusion, and anion exchange). The samples were digested with Asp-N and rLysC and analyzed by mass spectrometry for O-GlcNAc modifications.
Files and variables
File: Unmodified_3R_Tau.raw
Description: RAW data files for unmodified recombinant 3R tau protein.
File: Glc_Modified_3R_Tau.raw
Description: RAW data files for O-GlcNAc modified recombinant 3R tau protein.
File: Unmodified_4R_Tau.raw
Description: RAW data files for unmodified recombinant 4R tau protein.
File: Glc_Modified_4R_Tau.raw
Description: RAW data files for O-GlcNAc modified recombinant 4R tau protein.
File: 3R_MetaMorph_Output.xlsx
Description: Full data set of protein groups, PSMs, quantified peptides, quantified peaks and sample codes from MetaMorpheus analysis output on 3R tau proteins (see below for table of variables, units and description within the file).
File: 4R_MetaMorph_Output.xlsx
Description: Full data set of protein groups, PSMs, quantified peptides, quantified peaks and sample codes from MetaMorpheus analysis output on 4R tau proteins (see below for table of variables, units and description within the file).
Table of variables, units and description for content within the 3R_MetaMorph_Output.xlsx and 4R_MetaMorph_Output.xlsx files.
Sample Code Tabs:
Column Variable | Units | Description |
---|---|---|
Sample_code | name | Identification code for each sample analyzed |
Recombinant tau protein | name | Name of tau protein that matches the sample codes |
Protein Groups Tabs:
Column Variable | Units | Description |
---|---|---|
Protein Accession | alphanumeric | The accession number of the protein as specified in the protein database. |
Gene | name | The gene name associated with the identified peptide’s parent protein. |
Organism | name | Name of the organism for the identified protein |
Protein Full Name | name | The full name of the peptide’s parent protein. |
Protein Unmodified Mass | Daltons | Molecular weight of the identified full protein without modifications |
Number of Proteins in Group | count | The number of proteins in the protein group. Multiple proteins are associated with a peptide identification when parsimony cannot distinguish between the options. |
Unique Peptides | amino acid letters | Peptides that are unique to the listed protein (they can only come from that one protein, based on the database in silico digestion). Currently, peptides that are unique to the group are not listed here; i.e., a protein group with >1 protein will always have 0 unique peptides because they are shared between all proteins in the group. |
Shared Peptides | amino acid letters | Peptides that are shared between multiple proteins in the protein database(s) used for the search are listed. |
Number of Peptides | count | Number of unique+shared peptides observed that match to the specified protein group. |
Number of Unique Peptides | count | Number of unique peptides for the protein group. See Unique Peptides definition. |
Sequence Coverage Fraction | fraction | The fraction of amino acids in the protein observed in any peptide spectral match with a Q value <0.01. |
Sequence Coverage | amino acid letters | Displays amino acids in the protein observed in any peptide spectral match with a Q value <0.01 for each protein in the group, with the “|” character as the delimiter. Lowercase residues were not observed. Uppercase residues were observed. |
Sequence Coverage with Mods | amino acid letters | Displays amino acids, including post-translational modifications, in the protein observed in any peptide spectral match with a Q value <0.01 for each protein in the group, with the “|” character as the delimiter. Lowercase residues were not observed. Uppercase residues were observed. |
Fragment Sequence Coverage | amino acid letters | Amino acid sequence of the protein that can be matched with the fragment sequence. Lowercase residues were not observed. Uppercase residues were observed. |
Modification Info List | description | List of modifications identified |
Intensity_2601_Glyco_01-calib | arbitrary units | When simultaneously searching multiple raw files, MetaMorpheus outputs the quantified intensity of the given peak across all files with each file having its own column, named Intensity_“filename”. The column named “Intensity_X” corresponds to the peak intensity obtained from file X. |
Intensity_2601_Glyco_02-calib | arbitrary units | When simultaneously searching multiple raw files, MetaMorpheus outputs the quantified intensity of the given peak across all files with each file having its own column, named Intensity_“filename”. The column named “Intensity_X” corresponds to the peak intensity obtained from file X. |
Number of PSMs | count | The number of peptide spectral matches with a Q-Value <0.01 observed for all peptides assigned to the protein group. |
Protein Decoy/Contaminant/Target | categorical | Each peptide spectral match, unique peptide and protein is assigned as decoy (D)/contaminant (C)/or target (T). The preference in assignment is D>C>T. |
Protein Cumulative Target | numeric | The protein group of all target proteins matching below the given Q-Value. |
Protein Cumulative Decoy | numeric | The protein group of all decoy proteins matching below the given Q-Value. |
Protein QValue | numeric | The possibility of getting a decoy protein from a given protein set. |
Best Peptide Score | numeric | The MetaMorpheus Score of the peptide in the protein group with the highest scoring peptide spectrum match. |
Best Peptide Notch QValue | numeric | The QValue Notch for the peptide in the protein group with the highest scoring peptide spectrum match. |
**Note:** Within the file, “N/A” indicates not available and was inserted for any cells with missing values in the MetaMorpheus output.
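The letter-case convention used in the Sequence Coverage column can be turned back into a coverage fraction. The following is a minimal sketch, assuming the cell has been read as a plain string; the function name and the example string are hypothetical and not taken from the data files.

```python
def coverage_fraction(coverage):
    """Return the fraction of observed (uppercase) residues for each protein.

    `coverage` is a Sequence Coverage cell: one sequence per protein in the
    group, separated by "|", with uppercase = observed, lowercase = not observed.
    """
    fractions = []
    for protein in coverage.split("|"):
        letters = [c for c in protein if c.isalpha()]
        if not letters:
            # Empty entry: report zero coverage rather than dividing by zero.
            fractions.append(0.0)
            continue
        observed = sum(1 for c in letters if c.isupper())
        fractions.append(observed / len(letters))
    return fractions


# Hypothetical two-protein group, for illustration only.
print(coverage_fraction("maepRQEFEVmedh|MAEPrqef"))
```

The per-protein values should be comparable to the Sequence Coverage Fraction column, although MetaMorpheus may round or format that column differently.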
PSMs Tabs:
Column Variable | Units | Description |
---|---|---|
File Name | name | The filename and path that contained the scan used in the identification. |
Scan Number | numeric | The scan number is specified in the header of each scan. The scan number reported usually contains the MS2 data used in the peptide spectral match. It is possible for multiple co-isolated peptides to be matched to the same scan number. |
Scan Retention Time | minutes | The experimental time that the scan was acquired. |
Num Experimental Peaks | count | The number of experimental peaks (post-peak trimming) in the MS2 scan. |
Total Ion Current | numeric | The total ion current of the MS2 spectrum. This is the sum of intensities from every MS2 peak. These intensities can come from fragmentation of multiple precursors depending on the selectivity for fragmentation (aka isolation width) and crowdedness of the MS1 spectrum. |
Precursor Scan Number | numeric | The scan number of the most recent MS1 scan. |
Precursor Charge | numeric | The charge of the isolated precursor peptide. |
Precursor MZ | numeric (ratio) | The mass to charge of the isolated precursor peptide. This is not necessarily the selected MZ for isolation. |
Precursor Mass | Daltons | The neutral (uncharged) mass of the peptide. |
Score | numeric | MetaMorpheus score is incremented by one for each matching b- and y-ion. The number after the decimal is the fraction of total peak intensity from the MS2 scan that can be assigned to the particular peptide spectral match. |
Delta Score | numeric | The MetaMorpheus score difference between the reported peptide and the next highest scoring peptide. If the next highest scoring peptide has the same score, both peptides are reported in the same row (ambiguity) and the next highest scoring peptide is used for the delta score. Thus, a delta score of 0 is not possible. |
Notch | numeric | A narrow mass window in which the value is an allowed mass difference between the experimentally observed peptide and the best matching theoretical peptide. This is an arbitrary number that signifies the notch’s category. |
Base Sequence | amino acid letters | The peptide amino acid sequence without modifications |
Full Sequence | amino acid letters | The complete peptide sequence containing all variable and localized modifications. |
Essential Sequence | amino acid letters | The full sequence containing only database-defined modifications and absent of fixed/variable modifications. |
Ambiguity Level | numeric level (1-5) | Ambiguity level as defined by Smith et al. (PMID: 31451767; PMCID: PMC6857706; DOI: 10.1038/s41592-019-0573-x). This classification was originally designed for proteoform identifications, but is equally effective at communicating ambiguity in peptide identifications. |
PSM Count (unambiguous, <0.01 q-value) | count | The number of peptide spectral matches with a Q-Value <0.01 observed for all peptides assigned to the protein group. |
Mods | name | The name(s) of the modification(s) on the peptide. |
Mods Chemical Formulas | letters | The chemical formula(s) of the identified modification(s). |
Mods Combined Chemical Formula | letters | The aggregated chemical formula of all identified modifications. |
Num Variable Mods | count | The number of variable modifications matched to a peptide. |
Missed Cleavages | count | The number of missed enzyme cleavages |
Peptide Monoisotopic Mass | Daltons | The mass of the peptide with the most abundant isotopes |
Mass Diff (Da) | Daltons | The absolute mass difference between the observed and theoretical precursor mass. (Calculated as observed-theoretical). |
Mass Diff (ppm) | ppm | The ppm mass difference between the observed and theoretical precursor mass. (Calculated as observed-theoretical). |
Protein Accession | alphanumeric | The accession number of the protein as specified in the protein database. |
Protein Name | name | The full name of the peptide’s parent protein. |
Gene Name | name | The gene name associated with the identified peptide’s parent protein. |
Organism Name | name | The database specified organism that the peptide’s parent protein originated from. |
Identified Sequence Variations | N/A | If the search was conducted using a database containing annotated sequence variants, this column displays the sequence variant that is identified by the PSM or peptide. |
Splice Sites | N/A | If the search was conducted using a database that contains annotated splice sites, this column contains splice sites which the PSM or peptide crossed. |
Contaminant | categorical | Specifies if the peptide’s parent protein is from a contaminant database, “Y”, or not “N” |
Decoy | categorical | Specifies if the peptide is a decoy peptide “Y”, or not “N”. |
Peptide Description | description | A brief statement regarding the peptide’s digestion. |
Start and End Residues In Protein | range | The one-based amino acid positions of the peptide in the parent protein(s). |
Previous Amino Acid | amino acid letters | The amino acid in the protein preceding the specified peptide. |
Next Amino Acid | amino acid letters | Amino acid in the protein that is next in line on the C-terminal end. |
Theoreticals Searched | count | The number of theoretical peptides searched against the spectrum. This is only reported if e-Value calculations are specified in the search task. |
Decoy/Contaminant/Target | categorical | Each peptide spectral match, unique peptide and protein is assigned as decoy (D)/contaminant (C)/or target (T). The preference in assignment is D>C>T. |
Matched Ion Series | alphanumeric | The found product ions and their respective charges. |
Matched Ion Mass-To-Charge Ratios | alphanumeric | The theoretical m/zs that were matched to the observed spectrum. |
Matched Ion Mass Diff (Da) | Daltons | The absolute mass differences between the observed and theoretical product ion masses. (Calculated as observed-theoretical). Order can be found in Matched Ion Series |
Matched Ion Mass Diff (Ppm) | ppm | The ppm mass differences between the observed and theoretical product ion masses. (Calculated as observed-theoretical). Order can be found in Matched Ion Series |
Matched Ion Intensities | alphanumeric | The observed intensities for the matched product ions. |
Matched Ion Counts | numeric | The number of product ions found for each series. |
Normalized Spectral Angle | numeric | Normalized Spectral Angle is used to measure the similarity between two spectra. Two identical spectra will have a spectral angle of 1, whereas two completely different spectra will have a spectral angle of 0. |
Localized Scores | N/A | If there is no ambiguity (only one peptide was assigned), then there is an attempt to localize the mass difference between the experimental and theoretical precursor masses. This mass difference is “placed” on each possible amino acid, and the resulting peptide score is calculated and reported in this column. Each reported score represents an amino acid (N-to-C) that the mass difference was “localized” to. |
Improvement Possible | N/A | The increase in the MetaMorpheus score produced by localization of the modification to the position specified in the Full Sequence. |
Cumulative Target | numeric rank | The target/decoy approach for determination of FDR yields lists of peptides and proteins matching either target or decoy. These commingled lists are sorted by score. The top scoring target match is labeled as 1. Each additional match to target is incremented by one. The total count of target matches scoring at or above a particular score at any point in the list is reported as Cumulative Target. Cumulative Decoy divided by Cumulative Target is the FDR. |
Cumulative Decoy | numeric rank | The target/decoy approach for determination of FDR yields lists of peptides and proteins matching either target or decoy. These commingled lists are sorted by score. The top scoring decoy match is labeled as 1. Each additional match to decoy is incremented by one. The total count of decoy matches scoring at or above a particular score at any point in the list is reported as Cumulative Decoy. Cumulative Decoy divided by Cumulative Target is the FDR. |
QValue | numeric | The q-value for the identification, calculated as the number of cumulative decoys (false positives) divided by the number of cumulative targets (true positives). |
Cumulative Target Notch | count | The cumulative number of targets specific to the specified notch |
Cumulative Decoy Notch | count | The cumulative number of decoys specific to the specified notch. |
QValue Notch | numeric | The notch specific q-value for the identification, calculated as the number of cumulative notch decoys (false positives) divided by the number of cumulative notch targets (true positives). |
PEP | numeric | Posterior Error Probabilities are calculated by a percolator-like gradient-boosted binary decision tree (PMID: 33683901; PMCID: PMC8377504; DOI: 10.1021/acs.jproteome.0c00838). This value represents the probability that a given spectral match is incorrect. |
PEP_QValue | numeric | A traditional Q-Value calculation with results ranked by ascending PEP, as opposed to the Q-Value column which is ranked by descending MetaMorpheus score. This value represents the probability that a given result is wrong, in the set containing all matches with a PEP value less than or equal to the given result. |
**Note:** Within the file, “N/A” indicates not available and was inserted for any cells with missing values in the MetaMorpheus output.
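The relationship between the Cumulative Target, Cumulative Decoy, and QValue columns described above can be sketched in a few lines. This is an illustration of the simple ratio stated in the table (cumulative decoys divided by cumulative targets), not the MetaMorpheus implementation; the function name, the input format, and the value used when no target has been seen yet are assumptions.

```python
def q_values(psms):
    """Compute running target/decoy counts and q-values.

    `psms` is a list of (score, is_decoy) tuples. Matches are sorted by
    descending score, and q = cumulative decoys / cumulative targets.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    cumulative_target = 0
    cumulative_decoy = 0
    rows = []
    for score, is_decoy in ranked:
        if is_decoy:
            cumulative_decoy += 1
        else:
            cumulative_target += 1
        # Assumption: report 1.0 while no target has been counted yet.
        q = cumulative_decoy / cumulative_target if cumulative_target else 1.0
        rows.append({
            "score": score,
            "decoy": is_decoy,
            "cumulative_target": cumulative_target,
            "cumulative_decoy": cumulative_decoy,
            "q_value": q,
        })
    return rows


# Hypothetical scores, for illustration only.
for row in q_values([(30.5, False), (25.2, False), (20.1, True), (18.0, False)]):
    print(row)
```

Search tools commonly also enforce that q-values never decrease when moving up the ranked list; that step is deliberately left out here so the sketch matches the column descriptions.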
Quantified Peptides Tabs:
Column Variable | Units | Description |
---|---|---|
Sequence | amino acid letters | Individual peptide sequences |
Base Sequence | amino acid letters | The peptide amino acid sequence without modifications |
Protein Groups | name | The name of the protein groups of identified peptides |
Gene Names | name | The gene name associated with the identified peptide’s parent protein. |
Organism | name | The database specified organism that the peptide’s parent protein originated from. |
Intensity_2601_Glyco_01-calib | arbitrary units | When simultaneously searching multiple raw files, MetaMorpheus outputs the quantified intensity of the given peak across all files with each file having its own column, named Intensity_“filename”. The column named “Intensity_X” corresponds to the peak intensity obtained from file X. |
Intensity_2601_Glyco_02-calib | arbitrary units | When simultaneously searching multiple raw files, MetaMorpheus outputs the quantified intensity of the given peak across all files with each file having its own column, named Intensity_“filename”. The column named “Intensity_X” corresponds to the peak intensity obtained from file X. |
Detection Type_2601_Glyco_01-calib | name | When simultaneously searching multiple raw files, MetaMorpheus outputs the detection type of the given peak across all files with each file having its own column, named Detection_“filename”. The column named “Detection_X” corresponds to the detection method for the peak obtained from file X. |
Detection Type_2601_Glyco_02-calib | name | When simultaneously searching multiple raw files, MetaMorpheus outputs the detection type of the given peak across all files with each file having its own column, named Detection_“filename”. The column named “Detection_X” corresponds to the detection method for the peak obtained from file X. |
**Note:** Within the file, “N/A” indicates not available and was inserted for any cells with missing values in the MetaMorpheus output.
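Because each raw file has its own Intensity_“filename” column and missing values are written as “N/A”, it can help to collect the per-file intensities of a row into one mapping. The sketch below assumes a row has already been read into a dictionary keyed by column name (for example with a spreadsheet reader of your choice); the function name and the example values are hypothetical.

```python
def intensities_by_file(row):
    """Map each file name to its intensity, using None for missing values."""
    prefix = "Intensity_"
    result = {}
    for column, value in row.items():
        if not column.startswith(prefix):
            continue
        file_name = column[len(prefix):]
        # "N/A" marks cells that were empty in the MetaMorpheus output.
        if value is None or value in ("N/A", ""):
            result[file_name] = None
        else:
            result[file_name] = float(value)
    return result


# Hypothetical row, for illustration only.
example_row = {
    "Sequence": "PEPTIDEK",
    "Intensity_2601_Glyco_01-calib": "15234.5",
    "Intensity_2601_Glyco_02-calib": "N/A",
}
print(intensities_by_file(example_row))
```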
Quantified Peaks Tabs:
Column Variable | Units | Description |
---|---|---|
File Name | N/A | The filename and path that contained the scan used in the identification. |
Base Sequence | amino acid letters | Unmodified amino acid sequence of identified peptide |
Full Sequence | amino acid letters | The complete peptide sequence containing all variable and localized modifications. |
Protein Group | name | The name of the protein groups of identified peptides |
Peptide Monoisotopic Mass | Daltons | The mass of the peptide calculated from atoms in their most abundant isotopic form (12C, 16O, 14N, etc.). This is the uncharged (neutral) mass. |
MS2 Retention Time | minutes | The retention time at which the MS2 scan for a given peak was initiated/obtained. |
Precursor Charge | charge value | The charge of the isolated precursor peptide. |
Theoretical MZ | number | Mass obtained from the precursor monoisotopic mass divided by its deconvoluted charge. |
Peak intensity | numeric | Intensity of the MS1 peak. The sum of intensity from each peak of the apex isotopic envelope. |
Peak RT Start | minutes | The retention time at which a peak is first detected. |
Peak RT Apex | minutes | The retention time at which a peak is most intense. |
Peak RT End | minutes | The retention time at which a peak is last detected. |
Peak MZ | numeric | The measured MS1 m/z for the peak. |
Peak Charge | numeric (charge) | The measured charge at the peak. |
Num Charge States Observed | count | The number of unique charge states a precursor peptide was found to exist as. Only MS1 evidence is required for an observation. |
Peak Detection Type | name | Describes how a peak was detected for quantification. Typically, MSMS and MBR (match between runs). |
MBR Score | N/A | The measurement used to evaluate the MBR identification, calculated as the geometric mean of four factors (e.g., the distribution similarity between the anchor and donor peptide in retention time, mass error, matched scan, and intensity). The score ranges from 1 to 100; higher scores are better. |
PSMs Mapped | numeric | Number of MS2 PSMs that mapped to the precursor peptide. This correlates to the number of peptide fragmentation events seen for the given precursor. |
Base Sequences Mapped | numeric | The base sequence of the protein that was matched with the fragments in the experiment. |
Full Sequences Mapped | numeric | The full sequence of the protein that was matched with the fragments in the experiment. |
Peak Split Valley RT | numeric | The retention time of the valley separating the current peak from the next closest peak. |
Peak Apex Mass Error (ppm) | ppm | The difference between the theoretical mass and the precursor monoisotopic mass. |
**Note:** Within the file, “N/A” indicates not available and was inserted for any cells with missing values in the MetaMorpheus output.
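The mass, charge, m/z, and ppm columns in the tables above are related by simple arithmetic. The sketch below uses the conventional conversion for positively charged ions, in which one proton mass is added per charge; this is an assumption about how the Theoretical MZ values were derived, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276466  # Da; mass of a proton

def theoretical_mz(neutral_mass, charge):
    """m/z of a peptide with the given neutral monoisotopic mass and positive charge."""
    return (neutral_mass + charge * PROTON_MASS) / charge

def ppm_error(observed, theoretical):
    """Relative difference in parts per million, calculated as observed - theoretical."""
    return (observed - theoretical) / theoretical * 1e6


# Hypothetical values, for illustration only.
print(theoretical_mz(1000.0, 2))
print(ppm_error(500.005, 500.0))
```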
Code/software
RAW data files were analyzed with the MetaMorpheus software version 1.0.1 developed by the Smith laboratory (68). For hT40 proteins, the following FASTA files were downloaded from Uniprot (November 2021) and used for analysis: Escherichia coli (strain K12) (UP000000625), Asp-N (Q9R4J4), Lys-C (Q02SZ7), and full-length tau (2N4R isoform, P10636-8). The same FASTA files were used to analyze the hT39 proteins except full-length tau (2N4R isoform, P10636-8) was replaced with 2N3R tau isoform (P10636-5). A mass shift of +203.079 Da (C8H13NO5) was used to search for O-GlcNAc modifications (69) on S and T. In addition, the following mass-to-charge-ratios (m/z) corresponding to diagnostic ions (DIs) were investigated: +126.055 Da (C6H7NO2), +138.055 Da (C7H7NO2), +144.066 Da (C6H9NO3), +168.066 Da (C8H9NO3), +186.076 Da (C8H11NO4), and +204.087 Da (C8H13NO5) (69).
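The mass values quoted above can be cross-checked from their chemical formulas. The sketch below sums standard monoisotopic atomic masses; treating the listed formulas as neutral compositions and adding one proton mass to obtain the diagnostic-ion m/z is an assumption that happens to reproduce the quoted values. The helper name is hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}
PROTON_MASS = 1.007276466  # Da

def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C8H13NO5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


# O-GlcNAc mass shift searched on S and T.
print(round(monoisotopic_mass("C8H13NO5"), 6))

# Diagnostic ions: neutral formula plus one proton (assumed singly charged).
for formula in ["C6H7NO2", "C7H7NO2", "C6H9NO3", "C8H9NO3", "C8H11NO4", "C8H13NO5"]:
    print(formula, round(monoisotopic_mass(formula) + PROTON_MASS, 3))
```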
The analysis sequence included mass calibration, global post-translational modification discovery (G-PTM-D) (70), and a classic search. Mass calibration was conducted using the following criteria: protease = Asp-N/Lys-C; maximum missed cleavages = 2; minimum peptide length = 7; maximum peptide length = unspecified; initiator methionine behavior = Variable; variable modifications = Oxidation on M; max mods per peptide = 2; max modification isoforms = 1024; precursor mass tolerance = ±15.0000 PPM; product mass tolerance = ±25.0000 PPM. The criteria utilized for G-PTM-D were protease = Asp-N/Lys-C; maximum missed cleavages = 2; minimum peptide length = 7; maximum peptide length = unspecified; initiator methionine behavior = Variable; max modification isoforms = 1024; variable modifications = Oxidation on M; G-PTM-D modifications count = 3; precursor mass tolerance(s) = ±5.0000 PPM around 0, 203.079372521 Da; product mass tolerance = ±20.0000 PPM. Finally, a classic search was conducted using the following criteria: protease = Asp-N/Lys-C; search for truncated proteins and proteolysis products = False; maximum missed cleavages = 2; minimum peptide length = 7; maximum peptide length = unspecified; initiator methionine behavior = Variable; variable modifications = Oxidation on M; precursor mass tolerance = ±5.0000 PPM; product mass tolerance = ±20.0000 PPM; report PSM ambiguity = True. Peptides were quantified through the FlashLFQ method for label-free quantification bundled into MetaMorpheus (71). At least two peptides were required to identify the protein. Sites of O-GlcNAc modification on tau detected at a false discovery rate (calculated using the target-decoy approach) of 1% are reported (Supplementary Table S1). Supplementary Table S2 demonstrates all quantified tau peptides in unmodified vs GlcNAc-modified tau samples. 
Supplementary Table S3 shows the quantified peaks of tau with their corresponding peptide masses, theoretical and observed m/z, retention time, and peptide spectral matches (PSMs). MetaDraw version 1.0.5 was utilized to review the PSMs of modified and unmodified tau peptides (samples of these peptides are included in Figures S1 and S2). Processed proteomics data on tau peptides are available in the linked manuscript (Supplementary Tables 1-3 and Supporting information) and full proteomics data sets and .RAW files from mass spectrometry are available here.
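The digestion settings listed above (Asp-N/Lys-C, up to 2 missed cleavages, minimum peptide length of 7) can be illustrated with a small in silico digest. This is a sketch under the usual textbook specificities, which are an assumption here: Asp-N cleaving on the N-terminal side of D and Lys-C cleaving on the C-terminal side of K. The protease definition used by MetaMorpheus may differ, and the function names and toy sequence are hypothetical.

```python
def cleavage_sites(sequence):
    """Return sorted cut positions (index of the first residue after each cut)."""
    sites = set()
    for i, residue in enumerate(sequence):
        if residue == "D" and i > 0:
            sites.add(i)        # Asp-N: cut before D
        if residue == "K" and i < len(sequence) - 1:
            sites.add(i + 1)    # Lys-C: cut after K
    return sorted(sites)

def digest(sequence, max_missed=2, min_length=7):
    """List peptides allowing up to `max_missed` missed cleavages."""
    bounds = [0] + cleavage_sites(sequence) + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + max_missed, len(bounds))
        for j in range(i + 1, last):
            peptide = sequence[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Toy sequence, not a tau sequence.
toy = "AAAAAAAKDGGGGGGGKTTTTTTT"
for peptide in digest(toy):
    print(peptide)
```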
Access information
Other publicly accessible locations of the data:
- N/A
Data was derived from the following sources:
- N/A
Preparation of recombinant unmodified and Glc tau proteins
Recombinant tau proteins were prepared by co-transforming BL21 bacteria (NEB, #C2527H) with two plasmids: a plasmid expressing tau under the T7 promoter as described previously (64) and the pHis-OGT plasmid. The purification procedure for Glc tau proteins was performed using 2 L terrific broth (TB) cultures grown in the presence of ampicillin (50 μg/ml) and kanamycin (25 μg/ml) as selection markers. Moreover, TB was supplemented with GlcNAc (2 mM; Sigma, #U4375) and PUGNAc (10 μM; Sigma, #A7229) to enrich GlcNAc and inhibit the activity of the O-GlcNAcase enzyme, respectively. Unmodified tau proteins were grown in the same way with the exception that kanamycin, GlcNAc, and PUGNAc were excluded from TB. Bacterial pellets were lysed using 0.5 M NaCl, 10 mM Tris, and 5 mM imidazole, pH 8, in the presence of protease inhibitors (as described previously in (64)) and PUGNAc (10 μM) at a weight:volume ratio of 1:5. PUGNAc was not included when lysing the bacterial pellets for unmodified tau proteins. The bacterial lysate was subjected to centrifugation at 107,377 RCF for 45 min at 4 °C using a Type 70 Ti rotor (Beckman Coulter, #337922). Supernatant was collected, then the residual bacterial pellet was further lysed in RIPA buffer (10 ml; CST, #9806) supplemented with the same inhibitors by sonicating 4 times for 30 seconds each. Another centrifugation step was performed to collect the supernatant extracted with the RIPA buffer, followed by pooling the two supernatants (lysis buffer and RIPA buffer) together for further purification. The rest of the purification procedure was performed as described previously (64). Briefly, three stages of fast protein liquid chromatography were performed: heavy metal affinity chromatography using a 5 ml HiTrap Talon crude column (Cytiva, #28953767); size exclusion chromatography using HiPrep 16/60 Sephacryl S-500 HR (Cytiva, #28935606); anion exchange chromatography using 5 ml HiTrap Q HP (Cytiva, #17115401).
The elution fractions containing the highly purified monomeric tau were concentrated to 2-4 mg/ml and supplemented with 1 mM DTT. The final unmodified and Glc tau proteins were aliquoted and frozen at -80 °C. The final concentration of recombinant tau proteins was determined using the SDS-Lowry method as described previously (64).
Recombinant tau protein preparation for tandem mass spectrometry (MS)
Unmodified and Glc tau proteins were digested using a combination of Asp-N (Promega, #V1621) and rLysC (Promega, #V167A). First, each recombinant tau protein sample (10 µg, n = 1) was subjected to 5 rounds of buffer exchange with 25 mM ammonium bicarbonate (AmBic) pH 8 using a 0.5 ml Amicon filter with 3K MWCO (15,000 RCF for 10 minutes; Millipore, #UFC500396). Then, recombinant tau proteins were recovered from the filter by centrifugation at 15,000 RCF for 2 minutes and vacuum dried using a Vacufuge. The dried pellets of recombinant tau proteins were reconstituted in 50 ml of digestion buffer (12.5 mM AmBic pH 8 + acetonitrile (ACN) 50%) and incubated at 37 °C for 16-18 hours with Asp-N (150 ng of enzyme). The following day, digested protein samples were subjected to vacuum drying and stored at 4 °C until the second digestion was initiated. Lys-C (500 ng of enzyme) was added and incubated at 37 °C for 16-18 hours. The following day, digested protein samples were subjected to vacuum drying and stored at -20 °C until running on the MS.
Tandem MS of recombinant tau proteins
We utilized an approach similar to that described by Yang et al. (67). MS analysis was performed twice: initially for method development of recombinant Glc tau and subsequently to validate the final protein preparations used for experiments. The Vanquish Neo nanoHPLC system interfaced to a Thermo Scientific Orbitrap Eclipse MS (Thermo Fisher Scientific) was used for analysis. For each sample, 1 μg was injected and desalted with an Acclaim™ PepMap™ C18 Nano trap column (3 μm, 100 Å, 75 μm × 2 cm) in 100% Buffer A (0.1% formic acid in HPLC water) at 3 μl/min for 5 min. Samples were separated in a linear gradient of 5–35% Buffer B (80% ACN and 0.1% formic acid) over 105 min, followed by washing at 90% Buffer B for 12 min, using an Easy Spray PepMap™ RSLC C18 nano column (2 μm, 100 Å, 75 μm × 250 mm). Before each injection, the column was equilibrated at 1.0% Buffer B for 5 min. Mass spectra were collected using data-dependent MS analysis with a duty cycle of 2 sec. To collect precursor masses, the orbitrap [resolution (R) of 120,000 at 200 m/z] with internal calibration was used. For precursors carrying charges between 2 and 8 and with intensities over 5 × 10⁴ at R = 30,000, stepped HCD spectra at HCD energies of 15, 25, and 35% were acquired with dynamic exclusion of 15 sec. The fragments were monitored for GlcNAc oxonium ions at m/z of 138.0545, 204.0867, 366.1396, 126.055, 144.0655, 168.0654, 186.076, 274.0921, and 292.1027. If at least one GlcNAc oxonium ion was detected with 15 ppm mass accuracy, the corresponding precursor ion was used to collect an EThcD spectrum in the orbitrap at R of 30,000. For charges of 2 and 3, the ETD target was 5.0 × 10⁵; for charges of 4 to 8, the ETD target was 2.0 × 10⁵. Supplemental collision energy at 15% was also included. The ETD reaction time varied with the precursor charge state. For a charge of 2, ETD reaction time was 125 msec; for a charge of 3, ETD reaction time was 100 msec; for a charge of 4, ETD reaction time was 75 msec; for charges ≥5, ETD reaction time was 50 msec.
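The oxonium-ion trigger and the charge-dependent ETD settings described above amount to a small decision rule, sketched below. This is only an illustration of the logic in the text, not the instrument method file; the function names are hypothetical, and the 126.055 entry follows the diagnostic-ion value given in the data-analysis section.

```python
# GlcNAc oxonium ion m/z values monitored in the HCD spectra.
OXONIUM_MZ = [138.0545, 204.0867, 366.1396, 126.055, 144.0655,
              168.0654, 186.076, 274.0921, 292.1027]

def has_oxonium_ion(fragment_mzs, tolerance_ppm=15.0):
    """True if any fragment matches a monitored oxonium ion within the tolerance."""
    for mz in fragment_mzs:
        for reference in OXONIUM_MZ:
            if abs(mz - reference) / reference * 1e6 <= tolerance_ppm:
                return True
    return False

def etd_reaction_time_ms(charge):
    """ETD reaction time (msec) for a precursor charge between 2 and 8."""
    if not 2 <= charge <= 8:
        raise ValueError("only precursor charges 2 to 8 were selected")
    return {2: 125, 3: 100, 4: 75}.get(charge, 50)

def etd_target(charge):
    """ETD target value for a precursor charge between 2 and 8."""
    if not 2 <= charge <= 8:
        raise ValueError("only precursor charges 2 to 8 were selected")
    return 5.0e5 if charge <= 3 else 2.0e5


# Hypothetical fragment list, for illustration only.
fragments = [175.1190, 204.0870, 512.2500]
if has_oxonium_ion(fragments):
    print("trigger EThcD:", etd_reaction_time_ms(3), "msec, target", etd_target(3))
```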