Data for: Oligonucleotide mapping via mass spectrometry to enable comprehensive primary structure characterization of an mRNA vaccine against SARS-CoV-2

Cite this dataset

Gau, Brian (2023). Data for: Oligonucleotide mapping via mass spectrometry to enable comprehensive primary structure characterization of an mRNA vaccine against SARS-CoV-2 [Dataset]. Dryad.


Oligonucleotide mapping via liquid chromatography-tandem mass spectrometry (LC-MS/MS) was recently developed to support development of Comirnaty®, the world’s first commercial mRNA vaccine, which immunizes against the SARS-CoV-2 virus. Analogous to peptide mapping of therapeutic protein modalities, the oligonucleotide mapping described here provides direct primary structure characterization of mRNA through enzymatic digestion, accurate mass determinations, and optimized collisionally induced fragmentation. Sample preparation for oligonucleotide mapping is a rapid, one-pot, one-enzyme digestion. The digest is analyzed via LC-MS/MS with an extended gradient, and the resulting data analysis employs semi-automated software. In a single method, oligonucleotide mapping readouts include a highly reproducible and completely annotated UV chromatogram with >98% sequence coverage and a microheterogeneity assessment of 5′ terminus capping and 3′ terminus poly(A) tail length. Oligonucleotide mapping was pivotal to ensuring the quality, safety, and efficacy of mRNA vaccines by providing confirmation of construct identity and primary structure and assessment of product comparability following manufacturing process changes. More broadly, this technique may be used to directly interrogate the primary structure of RNA molecules in general.


Oligonucleotide mapping was developed with a representative batch of Comirnaty® BNT162b2 Original drug substance (DS), i.e., the original Pfizer-BioNTech COVID-19 vaccine that encodes the spike glycoprotein (S) of the SARS-CoV-2 virus (Wuhan-Hu-1 isolate; GenBank: QHD43416.1), and it has been applied to subsequent Comirnaty® BNT162b2 constructs (BNT162b2s04 [Delta] and BNT162b2s05 [Omicron]) and other portfolio mRNA molecules. Fifty micrograms of mRNA DS was digested with 2500 U of RNase T1 in a 50 mM Tris(hydroxymethyl)aminomethane (Tris) pH 7.5 buffer with 20 mM ethylenediaminetetraacetic acid (EDTA) for 1 h at 37°C. The resulting enzymatic fragment solution was spiked with a 10× triethylamine (TEA) and 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP) emulsion to give a final v/v concentration of 0.1% TEA and 1% HFIP.

A 4 µg load was injected and fragments were separated by ion-pair reversed-phase ultrahigh performance liquid chromatography (IP RP-UHPLC) with UV detection at 260 nm using a 1290 Infinity II Bio LC System (Agilent) paired with an ACQUITY Premier Oligonucleotide C18 column (130 Å, 1.7 µm, 2.1 × 150 mm; Waters). Each mobile phase contained 0.1% TEA and 1% HFIP. The TEA functions as the ion-pairing agent, and the HFIP provides MS-compatible buffering as a volatile weak acid. The gradient progressed from 1% to 17% mobile phase B (50% methanol) in 195 min, then from 17% to 35% B in 60 min, followed by wash and equilibration segments. The flow rate was 0.2 mL/min with a post-column split: 50 µL/min to the UV diode array detector and 150 µL/min to an Orbitrap Eclipse Tribrid Mass Spectrometer (Thermo Fisher Scientific).

The on-line electrospray ionization (ESI) MS acquisition was done in negative ion mode with a spray voltage of 2700 V. MS scans were from 400 to 2000 m/z at 120,000 resolving power (RP) at 400 m/z. Tandem mass spectrometry (MS/MS) was accomplished at 30,000 RP by stepped (17, 21, 25) higher-energy collisional dissociation (HCD) of multiply charged precursor candidates selected by the data-dependent acquisition (DDA) algorithm.
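Because acquisition was in negative ion mode over a 400 to 2000 m/z window, only some charge states of a given oligonucleotide are observable. The following is a minimal sketch of that arithmetic, assuming simple deprotonated ions [M − zH]^z− with no adducts; the function names and the `max_charge` cap are illustrative and not part of the dataset.

```python
PROTON_MASS = 1.007276467  # Da

def mz_negative(neutral_mass: float, charge: int) -> float:
    """m/z of the [M - zH]^z- ion for a neutral monoisotopic mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer (protons lost)")
    return (neutral_mass - charge * PROTON_MASS) / charge

def charge_states_in_range(neutral_mass: float,
                           mz_min: float = 400.0,
                           mz_max: float = 2000.0,
                           max_charge: int = 60) -> list[int]:
    """Charge states whose m/z falls inside the MS scan window."""
    return [z for z in range(1, max_charge + 1)
            if mz_min <= mz_negative(neutral_mass, z) <= mz_max]

# Hypothetical 6000 Da oligonucleotide: list the observable charge states.
for z in charge_states_in_range(6000.0):
    print(z, round(mz_negative(6000.0, z), 4))
```

The same relation, rearranged, recovers a neutral mass from an observed m/z once the charge state is known from the isotope spacing.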

BioPharma Finder version 5.0 software (Thermo Fisher Scientific) was used to identify oligonucleotides based on both MS and MS/MS matches to theoretical RNase T1 digest products. An MS match required the observed oligonucleotide neutral mass to be within 5 ppm of the theoretical mass. An MS/MS match required that all major fragments were identified and that the complete sequence could be inferred from fragment ions containing the 5′ or 3′ ends (not internal fragments). To ensure that the automated software employed stringent MS/MS matching, the software also searched against theoretical RNase T1 digests of decoy constructs having random arrangements of the same composition of nucleotides as the mRNA molecule. To augment the list of automated software identifications, Excel Visual Basic for Applications (VBA) scripts were employed to examine unidentified LC/UV features and their underlying mass spectra one by one. In the case of the 73-mer R1062 and its related poly(A)-tail species, identifications were made using a deconvolved, zero-charge mass spectrum, without MS/MS.
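The 5 ppm MS criterion and the decoy idea can be illustrated with a short sketch. This is not the BioPharma Finder implementation; the function names are hypothetical, and the decoy is assumed to be a plain random shuffle of the construct sequence, which preserves nucleotide composition.

```python
import random

def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

def is_ms_match(observed_mass: float, theoretical_mass: float,
                tol_ppm: float = 5.0) -> bool:
    """True when the observed neutral mass lies within tol_ppm of theory."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm

def make_decoy(sequence: str, seed: int = 0) -> str:
    """Random rearrangement of the sequence with identical composition."""
    residues = list(sequence)
    random.Random(seed).shuffle(residues)  # seeded for reproducibility
    return "".join(residues)

print(is_ms_match(1000.004, 1000.0))   # about 4 ppm off
print(make_decoy("AUGGCAUUAG", seed=1))
```

Identifications that match a decoy digest as readily as the true digest would suggest that the MS/MS matching is too permissive.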

Usage notes

The Thermo mass spectrometer .raw files can be opened with Xcalibur Qual Browser (Thermo Fisher Scientific), which is proprietary commercial software. An open-source alternative is ThermoRawFileParser (

Automated identification of oligonucleotides was accomplished using BioPharma Finder v5.0 (Thermo Fisher Scientific), which is proprietary commercial software. An open-source alternative is included in this dataset: three Excel VBA spreadsheets that may be used in the identification of any MS feature. As Excel VBA spreadsheets, they are fully compiled and do not require any library code extensions or external links.

The "Oligonucleotide MS Peak ID Given Sequence v8.xlsm" spreadsheet may be used to generate a list of theoretical RNase T1 digest oligonucleotides and to match observed precursor masses to candidate theoretical RNase T1 digest oligonucleotides. Inputs are one or more observed masses and a target mRNA construct sequence.
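The two steps this spreadsheet performs can be sketched outside Excel as follows. This is an illustrative approximation, not a port of the VBA code. It assumes that RNase T1 cleaves on the 3′ side of every G, that each product carries a 5′-OH and a linear 3′-phosphate, and that only unmodified A, C, G, and U are present; modified nucleosides, the capped 5′ fragment, and the poly(A) 3′ fragment would need their own mass entries and terminal corrections.

```python
import re

# Monoisotopic residue masses (nucleoside monophosphate minus water), in Da.
RESIDUE_MASS = {"A": 329.05252, "C": 305.04129, "G": 345.04743, "U": 306.02530}
WATER = 18.01056  # added once per linear 5'-OH / 3'-phosphate fragment

def rnase_t1_digest(sequence: str) -> list[str]:
    """Split after every G; a trailing piece without G is kept as-is."""
    return re.findall(r"[^G]*G|[^G]+$", sequence.upper())

def fragment_mass(fragment: str) -> float:
    """Neutral monoisotopic mass under the terminal assumptions above."""
    return sum(RESIDUE_MASS[nt] for nt in fragment) + WATER

def match_masses(observed_masses, sequence, tol_ppm=5.0):
    """Return (observed mass, fragment, ppm error) for every hit."""
    fragments = sorted(set(rnase_t1_digest(sequence)))
    hits = []
    for obs in observed_masses:
        for frag in fragments:
            theo = fragment_mass(frag)
            ppm = (obs - theo) / theo * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((obs, frag, round(ppm, 2)))
    return hits

# Hypothetical toy construct and one observed mass.
print(rnase_t1_digest("AUGCCGUAAGCA"))
print(match_masses([998.1360], "AUGCCGUAAGCA"))
```

Fragments with the same composition but different sequence share a mass, so a precursor match alone yields candidates; MS/MS is still needed to choose among isomers.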

The "Oligonucleotide MS2 Spectrum Matcher v11.xlsm" spreadsheet may be used to check a candidate oligonucleotide sequence with a single MS/MS spectrum. Inputs are the candidate sequence and the m/z vs intensity "XY" coordinate list.
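A rough sketch of the peak-matching half of such a check is shown below. It assumes the theoretical fragment m/z values for the candidate sequence have already been computed elsewhere and are passed in as a list. The 10 ppm fragment tolerance and the "fraction of intensity explained" score are illustrative assumptions, not values taken from the spreadsheet or the method.

```python
from bisect import bisect_left

def match_fragments(theoretical_mz, spectrum, tol_ppm=10.0):
    """Map each theoretical m/z to the most intense observed peak in tolerance.

    spectrum is a list of (m/z, intensity) pairs, i.e. the "XY" list.
    """
    peaks = sorted(spectrum)
    mzs = [mz for mz, _ in peaks]
    matches = {}
    for theo in theoretical_mz:
        tol = theo * tol_ppm * 1e-6
        start = bisect_left(mzs, theo - tol)  # first peak inside the window
        best = None
        for mz, intensity in peaks[start:]:
            if mz > theo + tol:
                break
            if best is None or intensity > best[1]:
                best = (mz, intensity)
        if best is not None:
            matches[theo] = best
    return matches

def explained_intensity_fraction(matches, spectrum):
    """Share of total spectrum intensity carried by matched peaks."""
    total = sum(intensity for _, intensity in spectrum)
    matched = set(matches.values())  # count each observed peak once
    return sum(i for _, i in matched) / total if total else 0.0

# Hypothetical spectrum and theoretical fragment list.
xy = [(500.0000, 100.0), (500.0030, 50.0), (750.0, 200.0), (900.0, 50.0)]
found = match_fragments([500.001, 750.002, 1200.0], xy)
print(found, explained_intensity_fraction(found, xy))
```

A high explained fraction alone does not establish the sequence; the criterion described in the methods also requires that the full sequence be inferable from 5′- or 3′-containing fragment ions.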

The "Oligonucleotide Composition from Mass Calculator v2.xlsm" may be used to generate a list of possible nucleotide compositions given an input mass.
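The idea of going from a mass to candidate compositions can be sketched by brute-force enumeration. This is a simplified illustration, not the spreadsheet's algorithm. It assumes unmodified A, C, G, and U, a 5′-OH and linear 3′-phosphate on the oligonucleotide, and a hypothetical default length cap of 10 residues.

```python
from itertools import product

# Monoisotopic residue masses (nucleoside monophosphate minus water), in Da.
RESIDUE_MASS = {"A": 329.05252, "C": 305.04129, "G": 345.04743, "U": 306.02530}
WATER = 18.01056  # added once per linear 5'-OH / 3'-phosphate oligonucleotide

def compositions_from_mass(target_mass, tol_ppm=5.0, max_length=10):
    """List base-count dicts whose theoretical mass is within tol_ppm."""
    bases = sorted(RESIDUE_MASS)
    tol = target_mass * tol_ppm * 1e-6
    results = []
    # Try every combination of counts; skip empty or over-long compositions.
    for counts in product(range(max_length + 1), repeat=len(bases)):
        if not 1 <= sum(counts) <= max_length:
            continue
        mass = WATER + sum(n * RESIDUE_MASS[b] for n, b in zip(counts, bases))
        if abs(mass - target_mass) <= tol:
            results.append(dict(zip(bases, counts)))
    return results

# Hypothetical input mass corresponding to one A, one G, and one U.
print(compositions_from_mass(998.13581))
```

Exhaustive enumeration grows quickly with length, so a search for long species such as a 73-mer would need pruning or a smarter strategy; the number of candidate compositions within a fixed ppm window also rises with mass.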

A fourth Excel VBA spreadsheet, "Oligonucleotide Mapping UV Annotation Tool v11.xlsm", is also provided to facilitate annotation of the LC-UV chromatogram with LC-MS/MS-identified oligonucleotides. It produces a PowerPoint (.pptx) file in which the chromatogram is plotted and the annotations are placed as separate, overlaid text objects. Inputs to this spreadsheet include a Qual Browser LC/UV peak list, an LC/UV chromatogram time vs intensity (XY) coordinate list, the mRNA construct sequence, and the BioPharma Finder-exported component table.

The use of these spreadsheets and details of the entire method are provided in "Oligonucleotide Mapping and Data Analysis Protocol.pdf".


Pfizer (United States)