Profiling of linear B-cell epitopes against human coronaviruses in pooled sera sampled early in the COVID-19 pandemic
Data files
Mar 12, 2024 version files 9.07 MB
Abstract
Background: Antibodies play a key role in the immune defence against infectious pathogens. Understanding the underlying process of B cell recognition is not only of fundamental interest; it supports important applications within diagnostics and therapeutics. Whereas the nature of conformational B cell epitope recognition is inherently complicated, linear B cell epitopes offer a straightforward approach that potentially can be reduced to one of peptide recognition.
Methods: Using an overlapping peptide approach representing the entire proteomes of the seven main coronaviruses known to infect humans, we analysed sera pooled from eight PCR-confirmed COVID-19 convalescents and eight pre-pandemic controls. Using a high-density peptide microarray platform, 13-mer peptides overlapping by 11 amino acids were in situ synthesised and incubated with the pooled primary serum samples, followed by development with secondary fluorochrome-labelled anti-IgG and -IgA antibodies. Interactions were detected by fluorescence. Strong Ig interactions encompassing consecutive peptides were considered to represent "high-fidelity regions" (HFRs). These were mapped to the coronavirus proteomes using a 60% homology threshold for clustering.
Results: We identified 333 human coronavirus derived HFRs. Among these, 98 (29%) mapped to SARS-CoV-2, 144 (44%) mapped to one or more of the four circulating common cold coronaviruses (CCC), and 54 (16%) cross-mapped to both SARS-CoV-2 and CCCs. The remaining 37 (11%) mapped to either SARS-CoV or MERS-CoV. Notably, the COVID-19 serum was skewed towards recognising SARS-CoV-2-mapped HFRs, whereas the pre-pandemic serum was skewed towards recognising CCC-mapped HFRs. In terms of absolute numbers of linear B cell epitopes, the primary targets are the ORF1ab protein (60%), the spike protein (21%), and the nucleoprotein (15%) in that order; however, in terms of epitope density, the order would be reversed.
Conclusion: We identified linear B cell epitopes across coronaviruses, highlighting pan-, alpha-, beta-, or SARS-CoV-2-corona-specific B cell recognition patterns. These findings could be pivotal in deciphering past and current exposures to epidemic and endemic coronaviruses. Moreover, our results suggest that pre-pandemic anti-CCC antibodies may cross-react against SARS-CoV-2, which could explain the highly variable outcome of COVID-19. Finally, the methodology used here offers a rapid and comprehensive approach to high-resolution linear B-cell epitope mapping, which could be vital for future studies of emerging infectious diseases.
README: Data from: Profiling of linear B-cell epitopes against human coronaviruses in pooled sera sampled early in the COVID-19 pandemic
https://doi.org/10.5061/dryad.s1rn8pkg7
The study used high-density peptide microarrays to investigate linear B cell epitopes in the seven human coronaviruses, human cytomegalovirus (strain AD169), and Zaire Ebola virus (strain Mayinga-76). Two serum pools derived from (1) eight convalescent PCR-confirmed COVID-19 patients and (2) eight individuals sampled prior to the COVID-19 pandemic (pre-pandemic) were analysed for seroreactivity to peptides derived from the nine viruses represented on the microarrays. Antibody responses from the pandemic serum pool were skewed towards recognising SARS-CoV-2-derived peptides, whereas the pre-pandemic serum pool was skewed towards recognising the common cold coronavirus (CCC) derived peptides. The study identified 333 human coronavirus derived linear B cell epitopes, where 98 (29%) mapped to SARS-CoV-2, 144 (44%) mapped to the CCCs, and 54 (16%) cross-mapped to SARS-CoV-2 and one or more of the CCCs. The remaining 37 (11%) mapped exclusively to either SARS-CoV or MERS-CoV.
The fluorescence intensity values indicating the amount of bound serum antibody for each peptide were extracted from the scanned array images using Schafer-N proprietary software.
Description of the data and file structure
Aggregated microarray data for all samples: microarray_data_aggregated.txt
Fluorescence intensity values for each peptide on the microarray are found in an aggregated tab-separated file format. The first column contains the synthesised peptide sequences, and the second column contains the peptide group:
- Test: Peptides derived from any one of the 9 virus strains
- Random: The random peptides
The remaining four columns contain the fluorescence intensity values extracted for the COVID-19 convalescent (pandemic) serum pool IgA and IgG and the pre-pandemic serum pool IgA and IgG. Peptides with missing values due to artefacts etc. are encoded as "-1".
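As a minimal sketch of reading this layout, the Python snippet below parses a tab-separated table and turns the "-1" missing-value code into None so that it is not mistaken for a real intensity. It assumes the file has a header row; apart from "coresequence", the column names, peptide sequences, and intensity values in the in-memory sample are invented placeholders, not taken from the dataset.

```python
import csv
import io


def read_intensities(handle):
    """Read the aggregated table; return (header, rows).

    Each row is (peptide, group, intensities), where an intensity of -1
    (the missing-value code described in the README) becomes None.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # assumes a header row is present
    rows = []
    for record in reader:
        peptide, group, *values = record
        numbers = [float(v) for v in values]
        intensities = [None if x == -1 else x for x in numbers]
        rows.append((peptide, group, intensities))
    return header, rows


# In-memory stand-in for microarray_data_aggregated.txt.
# Hypothetical column names and made-up values, for illustration only.
sample = (
    "coresequence\tgroup\tpandemic_IgA\tpandemic_IgG\t"
    "prepandemic_IgA\tprepandemic_IgG\n"
    "ACDEFGHIKLMNP\tTest\t12\t250\t-1\t31\n"
    "QRSTVWYACDEFG\tRandom\t3\t5\t4\t-1\n"
)
header, rows = read_intensities(io.StringIO(sample))
```

To read the real file, pass open("microarray_data_aggregated.txt", newline="") instead of the in-memory sample.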
Peptide to protein mapper file: peptide_map.txt
The peptide parent protein names and their locations are in a tab-separated file format. The first column contains the synthesised peptide sequences, the second column contains the abbreviated organism name, the third column is the UniProt ID of their parent proteins, the fourth column is the abbreviated protein name, the fifth column is the length of the parent proteins, the sixth and seventh columns are the start and end coordinates of the peptides in their parent protein.
Combining files
microarray_data_aggregated.txt and peptide_map.txt can be combined using the "coresequence" column, representing the peptide sequences.
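The join on "coresequence" could be sketched as follows in Python. Because the peptide set is non-redundant, one peptide sequence may correspond to several rows in the mapper file (for example, a sequence shared by two proteins), so the sketch treats the join as one-to-many. Random background peptides are assumed to have no entry in the mapper file and therefore drop out. All column names except "coresequence", and all sample values including the organism names and IDs, are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def join_on_peptide(intensity_rows, map_rows, key="coresequence"):
    """Inner join: one output row per (peptide, parent protein) pair."""
    by_peptide = defaultdict(list)
    for row in map_rows:
        by_peptide[row[key]].append(row)
    combined = []
    for row in intensity_rows:
        for mapping in by_peptide.get(row[key], []):
            combined.append({**row, **mapping})
    return combined


# Made-up stand-ins for the two files (hypothetical names and values).
intensity_text = (
    "coresequence\tgroup\tpandemic_IgG\n"
    "ACDEFGHIKLMNP\tTest\t250\n"
    "DEFGHIKLMNPQR\tTest\t40\n"
    "QRSTVWYACDEFG\tRandom\t5\n"
)
map_text = (
    "coresequence\torganism\tuniprot_id\tprotein\tprotein_length\tstart\tend\n"
    "ACDEFGHIKLMNP\tVIRUS_A\tX00001\tS\t100\t1\t13\n"
    "ACDEFGHIKLMNP\tVIRUS_B\tX00002\tS\t120\t1\t13\n"
    "DEFGHIKLMNPQR\tVIRUS_A\tX00001\tS\t100\t3\t15\n"
)
combined = join_on_peptide(
    read_tsv(io.StringIO(intensity_text)),
    read_tsv(io.StringIO(map_text)),
)
```

If the random peptides are needed downstream (for example, to estimate background binding), keep them from the intensity table separately before joining.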
Sharing/Access information
All available data are accessible through: https://doi.org/10.5061/dryad.s1rn8pkg7
Methods
Peptide microarray design
Peptide microarrays were designed using the proteomes of the seven human coronaviruses (HCoVs):
· HCoV-229E: 7 proteins
· HCoV-HKU1(N1): 8 proteins
· HCoV-NL63: 6 proteins
· HCoV-OC43: 9 proteins
· MERS-CoV: 9 proteins
· SARS-CoV: 14 proteins
· SARS-CoV-2: 13 proteins
The open reading frame (ORF) 1a was excluded from all the HCoV proteomes since ORF1ab covered these sequences. As a positive control pathogen, the entire proteome of the human cytomegalovirus (HCMV, strain AD169), consisting of 190 proteins, was included together with the entire proteome of the Zaire Ebola virus (strain Mayinga-76, EBOZM), consisting of 9 proteins. These 265 protein sequences were represented as 13 amino acid long peptides, overlapping by 11 amino acids and tiling by 2 amino acids, leading to a total of 66581 non-redundant virus-derived peptide sequences. As a source of background-binding control, 3900 non-overlapping 13 amino acid peptide sequences were generated in silico using the amino acid frequencies from the 265 virus-derived proteins.
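The tiling scheme and the frequency-based background peptides can be illustrated with a short Python sketch. This is not the software used in the study; the function names are hypothetical, and two details are assumptions: that a trailing residue not reached by the 2-residue step is simply left uncovered, and that background peptides are drawn position by position from the pooled amino acid frequencies.

```python
import random
from collections import Counter


def tile_peptides(sequence, length=13, step=2):
    """Return (start, peptide) pairs; start is 1-based.

    A step of 2 with 13-mers gives an 11-residue overlap between neighbours.
    """
    last_start = len(sequence) - length
    return [
        (i + 1, sequence[i:i + length])
        for i in range(0, last_start + 1, step)
    ]


def unique_peptides(proteins, length=13, step=2):
    """Tile every protein and keep each peptide sequence once, in order."""
    seen = {}
    for protein in proteins:
        for _, peptide in tile_peptides(protein, length, step):
            seen.setdefault(peptide, None)
    return list(seen)


def random_peptides(proteins, n, length=13, seed=0):
    """Draw n peptides from the pooled amino acid frequencies (assumed scheme)."""
    counts = Counter("".join(proteins))
    residues = sorted(counts)
    weights = [counts[r] for r in residues]
    rng = random.Random(seed)
    return [
        "".join(rng.choices(residues, weights=weights, k=length))
        for _ in range(n)
    ]


# Toy 20-residue sequence, invented for illustration.
toy = "ACDEFGHIKLMNPQRSTVWY"
tiles = tile_peptides(toy)
background = random_peptides([toy], n=5)
```

For a protein of n residues (n at least 13) this yields floor((n - 13) / 2) + 1 peptides, so the 20-residue toy sequence gives four peptides starting at positions 1, 3, 5, and 7.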
Peptide microarray synthesis and probing
The 66581 virus-derived peptide sequences in triplicate and the 3900 random background-binding peptides in duplicate were distributed randomly across 12 virtual sectors using proprietary software (PepArray, Schafer-N). Peptides were synthesised by Schafer-N (Copenhagen) on amino-functionalized glass microscope slides using maskless photolithographic light-directed solid-phase peptide synthesis. Peptide microarrays were incubated with convalescent COVID-19 or pre-pandemic sera diluted 1:100 in PBS (0.1% BSA, 0.1% Triton X-100) for 2 hours at room temperature, followed by washing and development using 1 µg/mL Cy3- or Cy5-labelled secondary antibodies against human IgG or IgA, respectively. After washing, microarrays were dried and scanned on a microarray laser scanner (INNOSCAN 900, Innopsys, France) and then quantified at an 8-bit resolution and purged of artefacts using proprietary PepArray software.
Usage notes
Aggregated microarray data for all samples:
microarray_data_aggregated.txt
Fluorescence intensity values for each peptide on the microarray are found in an aggregated tab-separated file format. The first column contains the synthesised peptide sequences, and the second column contains the peptide group:
· Test: Peptides derived from any one of the 9 virus strains
· Random: The random peptides
The remaining four columns contain the fluorescence intensity values extracted for the COVID-19 convalescent (pandemic) serum pool IgA and IgG and the pre-pandemic serum pool IgA and IgG.
Peptide to protein mapper file:
peptide_map.txt
The peptide parent protein names and their locations are in a tab-separated file format. The first column contains the synthesised peptide sequences, the second column contains the abbreviated organism name, the third column is the UniProt ID of their parent proteins, the fourth column is the abbreviated protein name, the fifth column is the length of the parent proteins, the sixth and seventh columns are the start and end coordinates of the peptides in their parent protein.
Combining files:
microarray_data_aggregated.txt and peptide_map.txt can be combined using the "coresequence" column, representing the peptide sequences.
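Once intensities and coordinates sit in the same record, one possible use is to project peptide signals onto the parent protein. The Python sketch below assigns each residue the highest intensity among the peptides covering it. Taking the maximum is an illustrative choice, not the HFR definition used in the study, and the sketch assumes that start and end are 1-based and inclusive, which should be checked against the file. The function name and sample values are hypothetical.

```python
def residue_profile(records, protein_length):
    """Per-residue maximum intensity over all covering peptides.

    records: iterable of (start, end, intensity) with 1-based inclusive
    coordinates (an assumption); an intensity of None (missing) is skipped.
    Residues covered by no usable peptide stay None.
    """
    profile = [None] * protein_length
    for start, end, intensity in records:
        if intensity is None:
            continue
        for pos in range(start - 1, end):
            if profile[pos] is None or intensity > profile[pos]:
                profile[pos] = intensity
    return profile


# Made-up records for a hypothetical 17-residue protein.
records = [(1, 13, 10.0), (3, 15, 40.0), (5, 17, None)]
profile = residue_profile(records, protein_length=17)
```

Here residues 1 and 2 carry the signal of the first peptide, residues 3 to 15 carry the stronger signal of the second, and residues 16 and 17 remain None because the only peptide covering them is missing.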