Identification of reactive Borrelia burgdorferi peptides associated with Lyme disease
Data files
Dec 23, 2025 version (192.22 MB total):
- mBio.datasubmission.20240906.data.txt (192.20 MB)
- mBio.datasubmission.20240906.meta.txt (16.49 KB)
- README.md (4.03 KB)
Abstract
Borrelia burgdorferi, the agent of Lyme disease, is estimated to cause >400,000 annual infections in the United States. Serology is the primary laboratory method to support the diagnosis of Lyme disease, but current methods have intrinsic limitations that require alternative approaches or targets. We used a high-density peptide array that contains >90,000 short overlapping peptides to catalogue immunoreactive linear epitopes from >60 primary antigens of B. burgdorferi. We then pursued a machine learning approach to identify immunoreactive peptide panels that provide optimal Lyme disease serodiagnosis and can differentiate antibody responses at various stages of disease. We examined 226 serum samples from the Lyme Biobank and the National Institutes of Health that included sera from 110 individuals diagnosed with Lyme disease, 31 probable cases from symptomatic individuals, and 85 healthy controls. Cases were grouped based on disease stage and presentation and included individuals with early localized, early disseminated, and late Lyme disease. We identified a peptide panel originating from 14 different epitopes that differentiated cases versus controls, whereas another peptide panel built from 12 unique epitopes differentiated subjects with various disease manifestations. Our method demonstrated an improvement in B. burgdorferi antibody detection over the current two-tiered testing approach and confirmed the key diagnostic role of VlsE and FlaB antigens at all stages of Lyme disease. We also uncovered epitopes that triggered a temporal antibody response that was useful for differentiation of early and late disease. Our findings can be used to streamline serologic targets and improve antibody-based diagnosis of Lyme disease.
https://doi.org/10.5061/dryad.8cz8w9h0c
Description of the data and file structure
The Tick-Borne Disease Serochip (TBD-Serochip) is a slide-based peptide array used to catalog antibody responses to tick-borne pathogens. For each antigen selected for inclusion on the array, all protein sequences available as of October 2016 were downloaded from the NCBI protein database, aligned, and used to design 12-mer peptides that tile each protein with an 11-amino acid (aa) overlap to the preceding peptide in a sliding window pattern. Our prototype version of the TBD-Serochip included approximately 170,000 12-mer peptides per subarray and contained 12-mer peptides designed from antigenic sequences of eight tick-borne pathogens present in North America. For B. burgdorferi, this included 62 different antigens (including all paralogs) that are known to elicit an antibody response in humans (Fig. S1). For each antigen, we included the sequence of every genetic variant in the database for the 12-mer design. This included 12-mer peptides for 20 distinct OspC types, and a wide range of recombinant sequences for VlsE. This approach enables the identification of all reactive portions for every examined antigen and demonstrates the impact of aa variation within a given epitope on antibody binding. Conversely, it can also inflate the number of significant reactive peptides due to cross-reactivity between different variants of the same 12-mer fragment. The B. burgdorferi peptide component of the TBD-Serochip consisted of 91,338 peptides. The arrays were manufactured by Nimble Therapeutics. (Quoted from the manuscript.)
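The tiling scheme quoted above (12-mers advancing one residue at a time, so consecutive peptides share 11 aa) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the design pipeline used for the TBD-Serochip: it skips the alignment step, sequences shorter than 12 aa simply yield no peptides, and the helper names `tile_peptides` and `tile_variants` are hypothetical. `tile_variants` also collapses 12-mers shared between variants, which is an assumption about how duplicates would be handled.

```python
def tile_peptides(protein: str, k: int = 12) -> list[str]:
    """Return every k-mer of `protein` in sliding-window order.

    Consecutive peptides overlap by k - 1 residues (11 aa for 12-mers),
    i.e. the window advances one residue at a time. Sequences shorter
    than k produce an empty list (no padding; an assumption).
    """
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def tile_variants(variants: list[str], k: int = 12) -> list[str]:
    """Tile every sequence variant of one antigen.

    Hypothetical helper: peptides shared between variants are kept once,
    in first-seen order, so identical 12-mers are not printed twice.
    """
    seen: dict[str, None] = {}  # dict preserves insertion order
    for seq in variants:
        for pep in tile_peptides(seq, k):
            seen.setdefault(pep, None)
    return list(seen)
```

Under this scheme a protein of length L yields L - 11 distinct window positions, which is why including every genetic variant of an antigen (for example, the 20 OspC types) multiplies the peptide count quickly.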
Files and variables
We have submitted two files: the raw data (mBio.datasubmission.20240906.data.txt) and the metadata (mBio.datasubmission.20240906.meta.txt).
File1 (Data File): mBio.datasubmission.20240906.data.txt
Description: The raw signal table from the peptide array. Each row is an individual peptide probe; after the two probe-annotation columns, each column is a sample.
- PROBE_DESIGN_ID: Each entry in this column gives the probe ID and indicates whether the IgG or IgM fluorescence signal is being measured for that probe sequence.
- PROBE_SEQUENCE: The sequence of the individual peptide used for detection.
- All columns after "PROBE_SEQUENCE" are samples, named by sample. Each numerical value is the fluorescence signal measured for that sample on the peptide in the "PROBE_SEQUENCE" column of the same row. The units are relative fluorescence units (RFU).
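A minimal Python sketch for loading File1 with only the standard library, assuming the file has a single header row with the column names listed above and a numeric RFU value in every sample cell (missing-value codes, if any, are not handled). The function name `read_signal_table` is hypothetical. Because File1 is about 192 MB, holding every row in memory as shown may be slow; a streaming or columnar approach would suit large analyses better.

```python
import csv
from collections.abc import Iterable


def read_signal_table(handle: Iterable[str]) -> tuple[list[str], list[dict]]:
    """Parse the tab-delimited raw signal table (File1).

    Returns (sample_names, rows). Each row is a dict with the
    PROBE_DESIGN_ID, the PROBE_SEQUENCE, and an {sample: RFU} mapping.
    Assumes every column after PROBE_SEQUENCE is a sample with a numeric value.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    id_col = header.index("PROBE_DESIGN_ID")
    seq_col = header.index("PROBE_SEQUENCE")
    samples = header[seq_col + 1:]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        values = fields[seq_col + 1:]
        rows.append({
            "PROBE_DESIGN_ID": fields[id_col],
            "PROBE_SEQUENCE": fields[seq_col],
            "rfu": {s: float(v) for s, v in zip(samples, values)},
        })
    return samples, rows
```

To load the actual file: `with open("mBio.datasubmission.20240906.data.txt", newline="") as fh: samples, rows = read_signal_table(fh)`.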
File2 (Supplemental File): mBio.datasubmission.20240906.meta.txt
Description: The metadata file for File1, giving the sample and group information for every sample used in the study.
- SampleID: The unique identifier of each sample.
- Array: The numbered location on the chip from which each sample's data were obtained.
- Group: The category that each sample belongs in. These names can be cross-referenced in the manuscript. The groups are as follows:
- LDB Control: Lyme Disease Biobank healthy controls
- Confirmed Acute Lyme SEM-A: Laboratory-confirmed acute Lyme disease with a single erythema migrans
- Confirmed Acute Lyme: Laboratory-confirmed acute Lyme disease without an erythema migrans
- Probable Lyme: Probable Lyme disease, erythema migrans present, no laboratory confirmation
- NIH Control: National Institutes of Health healthy controls
- LA: Lyme arthritis
- MEM: Multiple erythema migrans
- ALNB: Acute Lyme neuroborreliosis
- SEM-C: Single erythema migrans, convalescent
- Sample_ID_CII: The source of each sample's data, either the NIH samples or the Serochip data.
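The detailed Group values can be collapsed into the three cohorts named in the abstract (healthy controls, individuals diagnosed with Lyme disease, and probable cases). The Python sketch below does this, assuming the Group strings appear in the file exactly as listed above and that SampleID values match the sample column names in File1; neither is verified here. The helper names and the labels "control", "case", and "probable" are hypothetical, and because this README does not say whether probable cases were pooled with diagnosed cases in any analysis, they are kept separate. Unrecognized Group values raise an error rather than being silently assigned.

```python
import csv
from collections.abc import Iterable

# Group labels as listed in this README (assumed to match the file verbatim).
CONTROL_GROUPS = {"LDB Control", "NIH Control"}
PROBABLE_GROUPS = {"Probable Lyme"}
CASE_GROUPS = {
    "Confirmed Acute Lyme SEM-A",
    "Confirmed Acute Lyme",
    "LA",
    "MEM",
    "ALNB",
    "SEM-C",
}


def classify_group(group: str) -> str:
    """Map a detailed Group value to a coarse cohort label (hypothetical labels)."""
    group = group.strip()
    if group in CONTROL_GROUPS:
        return "control"
    if group in PROBABLE_GROUPS:
        return "probable"
    if group in CASE_GROUPS:
        return "case"
    raise ValueError(f"unrecognized Group value: {group!r}")


def read_sample_labels(handle: Iterable[str]) -> dict[str, str]:
    """Read the tab-delimited metadata file (File2) and return {SampleID: cohort label}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["SampleID"]: classify_group(row["Group"]) for row in reader}
```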
Both files are tab-delimited text. They can be viewed in a text editor or Microsoft Word, and can be read into R using tab as the field separator.
Code/software
The data analysis process is described in the manuscript, and the code used for the analysis is available from the corresponding author upon request.
- Tokarz, Rafal; Guo, Cheng; Sanchez-Vicente, Santiago et al. (2024). Identification of reactive Borrelia burgdorferi peptides associated with Lyme disease. mBio. https://doi.org/10.1128/mbio.02360-24
