A Trypanosoma cruzi trans-sialidase peptide demonstrates high serological prevalence among infected populations across endemic regions of Latin America

Kortbawi, Hannah1; Marczak, Ryan1; Rajan, Jayant2; Bulaong, Nash3; Pak, John3; Wu, Wesley3; Wang, Grace3; Mitchell, Anthea3; Saxena, Aditi4; Maheswari, Aditi5; Fleischmann, Charles1; Kelly, Emily1; Teal, Evan1; Townsend, Rebecca6; Stramer, Susan6; Okamoto, Emi7; Sherbuk, Jacqueline7; Clark, Eva8; Gilman, Robert9; Colanzi, Rony10; Gennatas, Efstathios1; Bern, Caryn1; DeRisi, Joseph1; Whitman, Jeffrey1

Published Jan 07, 2026 on Dryad. https://doi.org/10.5061/dryad.9kd51c5v0

Data files

Jan 07, 2026 version files (314.66 MB total)

dryad_data.tgz (314.65 MB)

README.md (5.20 KB)

Abstract

Infection by Trypanosoma cruzi, the agent of Chagas disease, can irreparably damage the cardiac and gastrointestinal systems during decades of parasite persistence and related inflammation in these tissues. Diagnosis of chronic disease requires confirmation by multiple serological assays due to the imperfect performance of existing clinical tests. Current serology tests utilize antigens discovered over three decades ago with small specimen sets predominantly from South America, and lower test performance has been observed in patients who acquired T. cruzi infection in Central America and Mexico. Here, we attempt to address this gap by evaluating antibody responses against the entire T. cruzi proteome with phage display immunoprecipitation sequencing, using a library of 228,127 47-amino-acid peptides. We utilized diverse specimen sets from Mexico, Central America and South America, as well as different stages of cardiac disease severity, from 185 cases and 143 controls. We identified over 1,300 antigenic T. cruzi peptides derived from 961 proteins across specimen sets. A total of 67 peptides were reactive in 70% of samples across all regions, and 3 peptide epitopes were enriched in ≥90% of seropositive samples. Of these three, only one antigen, belonging to the trans-sialidase family, has not previously been described as a diagnostic target. Orthogonal validation of this peptide demonstrated increased antibody reactivity for infections originating from Central America. Overall, this study provides proteome-wide identification of seroreactive T. cruzi peptides across a large cohort spanning multiple endemic areas and identifies a novel trans-sialidase peptide antigen (TS-2.23) with significant potential for translation into diagnostic serological assays.

Description of the data and file structure

This dataset contains the data required to replicate analyses in Kortbawi et al. (in review), which investigates antibody reactivity to T. cruzi peptides in patients with Chagas disease relative to seronegative controls using phage immunoprecipitation sequencing (PhIP-seq). Patient samples were incubated with a novel phage display library consisting of 228,127 tiled 47-amino-acid peptides that cover the entire T. cruzi proteome. Data cover reactivity from 185 cases and 143 controls, sampled across diverse geographic regions and with varying stages of heart disease. Analysis of these data identified a T. cruzi trans-sialidase peptide with high seroreactivity across geographic regions. Orthogonal experiments (split-luciferase binding assays and biolayer interferometry) validated this finding and identified the reactive epitope within this antigen.
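Tiling, as used above, means cutting each protein into overlapping fixed-length windows so that every residue is covered by at least one peptide. The README does not state the overlap between neighbouring peptides or how protein ends were handled, so the following is only a minimal sketch: the helper name `tile_protein`, the 24-residue step, and the end-handling rule are all assumptions chosen for illustration, not the authors' library design.

```python
from typing import List


def tile_protein(sequence: str, length: int = 47, step: int = 24) -> List[str]:
    """Cut a protein sequence into overlapping windows of `length` residues.

    `length=47` matches the peptide size stated in the README.
    `step=24` is a hypothetical value; the real library's overlap is not given.
    """
    if len(sequence) <= length:
        # Assumption: proteins shorter than one window are kept whole.
        return [sequence]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Assumption: add a final window flush with the C-terminus
        # so that no trailing residues are left uncovered.
        starts.append(last_start)
    return [sequence[s:s + length] for s in starts]


# Made-up 100-residue sequence, used only to exercise the function.
protein = "ACDEFGHIKLMNPQRSTVWY" * 5
tiles = tile_protein(protein)
print(len(tiles), [len(t) for t in tiles])
```

With these assumed settings the 100-residue example yields four 47-residue tiles, the last of which overlaps its neighbour more heavily so that it ends exactly at the C-terminus.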

Files and variables

File: dryad_data.tgz

Description:

a. Tcruzi_library_nucleotides.fa - Reference sequence file used for PhIP-seq library construction. This contains the nucleotide sequences of all peptides with the 5' and 3' linker sequences. This library also includes additional, non-T. cruzi sequences that were not analyzed in this project.

b. rpk_repository/ - Folder that contains read counts (normalized to read depth) for each peptide, from paired-end alignment of each sample's raw sequencing reads. The file names are structured <CBM/BD>raw_rpk.csv, where <CBM/BD> represents the specimen set (Cardiac Biomarker, CBM, or Blood Donor, BD) that the rpk data correspond to. Some CBM samples were re-sequenced, so there is a second CBM rpk file with a "2" appended to the file name that contains the re-sequenced data. Samples are named as follows:

Some sample names are structured CHAGAS*R1_001 and represent re-sequenced Cardiac Biomarker specimens. Here * refers to the specimen set sample ID, which looks like "BIO####" for seropositive samples, "GFAP#" for positive control polyclonal antibody samples, or "HC###" for seronegative control samples. A further part of the name refers to lab-specific information about remaining sample volume, and R1_001 refers to metadata from the sequencing run.

Some sample names are structured CHAGAS_*R1_001 and represent Cardiac Biomarker specimens. Here * refers to the plate (P#) and well location of each sample; all other labeling details are the same.

Some sample names are structured tclib__R1_001 and represent Blood Donor specimens. One part of the name refers to the 96-well plate location of each sample, and another refers to the sample's unique ID, which looks like "S####".

c. p2g_mapping.csv - List of all peptides in the T. cruzi library, mapped to the genes from which they are derived (RefSeq Protein ID, TriTrypDB gene ID, UniProt ID, protein name), together with their amino acid sequences. Stage information is included for any protein with a known expression pattern across T. cruzi life cycle stages, adapted from the TriTrypDB dataset 'Life cycle proteome (Brazil)'.

d. readcounts_repository/ - Folder that contains the total sequencing read counts for each sample. The file names are structured <CBM/BD>_readcounts.csv, where <CBM/BD> represents the specimen set the read count data correspond to. Some CBM samples were re-sequenced, so there is a second CBM read count file with a "2" appended to the file name that contains the re-sequenced data.

e. fimo_output_repository/ - Folder that contains the FIMO outputs for known T. cruzi antigen motifs. This output contains all peptides with a significant match to each motif, and includes the p- and q-values for each motif match, as well as the peptide id (column "sequence_name") that represents the peptide with the motif match.

f. fig3b_slba.csv - File that contains the split-luciferase binding assay (SLBA) results for TS-2.23, used to generate Fig. 3b.

g. fig3c_slba.csv - File that contains SLBA results for TS-2.23 alanine scanning, used to generate Fig. 3c.

h. fig5_bli.csv - File that contains biolayer interferometry results for TS-2.23, used to generate Fig. 5b.

i. metadata_repository/ - Folder that contains the metadata for all samples. The file names are structured <CBM/BD>_metadata.csv, where <CBM/BD> represents the specimen set the metadata correspond to. Some CBM samples were re-sequenced, so there is a second CBM metadata file with a "2" appended to the file name that covers the re-sequenced samples. These data are the same as those in the original CBM_metadata file but have been subset for ease of processing. Each sample has a unique barcode starting with "tclib" or "CHAGAS" that corresponds to the sample names in rpk_repository and readcounts_repository.
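The naming conventions in items b and i can be checked programmatically before any analysis. The sketch below is an illustration under stated assumptions: `classify_sample` and `compare_barcodes` are hypothetical helpers, the example sample names are invented to resemble the described patterns, and the README does not name the metadata column that holds the barcode, so the functions take plain lists of strings rather than reading the CSV files directly.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Sample-ID patterns quoted in the README; each '#' is read here as a digit.
_SAMPLE_TYPES = [
    (re.compile(r"BIO\d+"), "seropositive"),
    (re.compile(r"GFAP\d+"), "positive control"),
    (re.compile(r"HC\d+"), "seronegative control"),
]


def classify_sample(name: str) -> Dict[str, str]:
    """Infer specimen set and sample type from a sample name alone."""
    if name.startswith("tclib"):
        specimen_set = "Blood Donor"
    elif name.startswith("CHAGAS"):
        specimen_set = "Cardiac Biomarker"
    else:
        specimen_set = "unknown"
    # Names that carry only plate/well or S#### IDs do not encode serostatus;
    # for those, the metadata files are the source of truth.
    sample_type = "not encoded in name"
    for pattern, label in _SAMPLE_TYPES:
        if pattern.search(name):
            sample_type = label
            break
    return {"specimen_set": specimen_set, "sample_type": sample_type}


def compare_barcodes(
    metadata_barcodes: Iterable[str], data_sample_names: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (matched, only_in_metadata, only_in_data), assuming exact matches."""
    meta, data = set(metadata_barcodes), set(data_sample_names)
    return meta & data, meta - data, data - meta


# Invented names that merely resemble the patterns described in the README.
examples = ["CHAGAS_BIO0123_R1_001", "CHAGAS_HC045_R1_001", "tclib_A01_S0007_R1_001"]
for sample in examples:
    print(sample, classify_sample(sample))

matched, only_meta, only_data = compare_barcodes(examples[:2], examples[1:])
print(matched, only_meta, only_data)
```

Running `compare_barcodes` on the barcodes from a metadata file and the sample names from the matching rpk file is a quick way to confirm that every sample can be joined before downstream analysis.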

Code/software

The code used to view and analyze these data can be found at https://github.com/hkortbawi/tcruzi_phipseq_2025.
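For a quick look at the plain-text files without that repository, standard-library Python is enough. The sketch below makes several assumptions that should be checked against the extracted archive: `read_fasta` and `significant_peptides` are hypothetical helpers, the FIMO outputs are assumed to be tab-delimited with the standard FIMO column name "q-value" (the README itself confirms only "sequence_name" and the presence of p- and q-values), the 0.05 cutoff is arbitrary, and all example records are invented.

```python
import csv
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""  # ID = first header token
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def significant_peptides(tsv_text: str, q_cutoff: float = 0.05) -> List[str]:
    """Return peptide IDs ("sequence_name") whose motif match has q <= cutoff."""
    # Skip blank lines and '#' comment lines, which FIMO may append.
    lines = [ln for ln in tsv_text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return [row["sequence_name"] for row in reader if float(row["q-value"]) <= q_cutoff]


# Invented FASTA records standing in for Tcruzi_library_nucleotides.fa.
fasta_lines = [">peptide_A", "ATGGCC", "GATTAA", ">peptide_B", "atgccc"]
seqs = read_fasta(fasta_lines)
print(seqs)

# Invented FIMO-style table standing in for a file in fimo_output_repository/.
table = [
    ["motif_id", "sequence_name", "p-value", "q-value"],
    ["motif_X", "pep_1", "1e-6", "0.01"],
    ["motif_X", "pep_2", "1e-3", "0.2"],
    ["motif_X", "pep_3", "1e-5", "0.04"],
]
fimo_text = "\n".join("\t".join(row) for row in table)
print(significant_peptides(fimo_text))
```

For the real files, pass an open file handle to `read_fasta` and the file's text to `significant_peptides`; the peptide IDs returned can then be looked up in p2g_mapping.csv.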