Dryad

Data from: Deep peptide recognition profiling decodes TCR specificity and enables disease-associated antigen discovery

Data files

Mar 31, 2026 version files 183.88 MB

Abstract

Predicting T cell receptor (TCR) specificity based on sequence is challenging because TCRs of similar sequence can recognize entirely different antigens, whereas TCRs of different sequence can recognize the same antigens. Here, we present a system that integrates high-throughput yeast display with fine-tuned protein language models (pLMs) to generate deep Peptide Recognition Profiles (PRPs) for individual TCRs, each detailing binding against millions of peptides. We provide detailed PRPs for a panel of HLA-B*27:05-restricted TCRs from patients with ankylosing spondylitis and acute anterior uveitis that almost exclusively recognize peptides through CDR3β. pLMs trained on these PRPs outperform AlphaFold3 and tFold-TCR in predicting T cell activation. We discover and validate novel candidate autoantigens, demonstrate that model generalization to new TCRs correlates with functional distance (PRP divergence) rather than sequence similarity, and introduce a model-intrinsic uncertainty metric to quantify prediction confidence. This system and its associated PRP datasets offer a scalable approach to mapping TCR recognition, accelerating antigen discovery, and guiding TCR engineering.