Autoantibody discovery across monogenic, acquired, and COVID19-associated autoimmunity with scalable PhIP-Seq

Cite this dataset

Vazquez, Sara et al. (2022). Autoantibody discovery across monogenic, acquired, and COVID19-associated autoimmunity with scalable PhIP-Seq [Dataset]. Dryad.


Phage Immunoprecipitation-Sequencing (PhIP-Seq) allows for unbiased, proteome-wide autoantibody discovery across a variety of disease settings, with identification of disease-specific autoantigens providing new insight into previously poorly understood forms of immune dysregulation. Despite several successful implementations of PhIP-Seq for autoantigen discovery, including our previous work (Vazquez et al. 2020), current protocols are inherently difficult to scale to accommodate large cohorts of cases and, importantly, healthy controls. Here, we develop and validate a high-throughput extension of PhIP-Seq in various etiologies of autoimmune and inflammatory diseases, including APS1, IPEX, RAG1/2 deficiency, Kawasaki Disease (KD), Multisystem Inflammatory Syndrome in Children (MIS-C), and finally, mild and severe forms of COVID-19. We demonstrate that these scaled datasets enable machine-learning approaches that result in robust prediction of disease status, as well as the ability to detect both known and novel autoantigens, such as PDYN in APS1 patients, and intestinally expressed proteins BEST4 and BTNL8 in IPEX patients. Remarkably, BEST4 antibodies were also found in 2 patients with RAG1/2 deficiency, one of whom had very early onset IBD. Scaled PhIP-Seq examination of both MIS-C and KD demonstrated rare, overlapping antigens, including CGNL1, as well as several strongly enriched putative pneumonia-associated antigens in severe COVID-19, including the endosomal protein EEA1. Together, scaled PhIP-Seq provides a valuable tool for broadly assessing both rare and common autoantigen overlap between autoimmune diseases of varying origins and etiologies.


DNA libraries were barcoded and amplified, gel purified, and subjected to Next-Generation Sequencing on an Illumina NovaSeq Instrument (Illumina, San Diego, CA). Sequencing reads from raw fastq files (see: fastq files) were aligned to the reference library (see: reference.fasta) using RAPSearch2.

Analyses are described in the linked manuscript. Briefly, for gene-level analysis, all peptide counts mapping to the same gene are summed. 0.5 reads are added to all genes, and raw reads are normalized by converting them to a percentage of total reads per sample (for peptide sequences and peptide-to-gene conversion, see: peptide_gene_mapping.csv). Fold change over mock-IP (FC) is calculated on a gene-by-gene basis by dividing the sample read percentage by the mean read percentage in the corresponding AG bead-only samples. Z-scores are calculated from FC values: for each disease sample, relative to all corresponding healthy controls, and for each healthy control sample, relative to all other healthy controls.
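The gene-level steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the manuscript. All function names and the dictionary-based data layout are hypothetical. It also assumes that the 0.5 reads are added before the percentage is computed and that the z-score uses the sample standard deviation of the reference controls; the description does not state either detail.

```python
from collections import defaultdict
from statistics import mean, stdev

PSEUDOCOUNT = 0.5  # "0.5 reads are added to all genes"


def gene_counts(peptide_counts, peptide_to_gene):
    """Sum raw peptide counts that map to the same gene."""
    totals = defaultdict(float)
    for peptide, count in peptide_counts.items():
        totals[peptide_to_gene[peptide]] += count
    return dict(totals)


def read_percentages(counts, genes):
    """Add the pseudocount to every gene, then convert to % of total reads.

    Assumption: the pseudocount is applied before the total is computed.
    """
    padded = {g: counts.get(g, 0.0) + PSEUDOCOUNT for g in genes}
    total = sum(padded.values())
    return {g: 100.0 * v / total for g, v in padded.items()}


def fold_change(sample_pct, bead_pcts):
    """Per-gene FC: sample percentage over mean bead-only percentage."""
    return {g: pct / mean(b[g] for b in bead_pcts)
            for g, pct in sample_pct.items()}


def z_scores(sample_fc, reference_fcs):
    """Per-gene z-score of one sample against a set of reference controls.

    Assumption: sample standard deviation (n - 1); needs >= 2 references.
    """
    out = {}
    for g, fc in sample_fc.items():
        ref = [r[g] for r in reference_fcs]
        sd = stdev(ref)
        out[g] = (fc - mean(ref)) / sd if sd > 0 else float("nan")
    return out


def control_z_scores(control_fcs):
    """Score each healthy control against all *other* healthy controls."""
    return [z_scores(fc, control_fcs[:i] + control_fcs[i + 1:])
            for i, fc in enumerate(control_fcs)]
```

Disease samples would be passed to `z_scores` with the full list of corresponding healthy controls, while healthy controls go through `control_z_scores` so that no control contributes to its own reference distribution.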

Sample information is available in the accompanying .csv files (see: metadata csv files).


National Institute of Allergy and Infectious Diseases, Award: 5P01AI118688

National Institute of Diabetes and Digestive and Kidney Diseases, Award: 1F30DK123915

National Institute of Allergy and Infectious Diseases, Award: 1ZIAAI001175

Chan Zuckerberg Biohub

Larry L. Hillblom Foundation