Breakthrough SARS-CoV-2 outcomes in immune-disordered people during the Omicron era: A prospective cohort study
Data files
Oct 06, 2025 version files 108.93 MB
- behavior_raw_public.csv (94.80 KB)
- Data_dictionary.xlsx (22.94 KB)
- inf_public.csv (23.19 KB)
- new_elisa_public.csv (287.70 KB)
- outcomes_public.csv (7.66 KB)
- pp_raw_public.csv (1.05 KB)
- Public_code_v2.R (164.71 KB)
- README.md (6.81 KB)
- reinfection_raw_public.csv (3.34 KB)
- SARS-CoV-2_Variant_Proportions_2023-09-28.csv (108.09 MB)
- sdat_public.csv (102.23 KB)
- visit_public.csv (125.62 KB)
Abstract
Introduction: Immune-deficient/disordered people (IDP) elicit a less robust immune response to COVID-19 vaccination than the general US population. Despite millions of IDP at presumed elevated risk, few population-level studies of IDP have been conducted in the Omicron era to evaluate breakthrough infection-related outcomes.
Methods: We followed a prospective cohort of 219 IDP and 63 healthy volunteers (HV) in the US from April 2021 (Alpha variant peak) to July 2023 (Omicron XBB variant peak). IDP had a primary or secondary immune disorder. All participants were ≥3 years old and received COVID-19 vaccines external to this study. We quantified anti-spike IgG titer levels by ELISA, measured breakthroughs via participant reports and laboratory tests on saliva samples, compared breakthrough incidence among HV and IDP, assessed infection complications [persistent infections, reinfections, and post-acute sequelae of COVID-19 (PASC)], and used surveys to capture COVID-19 symptoms and preventive attitudes/behaviors.
Results: Among IDP, the incidence of initial breakthrough infection was 8.8 (95% confidence interval: 4.5, 62.5) times higher during than before the Omicron era. There were 88 initial breakthrough infections among IDP (incidence rate 23.7/100 person-years) and 28 among HV (27.3/100) throughout the study period. While COVID‑19 symptoms were generally mild, five participants, all IDP, were hospitalized. In traditional analyses and an emulated trial, the quantity of anti‑spike IgG one month after participants’ most recent pre-infection vaccination was not associated with breakthrough. HV and IDP frequently practiced infection‑limiting behaviors, but IDP were more likely to continue such behaviors after vaccination. IDP experienced persistent infections, PASC, and reinfections more commonly than HV.
Conclusions: Breakthrough rates in IDP were largely equivalent to HV. However, IDP experienced a slightly higher frequency of symptoms, hospitalizations, infection persistence, PASC, and reinfections than HV. Further study is needed to elucidate the immunological mechanisms that increase the risks of such complications in IDP.
https://doi.org/10.5061/dryad.9p8cz8wsr
These data were collected as part of the PERSIST cohort study (NCT04852276), which aimed to assess the immune response to COVID-19 vaccines in people with immune disorders and healthy volunteers. We collected and analyzed data from April 2021 through July 2023 for this study. Laboratory methods, including assay specifics, are described in the associated publication in Science Advances, “Characterization of the anti-spike IgG immune response to COVID-19 vaccines in people with a wide variety of immunodeficiencies”, and in our publication in BMJ Public Health, “Breakthrough SARS-CoV-2 Outcomes in Immune-Disordered People During the Omicron Era: A Prospective Cohort Study”.
Data and file information
Nine unique datasets were used in our study, all of which are provided in a .csv format. Eight datasets were derived from participant data:
- “new_elisa_public.csv” provides ELISA measurements of the anti-spike IgG antibody concentration in response to vaccination in all our participants. The file also contains vaccination and other clinical information.
- “inf_public.csv” provides data on individuals who were infected with SARS-CoV-2.
- “behavior_raw_public.csv” provides participant responses to the behavioral survey included in the Supplemental Materials.
- “reinfection_raw_public.csv” provides data for those individuals with SARS-CoV-2 re-infections.
- “outcomes_public.csv” provides data for individuals with unusual or adverse outcomes, such as persistent infections, reinfections, post-dose 1 infections, etc.
- “pp_raw_public.csv” provides data on individuals with persistent SARS-CoV-2 infections.
- “sdat_public.csv” provides data on individuals who were analyzed as part of the emulated trial.
- “visit_public.csv” provides data on participants and when they interacted with the study team for research purposes.
- The ninth dataset, “SARS-CoV-2_Variant_Proportions_2023-09-28.csv”, was created by the CDC and is publicly accessible. We include in our files a copy of the CDC’s dataset that was used for our study. The current version of the CDC’s dataset (which is continuously updated), along with the variable definitions, can be found here: https://data.cdc.gov/Laboratory-Surveillance/SARS-CoV-2-Variant-Proportions/jr58-6ysp
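As a minimal sketch of how the participant-derived files listed above might be loaded together, the snippet below reads whichever of the eight CSVs are present in a directory into a named list. The helper name `read_available` is hypothetical and is not taken from “Public_code_v2.R”; the file names are those given in the list.

```r
# File names as listed above (the eight participant-derived datasets).
participant_files <- c(
  "new_elisa_public.csv", "inf_public.csv", "behavior_raw_public.csv",
  "reinfection_raw_public.csv", "outcomes_public.csv", "pp_raw_public.csv",
  "sdat_public.csv", "visit_public.csv"
)

# Hypothetical helper: read whichever of `files` exist in `dir`
# and return them as a list named by file.
read_available <- function(files, dir = ".") {
  paths <- file.path(dir, files)
  found <- file.exists(paths)
  if (any(!found)) {
    message("Not found: ", paste(files[!found], collapse = ", "))
  }
  out <- lapply(paths[found], read.csv, stringsAsFactors = FALSE)
  names(out) <- files[found]
  out
}

# Assumes the files sit in the current working directory.
datasets <- read_available(participant_files)

# Number of rows in each file that was read.
vapply(datasets, nrow, integer(1))
```

Keeping the data frames in one named list makes it easy to check at a glance which files were found before running any analysis.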
Description of the data and file structure
- The variables for the eight participant-derived datasets are defined in the accompanying data dictionary, "Data_dictionary.xlsx". The data dictionary provides a link to the CDC’s website where variables for the “variant” dataset are defined.
- “new_elisa_public.csv” and “sdat_public.csv” are organized by participant ID (Subject.ID variable) and the time point of their sample (time variable). Variables that remain constant over time, such as the participant ID, are repeated for each participant across time points.
- “behavior_raw_public.csv”, “outcomes_public.csv”, and “pp_raw_public.csv” are organized by participant ID (Subject.ID or SUBJECT.ID. variable).
- “inf_public.csv” and “reinfection_raw_public.csv” are organized at the infection level, with each row corresponding to an individual infection. Variables that remain constant over time, such as the participant ID, are repeated for each infection.
- “visit_public.csv” is organized by the participant ID (idt variable) and the date of contact (Date variable).
- The CDC’s variant dataset (“SARS-CoV-2_Variant_Proportions_2023-09-28.csv”) is organized by region (usa_or_hhsregion variable), the date of the Saturday at the end of each week (week_ending variable), and the designation for each SARS-CoV-2 variant of interest (variant variable). Point and interval estimates for each variant’s prevalence in each region and week are provided.
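The layouts described above can be illustrated with a short sketch on toy data. All values below are invented. The column names `titer` and `share` are assumptions (the data dictionary and the CDC page give the actual names), and both helper functions are hypothetical. The toy dates use ISO format, so real `week_ending` values may need parsing before they sort correctly.

```r
# Hypothetical helper: number of rows whose key combination repeats an earlier row.
n_duplicate_keys <- function(df, keys) {
  stopifnot(all(keys %in% names(df)))
  sum(duplicated(df[, keys, drop = FALSE]))
}

# Toy long-format data keyed by participant and time point.
toy_elisa <- data.frame(
  Subject.ID = c("A", "A", "B", "B"),
  time       = c(1, 2, 1, 2),
  titer      = c(10.5, 20.1, 8.3, 15.0)  # assumed column name
)
n_duplicate_keys(toy_elisa, c("Subject.ID", "time"))

# Toy data in the region / week / variant layout of the CDC file.
toy_variants <- data.frame(
  usa_or_hhsregion = c("USA", "USA", "USA", "USA", "1", "1"),
  week_ending      = c("2023-07-01", "2023-07-01", "2023-07-08",
                       "2023-07-08", "2023-07-01", "2023-07-01"),
  variant          = c("XBB.1.5", "XBB.1.16", "XBB.1.5",
                       "XBB.1.16", "XBB.1.5", "XBB.1.16"),
  share            = c(0.30, 0.20, 0.15, 0.25, 0.40, 0.10)  # assumed column name
)

# Hypothetical helper: the variant with the largest point estimate
# in each week for one region.
dominant_variant <- function(df, region = "USA") {
  sub <- df[df$usa_or_hhsregion == region, ]
  # Sort so the largest share comes first within each week,
  # then keep the first row per week.
  sub <- sub[order(sub$week_ending, -sub$share), ]
  sub[!duplicated(sub$week_ending), c("week_ending", "variant", "share")]
}

dominant_variant(toy_variants)
```

The same key check can be pointed at a real file once it is loaded, for example with `c("Subject.ID", "time")` for “new_elisa_public.csv”; whether those keys are unique in the published files is worth verifying rather than assuming.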
Deidentification of the data
To limit the possibility of our participants being identified by means of our data, the datasets do not include the specific immunological conditions of our participants. All date variables (e.g., the date of vaccine receipt) relevant to our analyses were randomly shifted at the participant level to distort their true values while keeping temporal trends intact. In addition, the variables for rare medical outcomes resulting from COVID-19 were removed from the publicly available “inf_public.csv” dataset. All results we report in the paper, including in the tables and figures, were produced with the original, unperturbed data. The R code includes comments on the few specific analyses that are affected by altering the date variables.
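To illustrate the general idea of a participant-level date shift, the sketch below applies one random offset per participant to toy dates and confirms that the interval between a participant's own dates is unchanged. This is not the procedure used to prepare the public files: the offset range, the seed, and the column names `vax_date` and `shifted_date` are all assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, for a reproducible illustration only

# Toy data: two dates per participant (invented values).
toy <- data.frame(
  Subject.ID = c("A", "A", "B", "B"),
  vax_date   = as.Date(c("2021-04-01", "2021-04-29",
                         "2021-05-10", "2021-06-07")),
  stringsAsFactors = FALSE
)

# One random offset (in days) per participant; the range is an assumption.
ids     <- unique(toy$Subject.ID)
offsets <- setNames(sample(-30:30, length(ids), replace = TRUE), ids)

# Apply each participant's offset to all of that participant's dates.
toy$shifted_date <- toy$vax_date + unname(offsets[toy$Subject.ID])

# Days between each participant's first and last date.
span_days <- function(dates, id) {
  tapply(as.numeric(dates), id, function(x) diff(range(x)))
}

span_days(toy$vax_date, toy$Subject.ID)
span_days(toy$shifted_date, toy$Subject.ID)
```

Because every date for a given participant moves by the same number of days, within-participant intervals survive the shift, while calendar dates and comparisons between participants do not.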
Code/software
We used R to analyze and visualize all the data in this study. The R script “Public_code_v2.R” reproduces these analyses and visualizations in the order in which they appear in the Results section of the accompanying manuscript, and it is structured by paragraph of that section. All errors or warnings elicited by running the R code can be safely ignored.
Human subjects data
This study (NCT04852276) was approved by the Institutional Review Board of the National Institutes of Health (NIH000384-I) and was conducted in accordance with the Declaration of Helsinki and Good Clinical Practice guidelines. Adult participants and parents or legal guardians of pediatric participants provided written, informed consent for the study and for unrestricted use of deidentified data. Pediatric participants six years and older gave verbal assent.
To limit the possibility of our participants being identified by means of our data, the datasets we include do not report participants’ age, sex, race, ethnicity, or specific immunological conditions, or the collection date of the samples. A variety of date variables (e.g., the date of vaccine receipt) were critical to our analyses and have been retained. However, each date variable’s values were shifted and jittered by different amounts in the datasets we publicly present, and a different random seed was used before jittering each date variable of interest. While these perturbations do not preserve the statistical quantities of the original data, the statistical issues were outweighed by the need to deidentify the data and protect our participants’ privacy. All results we report in the paper, including in the tables and figures, were produced with the original, unperturbed data. The R code for the ELISA data includes comments on the few specific analyses that are affected by the shifting and jittering of date variables.
All methods for the collection and processing of these data are published in the associated manuscript in BMJ Public Health (https://doi.org/10.1136/bmjph-2024-002436) and the associated manuscript in Science Advances (https://doi.org/10.1126/sciadv.adh3150). Code and survey instruments are available at https://github.com/ericotta/PERSIST_Study/.
