COVID-19 reinfection data on individuals diagnosed with their first SARS-CoV-2 infection
Data files
Feb 14, 2025 version files (7.35 MB)
- COVID-19_reinfection.zip (7.34 MB)
- README.md (6.44 KB)
Abstract
Background: In many settings, a large fraction of the population has both been vaccinated against and infected by SARS-CoV-2. Hence, quantifying the protection provided by post-infection vaccination has become critical for policy. We aimed to estimate the protective effect against SARS-CoV-2 reinfection of an additional vaccine dose after an initial Omicron variant infection.
Methods: We report a retrospective, population-based cohort study performed in Shanghai, China, using electronic databases with information on SARS-CoV-2 infections and vaccination history. We compared reinfection incidence by post-infection vaccination status in individuals initially infected during the April-May 2022 Omicron variant surge in Shanghai and who had been vaccinated before that period. Cox models were fit to estimate adjusted hazard ratios (aHR).
Results: 275,896 individuals were diagnosed with RT-PCR-confirmed SARS-CoV-2 infection in April-May 2022; 199,312/275,896 were included in analyses on the effect of a post-infection vaccine dose. Post-infection vaccination provided protection against reinfection (aHR 0.82; 95% CI 0.79-0.85). For patients who had received one, two or three vaccine doses before their first infection, hazard ratios for the post-infection vaccination effect were 0.84 (0.76-0.93), 0.87 (0.83-0.90) and 0.96 (0.74-1.23), respectively. Vaccination within 30 and 90 days before the second Omicron wave provided different degrees of protection (in aHR): 0.51 (0.44-0.58), and 0.67 (0.61-0.74), respectively. Moreover, for all vaccine types, but to different extents, a post-infection dose given to individuals who were fully vaccinated before first infection was protective.
Conclusions: In previously vaccinated and infected individuals, an additional vaccine dose provided protection against Omicron variant reinfection. These observations will inform future policy decisions on COVID-19 vaccination in China and other countries.
Access this dataset on Dryad (https://doi.org/10.5061/dryad.rfj6q57ks)
The dataset also includes the primary analytical code and the temporal vaccination coverage among the study population, as well as the data used to generate the figures and tables in the article and anonymized individual reinfection data.
Analysis code
The R code used for data analysis and plotting.
Case data
The anonymized individual-level reinfection data. The value "null" denotes missing case information; these gaps result from loss of the original data under the heavy epidemiological investigation workload during the outbreak period.
- Sex: Indicates the biological sex of the individual;
- Age: Represents the age of the individual at the time of data collection;
- Date of first infection: Refers to the date or time of the first infection;
- Date of reinfection: Indicates the date or time of the second infection, if applicable.
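As a minimal sketch of loading these fields, the snippet below reads a small inline sample in which "null" marks missing values and derives the number of days between the two infections. The sample rows, the comma-separated layout, and the exact column headers are assumptions made for illustration; the files in the archive may be organised differently.

```r
# Hypothetical sample in the shape described above; not real study records.
csv_text <- "Sex,Age,Date of first infection,Date of reinfection
Male,34,2022-04-12,2022-12-20
Female,null,2022-05-03,null
null,61,2022-04-28,2023-01-02"

# Treat the literal string "null" as a missing value (NA) in every column.
cases <- read.csv(text = csv_text, na.strings = "null",
                  check.names = FALSE, stringsAsFactors = FALSE)

# Parse the two date columns; rows without a reinfection stay NA.
cases$first <- as.Date(cases[["Date of first infection"]])
cases$reinf <- as.Date(cases[["Date of reinfection"]])

# Days from first infection to reinfection (NA when no reinfection is recorded).
cases$days_to_reinfection <- as.numeric(cases$reinf - cases$first)

print(cases[, c("Sex", "Age", "days_to_reinfection")])
```

Passing `na.strings = "null"` lets R parse Age as numeric and leaves missing dates as NA instead of the text "null".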
Reinfection rate
Data on reinfections among individuals with different vaccination statuses, used to generate Figure 2.
- Vaccination after first infection: This variable indicates the vaccination status of individuals following their first infection;
- Date: Refers to the specific date related to the study;
- Est: Short for "estimate," this variable represents the estimated mean value of reinfection rate;
- Lower: the lower bound of the confidence interval;
- Upper: the upper bound of the confidence interval;
- Group: Groups within the study population, defined by the number of vaccinations received before the first infection.
Table_varying-exposure
Results corresponding to Figure 4.
- Character: This variable refers to a specific characteristic or attribute being studied;
- Sub: This represents a subgroup or subset of the population being studied;
- Unvaccinated_n: The number of unvaccinated individuals within the specified subgroup or the entire population;
- Unvaccinated: The reinfection rate among unvaccinated individuals (the numbers in parentheses are the lower and upper bounds of the 95% confidence interval for that rate);
- Vaccinated_n: The number of vaccinated individuals within the specified subgroup or the entire population;
- Vaccinated: The reinfection rate among vaccinated individuals (the numbers in parentheses are the lower and upper bounds of the 95% confidence interval for that rate);
- HR: Hazard Ratio, a measure of the relative risk of reinfection between vaccinated and unvaccinated individuals;
- Low: Lower bound of the confidence interval for the hazard ratio;
- High: Upper bound of the confidence interval for the hazard ratio;
- Pvalue: the p-value of statistical significance;
- CI: Confidence Interval.
Varying vaccine coverage rate1
The vaccine coverage rate within the study population.
- vaccine_time_1: The specific date or time point at which the vaccination was administered;
- group: Subgroups within the study population categorized based on vaccination status;
- rate: The proportion or percentage of individuals who have been vaccinated within the specified population.
Varying vaccine coverage rate2
Same structure as "Varying vaccine coverage rate1". The file “Varying_vaccine_coverage_rate2” is categorized solely by three vaccination statuses: Partial vaccination, Full vaccination, and Booster vaccination. In contrast, the file “Varying_vaccine_coverage_rate1” is also divided by vaccine dose: 1st, 2nd, 3rd, and 4th dose.
VE_table
The effect of post-infection vaccination on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) reinfection, stratified by pre-infection vaccination.
- high: the upper bound of the confidence interval;
- low: the lower bound of the confidence interval;
- Pvalue: the p-value of statistical significance;
- HR: Hazard Ratio;
- character: a specific characteristic or attribute being studied;
- group: different subgroups within the study population;
- CI: Confidence Interval;
- Kind: the different methods being analyzed in the study.
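A quick consistency check on a table shaped like VE_table might look like the sketch below. The three hazard ratios and intervals are the ones quoted in the abstract for one, two, and three pre-infection doses; the group labels, the reading of low and high as the lower and upper bounds, and the format of the CI string are assumptions made for illustration.

```r
# Hazard ratios from the abstract; group labels are hypothetical.
ve <- data.frame(
  group = c("1 dose", "2 doses", "3 doses"),
  HR    = c(0.84, 0.87, 0.96),
  low   = c(0.76, 0.83, 0.74),
  high  = c(0.93, 0.90, 1.23)
)

# Every point estimate should lie inside its own interval.
stopifnot(all(ve$low <= ve$HR & ve$HR <= ve$high))

# Build a display string in an assumed "HR (low-high)" format.
ve$CI <- sprintf("%.2f (%.2f-%.2f)", ve$HR, ve$low, ve$high)

# Flag rows whose interval excludes 1 (no-effect value for a hazard ratio).
ve$excludes_one <- ve$high < 1 | ve$low > 1

print(ve)
```

A check like this catches swapped bound columns before the values are passed to a forest plot.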
Software
All statistical analyses were performed using R version 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria; https://www.r-project.org). The uploaded code file is written in R, a programming language widely used for statistical analysis and data visualization. Below are some key notes about the code file:
- Language: R
- Environment: The code is designed to run in an R environment.
- Dependencies: The script uses several R packages, including:
  - MatchIt for propensity score matching.
  - survival and survminer for survival analysis.
  - epiR for epidemiological calculations.
  - dplyr and tidyverse for data manipulation.
  - ggplot2 for data visualization.
  - forester for creating forest plots.
  - readxl for reading Excel files.
- Input data files: The script uses various input data, such as:
  - mydat1 and mydat2: Data frames containing the primary case datasets.
  - shenlin1.xlsx, shenlin2.xlsx, shenlin3.xlsx: Excel files containing additional data for visualization and analysis.
  - Table_varying.xlsx: An Excel file used for creating forest plots.
- Outputs:
  - CSV files: The script generates multiple CSV files containing analysis results, such as Table1.csv, Table2.csv, VE_table.csv, baseline.csv, table S1_1.csv, Table_varying-exposure, Varying vaccine coverage rate1, and Varying vaccine coverage rate2.
  - Visualization files: The script produces several visualization files, including:
    - VE.png: A forest plot created using the forester package.
    - Reinfection rate.png: A combined plot of vaccination rates and reinfection rates over time.
    - Vaccination rate_S1.png: A plot showing vaccination rates over time.
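To illustrate the kind of model behind the reported hazard ratios, the sketch below fits a Cox proportional hazards model with the survival package on simulated data. It is not the uploaded analysis script: the data are randomly generated, all variable names are hypothetical, the exposure is treated as fixed at baseline, and no propensity score matching is performed.

```r
library(survival)

set.seed(1)
n <- 500

# Simulated cohort; every column here is a made-up stand-in.
sim <- data.frame(
  post_infection_dose = rbinom(n, 1, 0.3),            # 1 = dose after first infection
  age = round(runif(n, 18, 80)),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# Toy event times: exponential, with a lower hazard in the vaccinated group.
rate <- 0.01 * exp(-0.2 * sim$post_infection_dose)
event_time <- rexp(n, rate)

# Administrative censoring at 200 days of follow-up.
sim$time <- pmin(event_time, 200)
sim$reinfected <- as.integer(event_time <= 200)

# Cox model adjusted for age and sex.
fit <- coxph(Surv(time, reinfected) ~ post_infection_dose + age + sex, data = sim)

# Adjusted hazard ratio and 95% confidence interval for the exposure.
hr <- exp(coef(fit))["post_infection_dose"]
ci <- exp(confint(fit))["post_infection_dose", ]
print(round(c(HR = unname(hr), lower = unname(ci[1]), upper = unname(ci[2])), 2))
```

Exponentiating the coefficient and its interval gives values on the same scale as the HR, low, and high columns described above.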
During the first wave of the Omicron variant, Shanghai, a city with a population of over 25 million, underwent multiple rounds of SARS-CoV-2 RT-PCR testing from April 1 to May 31, 2022. Both asymptomatic individuals identified through mass screening and symptomatic individuals seen by healthcare professionals were included. Demographic data (sex at birth, age) and infection history were provided by the Shanghai Center for Disease Control and Prevention. The reinfection data were collected prior to January 2023, with reinfection-related death defined as death within 30 days of a SARS-CoV-2 reinfection.
Individuals with missing first infection dates or those who died from non-COVID-19 causes between the two Omicron waves were excluded. The analysis centered on individuals who had received at least one vaccine dose before their first SARS-CoV-2 infection.
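The exclusion rules above could be applied roughly as follows. This is a sketch on a made-up five-row data frame; the column names (first_infection_date, non_covid_death_between_waves, doses_before_first_infection) are hypothetical and do not come from the released files.

```r
# Made-up records used only to demonstrate the filtering logic.
cohort <- data.frame(
  id = 1:5,
  first_infection_date = as.Date(c("2022-04-10", NA, "2022-05-02",
                                   "2022-04-21", "2022-05-15")),
  non_covid_death_between_waves = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  doses_before_first_infection = c(2, 3, 1, 0, 3)
)

# Step 1: drop missing first infection dates and non-COVID-19 deaths
# that occurred between the two Omicron waves.
eligible <- subset(cohort,
                   !is.na(first_infection_date) & !non_covid_death_between_waves)

# Step 2: keep individuals with at least one dose before first infection.
analysis_set <- subset(eligible, doses_before_first_infection >= 1)

print(analysis_set$id)
```

Keeping the two steps separate makes it easy to report how many individuals each criterion removes.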
