
Protective effect conferred by prior infection and vaccination on COVID-19 in a healthcare worker cohort in South India

Cite this dataset

Murugesan, Malathi et al. (2022). Protective effect conferred by prior infection and vaccination on COVID-19 in a healthcare worker cohort in South India [Dataset]. Dryad.


The emergence of newer variants with immune escape potential raises concerns about breakthrough infections and re-infections driving future waves of infection. We examined the protective effect of prior COVID-19 disease and vaccination on infection rates among a cohort of healthcare workers (HCWs) in South India during the second wave, which was driven mainly by the delta variant. Symptomatic HCWs were routinely tested by RT-PCR as per institutional policy. Vaccination was offered to all HCWs in late January 2021, and the details were documented. We set up a non-concurrent cohort to document infection rates and estimated the protective efficacy of prior infection and vaccination between 16 April and 31 May 2021, using a Cox proportional hazards model with time-varying covariates, adjusting for daily incidence.

Between June 2020 and May 2021, 2735 (23.9%) of 11,405 HCWs were infected, with 1412, including 32 re-infections, reported during the second wave. A total of 6863 HCWs received two doses of vaccine and 1905 received one dose. The protective efficacy of prior infection against symptomatic infection was 86.0% (95% CI 76.7%–91.6%). Vaccination combined with prior infection provided 91.1% (95% CI 84.1%–94.9%) efficacy. In the absence of prior infection, vaccine efficacy against symptomatic infection during the second wave was 31.8% (95% CI 23.5%–39.1%).

Prior infection provided substantial protection against symptomatic re-infection and severe disease during a delta variant-driven second wave in a cohort of healthcare workers.


This non-concurrent cohort study was conducted among the staff of a tertiary care teaching hospital in South India. Demographic, clinical and exposure variables and vaccination history were prospectively documented in an electronic database for all those presenting for COVID-19 testing. All immunizations were documented along with the date of vaccination, type of vaccine, and any adverse events. By linking the SARS-CoV-2 testing dataset with the vaccination and administrative payroll information, we established a non-concurrent cohort that included all current employees. Every employee has a unique employment ID, which was used to match records across the datasets. Two investigators independently assessed the datasets to verify the accuracy of the data and of the linkages between the datasets. Participants were categorized into four risk groups based on their prior infection and vaccination status: unvaccinated and previously uninfected; vaccinated and previously uninfected; unvaccinated and previously infected; and vaccinated and previously infected. A sensitivity analysis that excluded participants who had received a single dose gave results that were not significantly different from the analysis that counted single-dose recipients as unvaccinated. Hence, the binary classification into vaccinated and unvaccinated was based on completion of two doses, with participants counted as vaccinated from 2 weeks after the second dose.
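The four-group classification described above can be illustrated with a short Python sketch. It is not the authors' code (the analysis was done in Stata); the function names, argument names, and the reading of "2 weeks after the second dose" as "at least 14 days" are assumptions made only for illustration.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: a participant counts as vaccinated once at least 14 days
# have passed since the second dose. One-dose recipients are treated as
# unvaccinated, as in the binary classification described in the text.
FULL_VACCINATION_LAG = timedelta(days=14)


def is_vaccinated(second_dose: Optional[date], on_day: date) -> bool:
    """Return True if the second dose was given at least 14 days before on_day."""
    return second_dose is not None and on_day - second_dose >= FULL_VACCINATION_LAG


def risk_group(prior_infection: bool, second_dose: Optional[date], on_day: date) -> str:
    """Assign one of the four risk groups for a given calendar day."""
    vaccinated = is_vaccinated(second_dose, on_day)
    if prior_infection:
        return ("vaccinated, previously infected" if vaccinated
                else "unvaccinated, previously infected")
    return ("vaccinated, previously uninfected" if vaccinated
            else "unvaccinated, previously uninfected")
```

Because vaccination status depends on the day being evaluated, the same participant can move from an unvaccinated to a vaccinated group during follow-up, which is why the group is computed per calendar day rather than once per person.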

Usage notes

Kaplan-Meier survival analysis was performed with failure defined as the acquisition of infection during the analysis period. A log-rank test was performed to compare the survival curves across the four risk groups. We developed a Cox proportional hazards (PH) model with time-varying covariates, adjusting for the smoothed daily incidence of COVID-19 and potential confounders (S1 Table). The model included participant age, type of work, sex, history of prior infection, and vaccination as epidemiologically relevant factors. The PH assumption was tested using Schoenfeld residuals and was not violated (p = 0.134). The efficacy of prior infection and of vaccination in preventing symptomatic infection during the study period was calculated as VE = 1 − hazard ratio from the Cox proportional hazards model. All data analysis was performed using Stata 15.1 (StataCorp LLC, College Station, TX).
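As a worked illustration of the efficacy formula, the Python sketch below converts a hazard ratio and its confidence limits into an efficacy estimate. The function name is hypothetical, and the hazard ratio used in the example (0.140, 95% CI 0.084 to 0.233) is not reported in this record; it is back-calculated from the published efficacy of prior infection purely to show the arithmetic. Note that the limits swap: the upper limit of the hazard ratio gives the lower limit of efficacy.

```python
def efficacy_from_hazard_ratio(hr: float, hr_lower: float, hr_upper: float):
    """Return (VE, VE lower, VE upper) in percent, using VE = 1 - HR.

    The confidence limits swap: a larger hazard ratio means lower efficacy,
    so the HR upper limit maps to the VE lower limit and vice versa.
    """
    ve = (1.0 - hr) * 100.0
    ve_lower = (1.0 - hr_upper) * 100.0
    ve_upper = (1.0 - hr_lower) * 100.0
    return ve, ve_lower, ve_upper


# Illustrative input only: these hazard ratios are back-calculated from the
# reported efficacy of prior infection, not taken from the dataset.
ve, lo, hi = efficacy_from_hazard_ratio(0.140, 0.084, 0.233)
print(f"VE = {ve:.1f}% (95% CI {lo:.1f}% to {hi:.1f}%)")
```

With these inputs the result rounds to 86.0% (76.7% to 91.6%), matching the efficacy of prior infection reported in the abstract.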