Timing the SARS-CoV-2 index case in Hubei Province

Cite this dataset

Pekar, Jonathan; Wertheim, Joel (2021). Timing the SARS-CoV-2 index case in Hubei Province [Dataset]. Dryad.


Understanding when SARS-CoV-2 emerged is critical to evaluating our current approach to monitoring novel zoonotic pathogens and understanding the failure of early containment and mitigation efforts for COVID-19. We employed a coalescent framework to combine retrospective molecular clock inference with forward epidemiological simulations to determine how long SARS-CoV-2 could have circulated prior to the time of the most recent common ancestor. Our results define the period between mid-October and mid-November 2019 as the plausible interval when the first case of SARS-CoV-2 emerged in Hubei province. By characterizing the likely dynamics of the virus before it was discovered, we show that over two-thirds of SARS-CoV-2-like zoonotic events would be self-limited, dying out without igniting a pandemic. Our findings highlight the shortcomings of zoonosis surveillance approaches for detecting highly contagious pathogens with moderate mortality rates.


The dataset was generated with BEAST, used for the phylogenetic analyses, and FAVITES, used for the epidemic simulations. The BEAST output was manually processed to determine the time of the most recent common ancestor (tMRCA), and the FAVITES output was manually processed to determine the time of stable coalescence. Refer to the manuscript for further details.
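As a rough illustration of how the two kinds of output might be combined, the sketch below pairs random tMRCA draws with random simulated intervals between the index case and the point of stable coalescence, and subtracts one from the other to obtain candidate index case dates. This is an assumption about the general idea described in the abstract, not the script distributed with the dataset; the function name, its parameters, and the toy inputs are hypothetical, and the toy values are placeholders rather than results.

```python
import random
from datetime import date, timedelta


def sample_index_case_dates(tmrca_dates, days_before_tmrca, n_draws=1000, seed=0):
    """Hypothetical helper: draw a tMRCA date and a simulated lag (in days)
    independently, subtract the lag from the date, and return the sorted
    candidate index case dates."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        tmrca = rng.choice(tmrca_dates)       # one molecular clock sample
        lag = rng.choice(days_before_tmrca)   # one forward-simulation sample
        draws.append(tmrca - timedelta(days=lag))
    return sorted(draws)


# Placeholder inputs for illustration only; NOT values from the dataset.
toy_tmrca = [date(2000, 1, 10), date(2000, 1, 20)]
toy_lags = [0, 5, 10]

dates = sample_index_case_dates(toy_tmrca, toy_lags, n_draws=100)
print(dates[0], dates[-1])  # earliest and latest sampled candidate dates
```

Drawing the two inputs independently is itself a simplifying assumption; the manuscript describes the procedure actually used.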

Usage notes

The BEAST data are available for use, along with the aggregated FAVITES results. The input files for the final results in the manuscript are also available and can be used with the Python script provided. Please see the README.txt file and the manuscript for further details.