An open-access database of infectious disease transmission trees to explore superspreader epidemiology

Cite this dataset

Taube, Juliana C.; Miller, Paige B.; Drake, John M. (2022). An open-access database of infectious disease transmission trees to explore superspreader epidemiology [Dataset]. Dryad.


Historically, emerging and reemerging infectious diseases have caused large, deadly, and expensive multinational outbreaks. Often outbreak investigations aim to identify who infected whom by reconstructing the outbreak transmission tree, which visualizes transmission between individuals as a network with nodes representing individuals and branches representing transmission from person to person. We compiled a database, called OutbreakTrees, of 382 published, standardized transmission trees consisting of 16 directly transmitted diseases ranging in size from 2 to 286 cases. For each tree and disease, we calculated several key statistics, such as tree size, average number of secondary infections, the dispersion parameter, and the proportion of cases considered superspreaders, and examined how these statistics varied over the course of each outbreak and under different assumptions about the completeness of outbreak investigations. We demonstrated the potential utility of the database through 2 short analyses addressing questions about superspreader epidemiology for a variety of diseases, including Coronavirus Disease 2019 (COVID-19). First, we found that our transmission trees were consistent with theory predicting that intermediate dispersion parameters give rise to the highest proportion of cases causing superspreading events. Additionally, we investigated patterns in how superspreaders are infected. Across trees with more than 1 superspreader, we found preliminary support for the theory that superspreaders generate other superspreaders. In sum, our findings put the role of superspreading in COVID-19 transmission in perspective with that of other diseases and suggest an approach to further research regarding the generation of superspreaders. These data have been made openly available to encourage reuse and further scientific inquiry.
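The per-tree statistics mentioned above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a tree is given as a list of (infector, infectee) pairs, that every case in the tree is counted (including cases with no recorded secondary infections), and that the dispersion parameter is estimated with a simple method-of-moments formula for the negative binomial distribution. The function name `tree_statistics` and the example case labels are hypothetical.

```python
from collections import Counter
from statistics import mean, variance


def tree_statistics(edges):
    """Summarize one transmission tree given as (infector, infectee) pairs.

    Assumption: every case appearing in the edge list is counted, so cases
    with no recorded secondary infections contribute a zero.
    """
    cases = {case for edge in edges for case in edge}
    secondary = Counter(infector for infector, _ in edges)
    counts = [secondary.get(case, 0) for case in cases]

    r = mean(counts)        # average number of secondary infections
    var = variance(counts)  # sample variance of secondary infections

    # Method-of-moments estimate for a negative binomial: var = r + r**2 / k.
    # If the counts are not overdispersed (var <= r), k is treated as infinite.
    k = r ** 2 / (var - r) if var > r else float("inf")

    return {"size": len(cases), "mean_secondary": r, "dispersion_k": k}


# Hypothetical five-case tree: A infects B, C, and D; B infects E.
example_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E")]
stats = tree_statistics(example_edges)
print(stats)
```

Smaller values of the dispersion parameter indicate more heterogeneity in the number of secondary infections. A maximum-likelihood fit would usually be preferable for real analyses; the moment estimator is used here only to keep the sketch short.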

Usage notes

The code that uses these data to reproduce the figures in Taube et al., "An open-access database of infectious disease transmission trees enables exploration of superspreader epidemiology," is available on GitHub. OutbreakTrees, the database of transmission trees underlying these data, is available separately.


Funding

National Science Foundation, Division of Biological Infrastructure, Award: DBI-1659683

National Science Foundation, Division of Graduate Education, Award: DGE-1545433