
Data from: Estimating infection prevalence: best practices and their theoretical underpinnings

Cite this dataset

Miller, Ian F.; Schneider-Crease, India; Nunn, Charles L.; Muehlenbein, Michael P. (2019). Data from: Estimating infection prevalence: best practices and their theoretical underpinnings [Dataset]. Dryad.


Accurately estimating infection prevalence is fundamental to the study of population health, disease dynamics, and infection risk factors. Prevalence can be estimated as the proportion of infected individuals (“individual-based estimation”), but it can also be estimated as the proportion of samples from which the disease-causing organisms are recovered (“anonymous estimation”). The latter method is often used when researchers lack information on individual host identity, which can occur during noninvasive sampling of wild populations or when the individual that produced a fecal sample is unknown. The goal of this study was to investigate, theoretically, the biases in individual-based versus anonymous prevalence estimation and to test whether the mathematically derived predictions are evident in a comparative dataset of gastrointestinal helminth infections in nonhuman primates. Using a mathematical model, we predict that anonymous estimates of prevalence will be lower than individual-based estimates when (a) samples from infected individuals do not always contain evidence of infection and/or (b) false negatives occur. The model further predicts that anonymous and individual-based estimation should not differ in bias when one sample is collected from each individual. Using data on helminth parasites of primates, we find that anonymous estimates of prevalence are significantly and substantially (12.17%) lower than individual-based estimates. We also find that individual-based estimates of prevalence from studies employing single sampling are on average 6.4% higher than anonymous estimates, suggesting a bias toward sampling infected individuals. We recommend that researchers use individual-based study designs with repeated sampling of individuals to obtain the most accurate estimates of infection prevalence.
Moreover, to ensure accurate interpretation of their results and to allow prevalence estimates to be compared among studies, it is essential that authors explicitly describe their sampling designs and prevalence calculations in publications.
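The two estimators and the model's qualitative predictions can be illustrated with a minimal Python sketch. This is not the authors' model. It assumes a hypothetical sample table of `(individual_id, is_positive)` pairs and a toy data-generating process with the following properties:

- Every individual is sampled `k` times.
- Each sample from an infected host tests positive independently with probability `d`. This lumps intermittent evidence of infection and false-negative tests together.
- Uninfected hosts never test positive.

Under these assumptions, the expected individual-based estimate is p × (1 - (1 - d)^k) and the expected anonymous estimate is p × d. The two therefore coincide when k = 1 and diverge when k > 1 and d < 1. The function names and parameters are illustrative only.

```python
import random
from collections import defaultdict


def prevalence_estimates(records):
    """Compute both prevalence estimates from (individual_id, is_positive) sample records."""
    any_positive = defaultdict(bool)  # individual_id -> True if any of its samples tested positive
    n_samples = 0
    n_positive = 0
    for individual_id, is_positive in records:
        any_positive[individual_id] = any_positive[individual_id] or is_positive
        n_samples += 1
        n_positive += int(is_positive)
    return {
        # proportion of individuals with at least one positive sample
        "individual_based": sum(any_positive.values()) / len(any_positive),
        # proportion of samples that are positive, ignoring host identity
        "anonymous": n_positive / n_samples,
        "n_individuals": len(any_positive),
        "n_samples": n_samples,
    }


def simulate_records(p, d, k, n_individuals, seed=0):
    """Hypothetical toy model: true prevalence p, k samples per individual,
    each sample from an infected host positive with probability d, no false positives."""
    rng = random.Random(seed)
    records = []
    for individual_id in range(n_individuals):
        infected = rng.random() < p
        for _ in range(k):
            records.append((individual_id, infected and rng.random() < d))
    return records


def expected_estimates(p, d, k):
    """Closed-form expectations (individual-based, anonymous) under the same toy model."""
    return p * (1 - (1 - d) ** k), p * d


if __name__ == "__main__":
    for k in (1, 3):
        est = prevalence_estimates(simulate_records(p=0.4, d=0.6, k=k, n_individuals=20_000))
        print(k, est["individual_based"], est["anonymous"], expected_estimates(0.4, 0.6, k))
```

With p = 0.4 and d = 0.6, the closed-form values are 0.24 for both estimators at k = 1. At k = 3, they are about 0.374 (individual-based) versus 0.24 (anonymous). Reporting the numbers of individuals and samples alongside each estimate, as the returned dictionary does, is one way to make the prevalence calculation explicit.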


Funding: National Science Foundation, Award BCS-1355902