Dryad

Data from: Posterior predictive checks of coalescent models: P2C2M, an R package

Cite this dataset

Gruenstaeudl, Michael; Reid, Noah M.; Wheeler, Gregory L.; Carstens, Bryan C. (2015). Data from: Posterior predictive checks of coalescent models: P2C2M, an R package [Dataset]. Dryad. https://doi.org/10.5061/dryad.n715n

Abstract

Bayesian inference operates under the assumption that the empirical data are a good statistical fit to the analytical model, but this assumption can be challenging to evaluate. Here, we introduce P2C2M, a novel R package that uses posterior predictive simulation to evaluate the fit of the multispecies coalescent model used to estimate species trees. We conduct a simulation study to evaluate the consistency of different summary statistics in comparing posterior and posterior predictive distributions, the effect of simulation replication on error rates, and the utility of parallel processing in reducing computation times. We also test P2C2M on two empirical data sets, from Tamias chipmunks and Myotis bats, in which hybridization and gene flow are suspected of contributing to shared polymorphism, in violation of the coalescent model. Our results indicate that (i) probability-based summary statistics display the lowest error rates, (ii) simulation replication decreases the rate of type II errors, and (iii) our R package has greater statistical power than previous implementations of this approach. When probability-based summary statistics are used, P2C2M corroborates the expectation that the genealogies collected from Tamias and Myotis are a poor fit to the multispecies coalescent model. Taken as a whole, our findings argue that an assessment of the fit of the multispecies coalescent model should accompany any phylogenetic analysis that estimates a species tree.
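The comparison of posterior and posterior predictive distributions described in the abstract can be illustrated with a generic posterior predictive check. The sketch below is not P2C2M's code or API; the function names `ppc_tail_probability` and `model_fits` are hypothetical, and it assumes that a summary statistic has already been computed for each posterior sample (`observed`) and for data simulated from that same sample (`predicted`). It forms the paired differences and reports a two-tailed tail-area probability. A small value means the simulated statistics fall consistently on one side of the observed ones, which suggests poor model fit.

```python
def ppc_tail_probability(observed, predicted):
    """Two-tailed posterior predictive p-value from paired summary statistics.

    Hypothetical helper: observed[i] is a statistic computed for the i-th
    posterior sample; predicted[i] is the same statistic computed on data
    simulated from that sample under the model being checked.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - o for o, p in zip(observed, predicted)]
    n = len(diffs)
    # Fraction of draws in which the simulated statistic is >= / <= the observed one
    upper = sum(d >= 0 for d in diffs) / n
    lower = sum(d <= 0 for d in diffs) / n
    # Double the smaller tail for a two-tailed test, capped at 1
    return min(1.0, 2 * min(upper, lower))


def model_fits(observed, predicted, alpha=0.05):
    """Return False when the check rejects model fit at level alpha."""
    return ppc_tail_probability(observed, predicted) >= alpha


# Toy values only, not results from the paper:
print(model_fits([10.0] * 20, [11.0] * 20))                 # simulated always larger
print(model_fits([10.0] * 4, [9.0, 11.0, 9.0, 11.0]))       # balanced around observed
```

In the first toy case every simulated statistic exceeds the observed one, so the check flags a poor fit. In the second, the differences are balanced around zero, so fit is not rejected. Replicating the simulation step, as the abstract describes, would amount to repeating this comparison over several independent sets of simulated data.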

Usage notes