Dryad

Data from: PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment

Cite this dataset

Lartillot, Nicolas; Rodrigue, Nicolas; Stubbs, Daniel; Richer, Jacques (2013). Data from: PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment [Dataset]. Dryad. https://doi.org/10.5061/dryad.c459h

Abstract

Modeling across-site variation of the substitution process is increasingly recognized as important for obtaining more accurate phylogenetic reconstructions. Both finite and infinite mixture models have been proposed, and have been shown to significantly improve on classical single-matrix models. Compared to their finite counterparts, infinite mixtures have greater expressivity. However, they are computationally more challenging. This has resulted in practical compromises in the design of infinite mixture models. In particular, a fast but simplified version of a Dirichlet process model over equilibrium frequency profiles implemented in PhyloBayes (Lartillot et al., 2007) has often been used in recent phylogenomics studies, while more refined model structures, which are more realistic and fit the data better empirically, have been practically out of reach. We introduce a Message Passing Interface (MPI) version of PhyloBayes, implementing the Dirichlet process mixture models as well as more classical empirical matrices and finite mixtures. The parallelization is made efficient thanks to the combination of two algorithmic strategies: a partial Gibbs sampling update of the tree topology, and the use of a truncated stick-breaking representation for the Dirichlet process prior. The implementation shows close to linear gains in computational speed for up to 64 cores, thus allowing faster phylogenetic reconstruction under complex mixture models. PhyloBayes MPI is freely available from our website www.phylobayes.org.
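As an illustration of the truncated stick-breaking representation named in the abstract, the following is a minimal Python sketch of drawing mixture weights from a Dirichlet process prior truncated at a fixed number of components. It is not taken from PhyloBayes MPI: the function name `truncated_stick_breaking`, the concentration parameter `alpha`, and the truncation level `K` are all assumptions chosen for the example.

```python
import random


def truncated_stick_breaking(alpha, K, rng):
    """Draw K mixture weights from a stick-breaking prior truncated at K.

    alpha: concentration parameter of the Dirichlet process (assumed name).
    K:     truncation level, i.e. maximum number of components (assumed name).
    rng:   a random.Random instance, passed in for reproducibility.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # Truncation: the last component absorbs whatever is left of the stick,
    # so the weights sum to one.
    weights.append(remaining)
    return weights


weights = truncated_stick_breaking(alpha=1.0, K=20, rng=random.Random(0))
```

Fixing the number of components in this way gives a weight vector of known, constant size, which is generally easier to share across parallel processes than an unbounded one; how PhyloBayes MPI itself exploits the truncation is described in the associated publication rather than here.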

Usage notes