Data from: PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment
Data files
Apr 02, 2013 version files 537.73 KB
LartillotSB2013suppmat.pdf
537.73 KB
Abstract
Modeling across-site variation of the substitution process is increasingly recognized as important for obtaining more accurate phylogenetic reconstructions. Both finite and infinite mixture models have been proposed, and both have been shown to significantly improve on classical single-matrix models. Compared to their finite counterparts, infinite mixtures are more expressive. However, they are computationally more challenging. This has resulted in practical compromises in the design of infinite mixture models. In particular, a fast but simplified version of a Dirichlet process model over equilibrium frequency profiles implemented in PhyloBayes (Lartillot et al., 2007) has often been used in recent phylogenomics studies, while more refined model structures, which are more realistic and fit empirical data better, have been practically out of reach. We introduce a Message Passing Interface (MPI) version of PhyloBayes, implementing the Dirichlet process mixture models as well as more classical empirical matrices and finite mixtures. The parallelization is made efficient by combining two algorithmic strategies: a partial Gibbs sampling update of the tree topology, and a truncated stick-breaking representation of the Dirichlet process prior. The implementation shows close to linear gains in computational speed for up to 64 cores, thus allowing faster phylogenetic reconstruction under complex mixture models. PhyloBayes MPI is freely available from our website www.phylobayes.org.
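To make the truncated stick-breaking representation mentioned in the abstract concrete, the following is a minimal Python sketch of how mixture weights can be drawn from a Dirichlet process prior truncated to a fixed number of components. It illustrates the general construction only and is not taken from the PhyloBayes MPI source code; the function name `truncated_stick_breaking` and the parameters `alpha` (concentration) and `k_max` (truncation level) are hypothetical names chosen for this example.

```python
import random


def truncated_stick_breaking(alpha, k_max, rng=None):
    """Draw k_max mixture weights from a truncated stick-breaking prior.

    alpha  : concentration parameter of the Dirichlet process (assumed > 0)
    k_max  : truncation level, i.e. the fixed number of components (assumed >= 1)
    rng    : optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(k_max - 1):
        # Break off a Beta(1, alpha)-distributed fraction of what is left.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # The last component takes the rest of the stick, so the weights sum to 1.
    weights.append(remaining)
    return weights


# Example: weights for a mixture truncated at 50 components.
example_weights = truncated_stick_breaking(alpha=1.0, k_max=50, rng=random.Random(0))
```

With a finite truncation, the weight vector has a fixed, known length, in contrast to the unbounded number of components of the untruncated Dirichlet process.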