Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution of trees and other model parameters. These approximations are reliable only if the Markov chains adequately converge and sample from the joint posterior distribution. While several studies of phylogenetic MCMC convergence exist, they have focused on simulated datasets or selected empirical examples. Therefore, much of what is considered common knowledge about MCMC in empirical systems derives from a relatively small set of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns in these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well, and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and the average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that models that include both Γ-distributed among-site rate variation and a proportion of invariable sites are not broadly problematic for MCMC convergence but are also unnecessary. Changes to heating and the use of model-averaged substitution models can both improve convergence in some cases, but neither is a panacea.
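The average standard deviation of split frequencies (ASDSF) mentioned above can be illustrated with a short sketch. It assumes split frequencies have already been tallied for each independent run (one dictionary per run, mapping a split to the fraction of that run's sampled trees containing it). The function name asdsf and the min_freq cutoff of 0.10 are illustrative assumptions; 0.10 mirrors what we understand to be MrBayes's default Minpartfreq, and this sketch uses the population standard deviation across runs, which may differ from the exact formula MrBayes or other tools apply.

```python
import statistics


def asdsf(split_freqs_by_run, min_freq=0.10):
    """Average standard deviation of split frequencies across independent runs.

    split_freqs_by_run: one dict per run mapping a split (any hashable key,
    e.g. a frozenset of taxa on one side of a bipartition) to its frequency
    among that run's sampled trees. Splits below min_freq in every run are
    skipped (illustrative cutoff, see lead-in).
    """
    if len(split_freqs_by_run) < 2:
        raise ValueError("ASDSF needs at least two independent runs")
    all_splits = set().union(*split_freqs_by_run)
    sds = []
    for split in all_splits:
        # A split never sampled in a run has frequency 0 in that run.
        freqs = [run.get(split, 0.0) for run in split_freqs_by_run]
        if max(freqs) >= min_freq:
            # Population standard deviation across runs (an assumption).
            sds.append(statistics.pstdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0


run1 = {"AB|CDE": 0.90, "ABC|DE": 0.50}
run2 = {"AB|CDE": 0.80, "ABD|CE": 0.05}
print(round(asdsf([run1, run2]), 3))  # 0.15
```

Here "ABD|CE" is ignored because it never reaches the cutoff, so the result is the mean of 0.05 (for "AB|CDE") and 0.25 (for "ABC|DE"). Values near zero indicate that the runs agree on topology.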

These files are pieces of larger tar archives that were split into blocks smaller than 10 GB to make downloading easier. Once all blocks for one dataset have been downloaded (e.g., Amniotes.tar.block_[a-g]), they can be reassembled into a single archive using the unix command:
cat DATASET.tar.block_* > DATASET.tar

This archive can then be unpacked using the unix command:
tar -xvf DATASET.tar
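For readers who prefer to script these steps, the reassembly and unpacking above can be sketched in Python. This assumes the blocks follow the DATASET.tar.block_<letter> naming shown in the example, so that sorting the file names gives the correct order (this would not hold for two-letter suffixes such as block_aa). The helper names reassemble_blocks and unpack are hypothetical.

```python
import glob
import shutil
import tarfile


def reassemble_blocks(prefix, out_path):
    """Concatenate prefix.block_* (e.g. Amniotes.tar.block_a ... block_g),
    in sorted name order, into out_path. Returns the block paths used."""
    blocks = sorted(glob.glob(glob.escape(prefix) + ".block_*"))
    if not blocks:
        raise FileNotFoundError(f"no blocks matching {prefix}.block_*")
    with open(out_path, "wb") as out:
        for block in blocks:
            with open(block, "rb") as src:
                shutil.copyfileobj(src, out)  # streams, so memory use stays small
    return blocks


def unpack(archive_path, dest="."):
    """Extract an archive; mode "r:*" lets tarfile detect any compression."""
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(dest, **kwargs)
```

For example, reassemble_blocks("Amniotes.tar", "Amniotes.tar") followed by unpack("Amniotes.tar") corresponds to the two shell steps. Reading the reassembled file with "r:*" rather than "r" means the same code works whether or not the outer archive is compressed.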

The unpacked archive contains one tar.gz file per individual MrBayes analysis. Each can be unpacked individually using the unix command:
tar -xvf NAME.tar.gz

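or as a group using a shell loop:
for f in *.tar.gz; do tar -xvf "$f"; done
(This requires an extremely large amount of hard disk space. A plain "tar -xvf *.tar.gz" does not work here, because tar treats every file name after the first as a member to extract from the first archive.)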
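Because unpacking every analysis needs so much disk space, it may help to extract only the files a particular diagnostic needs. The hypothetical helper extract_selected below opens each analysis archive in turn and extracts only members whose names end with the given suffixes; it defaults to the run samples (.p and .t files, described below), which is an illustrative choice rather than a recommendation from the dataset authors.

```python
import glob
import os
import tarfile


def extract_selected(archive_dir, dest, suffixes=(".p", ".t")):
    """Extract only members ending with one of `suffixes` from every
    *.tar.gz in archive_dir. Returns the extracted member names in order."""
    suffixes = tuple(suffixes)
    # Use tarfile's "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    pattern = os.path.join(glob.escape(archive_dir), "*.tar.gz")
    for path in sorted(glob.glob(pattern)):
        # One archive at a time, so only the selected files reach the disk.
        with tarfile.open(path, "r:gz") as tar:
            members = [m for m in tar.getmembers()
                       if m.isfile() and m.name.endswith(suffixes)]
            tar.extractall(dest, members=members, **kwargs)
            extracted.extend(m.name for m in members)
    return extracted
```

For example, extract_selected(".", "runs_only") would place every NAME.run*.p and NAME.run*.t file in a runs_only directory while skipping the checkpoint, log, and summary files.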

Each tar.gz file contains several files:

NAME.nex      --   (the data that were analyzed. Most of these files also contain the analysis settings in a MrBayes block at the end of the file.)
NAME.bb       --   (a separate MrBayes block with the analysis settings, present in some analyses where these settings were not included in the .nex file.)
NAME.run1.p   --   (continuous parameter samples from run 1)
NAME.run1.t   --   (tree samples from run 1)
NAME.run2.p   --   (continuous parameter samples from run 2)
NAME.run2.t   --   (tree samples from run 2)
Some analyses also contain tree and parameter samples from additional runs, indicated by the suffixes run3, run4, etc.
NAME.mcmc     --   (MCMC diagnostics recorded during this analysis)
NAME_mb.log   --   (log file from this analysis)
NAME.pstat    --   (table of parameter estimates)
NAME.lstat    --   (table of marginal likelihood estimates)
NAME.ckp      --   (checkpoint file that can be used to restart the analysis)
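The .p files hold one row of continuous parameter values per sampled generation. A common check on them is the effective sample size (ESS) of a parameter such as the log likelihood. The sketch below assumes the file layout we understand MrBayes to use (comment lines beginning with "[", then a whitespace-separated header containing a LnL column, then one numeric row per sample) and discards the first 25% of samples as burn-in, which we take to match MrBayes's default relative burn-in. The ESS estimator here is a simple one that truncates the autocorrelation sum at the first non-positive lag; it is not necessarily the estimator used by Tracer, MrBayes, or the study itself. All function names are hypothetical.

```python
import math


def read_p_file(path):
    """Read a MrBayes .p file into {column name: [float values]}.

    Assumes comment lines start with '[' (e.g. an '[ID: ...]' line), the
    first other non-blank line is the header, and each later line is one
    sample of whitespace-separated numbers."""
    with open(path) as fh:
        rows = [line.split() for line in fh
                if line.strip() and not line.lstrip().startswith("[")]
    header, samples = rows[0], rows[1:]
    return {name: [float(row[i]) for row in samples]
            for i, name in enumerate(header)}


def ess(values):
    """n / (1 + 2 * sum of lag autocorrelations), summing lags until the
    first autocorrelation that is not positive. O(n^2) in the worst case."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    var = sum(d * d for d in dev)
    if var == 0:
        return math.nan  # a constant trace has no defined ESS
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(dev[t] * dev[t + lag] for t in range(n - lag)) / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def lnl_ess(path, burnin_frac=0.25):
    """ESS of the LnL column after discarding the first burnin_frac of samples."""
    lnl = read_p_file(path)["LnL"]
    return ess(lnl[int(len(lnl) * burnin_frac):])
```

For analyses with additional runs, the same function can be applied to every run file, for example [lnl_ess(p) for p in sorted(glob.glob("NAME.run*.p"))] after importing glob, where NAME is the analysis name.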

Data from: Properties of Markov chain Monte Carlo performance across many empirical alignments -- part I
