Data from: Probabilistic methods outperform parsimony in the phylogenetic analysis of data simulated without a probabilistic model
Data files
Jun 26, 2019 version files 6.75 MB
Abstract
To understand the patterns and processes of the diversification of life, we require an accurate understanding of the interrelationships among taxa. Recent studies have suggested that analyses of morphological character data using the Bayesian and Maximum Likelihood Mk model yield more accurate phylogenies than parsimony methods. These studies have proved controversial, particularly because they simulate morphological data under Markov models that assume shared branch lengths across characters; it is claimed that this biases results in favour of the Bayesian or Maximum Likelihood Mk model over parsimony methods, which do not explicitly make this assumption. We avoid these potential issues by employing a simulation protocol in which character states are randomly assigned to tips, but datasets are constrained to an empirically realistic distribution of homoplasy as measured by the Consistency Index. Datasets were analysed with equal-weights and implied-weights parsimony, and with the Maximum Likelihood and Bayesian Mk model. We find that method choice is largely irrelevant for high-consistency (low-homoplasy) datasets, for which all methods perform well; the largest discrepancies in accuracy occur with low-consistency (high-homoplasy) datasets. In such cases, the Bayesian Mk model is significantly more accurate than the alternative methods, and implied-weights parsimony never significantly outperforms the Bayesian Mk model. When poorly supported branches are collapsed, the Bayesian Mk model recovers trees with higher resolution than the other methods. Since it is not possible to assess homoplasy independently of a tree estimate, the Bayesian Mk model emerges as the most reliable method for categorical morphological analyses.
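Since the abstract measures homoplasy with the Consistency Index (CI), a minimal sketch of how an ensemble CI could be computed may help readers who want to inspect the data files. This is not the authors' code. It assumes unordered characters, a rooted binary reference tree written as nested tuples, and Fitch parsimony for counting steps. The names `fitch_steps` and `consistency_index`, and the toy tree and matrix, are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a tip name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one unordered character (Fitch)."""
    steps = 0

    def visit(node: Tree) -> Set[str]:
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}  # tip: its observed state
        left, right = visit(node[0]), visit(node[1])
        shared = left & right
        if shared:
            return shared  # children agree on at least one state: no change
        steps += 1  # children disagree: one change is required here
        return left | right

    visit(tree)
    return steps


def consistency_index(tree: Tree, matrix: List[Dict[str, str]]) -> float:
    """Ensemble CI: summed minimum possible steps / summed observed steps."""
    min_total = 0
    obs_total = 0
    for character in matrix:
        min_steps = len(set(character.values())) - 1
        if min_steps == 0:
            continue  # invariant characters are skipped in this sketch
        min_total += min_steps
        obs_total += fitch_steps(tree, character)
    if obs_total == 0:
        raise ValueError("matrix contains no variable characters")
    return min_total / obs_total


# Toy example (assumed data): one perfectly consistent character and
# one homoplastic character on the tree ((A,B),(C,D)).
toy_tree: Tree = (("A", "B"), ("C", "D"))
toy_matrix = [
    {"A": "0", "B": "0", "C": "1", "D": "1"},  # 1 step on this tree
    {"A": "0", "B": "1", "C": "0", "D": "1"},  # 2 steps on this tree
]
toy_ci = consistency_index(toy_tree, toy_matrix)
```

A CI of 1 means no homoplasy on the reference tree, and lower values mean more. Because the step counts depend on the tree, the CI cannot be evaluated without a tree estimate, which is the point made in the abstract's final sentence.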