Data from: Among-character rate variation distributions in phylogenetic analysis of discrete morphological characters
Harrison, Luke B.; Larsson, Hans C. E. (2015), Data from: Among-character rate variation distributions in phylogenetic analysis of discrete morphological characters, Dryad, Dataset, https://doi.org/10.5061/dryad.067qg
Likelihood-based methods are commonplace in phylogenetic systematics. Although much effort has been directed toward likelihood-based models for molecular data, comparatively less work has addressed models for discrete morphological character data. Among-character rate variation may confound phylogenetic analysis, but there have been few analyses of the magnitude and distribution of rate heterogeneity among discrete morphological characters. Using seventy-six data sets covering a range of plants, invertebrates, and vertebrates, we used a modified version of MrBayes to test equal, gamma-distributed, and lognormally distributed models of among-character rate variation, integrating across phylogenetic uncertainty using Bayesian model selection. We found that in approximately 80% of data sets, unequal-rates models outperformed equal-rates models, especially among larger data sets. Moreover, although most data sets were equivocal, more data sets favored the lognormal rate distribution than the gamma rate distribution, lending some support to more complex character correlations than in molecular data. Parsimony estimation of the underlying rate distributions in several data sets suggests that the lognormal distribution is preferred when there are many slowly evolving characters and fewer quickly evolving characters. The four-rate-category discrete approximation commonly adopted for molecular data was found to be sufficient to approximate a gamma rate distribution with discrete characters. However, in the two data sets tested that favored a lognormal rate distribution, the continuous distribution was better approximated with at least eight discrete rate categories. Although the effect of rate model on the estimation of topology was difficult to assess across all data sets, it appeared relatively minor between the unequal-rates models for the one data set examined carefully.
As in molecular analyses, we argue that researchers should test and adopt the most appropriate model of rate variation for the data set in question. As discrete characters are increasingly used in more sophisticated likelihood-based phylogenetic analyses, it is important that these studies be built on the most appropriate and carefully selected underlying models of evolution.
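
The discrete rate-category approximation mentioned in the abstract can be illustrated with a minimal sketch. It is not the modified MrBayes code used in the study. It assumes that each continuous rate distribution is scaled to mean one, split into k equal-probability bins, and represented by the median of each bin, with the category rates then rescaled to average one. The function names, the shape value 0.5, and the sigma value 1.0 are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def gamma_cdf(x, shape):
    """CDF of a mean-one gamma distribution (rate parameter equal to shape)."""
    if x <= 0:
        return 0.0
    z = x * shape  # rescale so the distribution has mean one
    # Series for the regularized lower incomplete gamma function P(shape, z).
    term, total, n = 1.0, 1.0, 1
    while abs(term) > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape + 1))


def gamma_quantile(p, shape):
    """Quantile of the mean-one gamma distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lognormal_quantile(p, sigma):
    """Quantile of a mean-one lognormal distribution (mu = -sigma^2 / 2)."""
    mu = -0.5 * sigma * sigma
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def discrete_rates(quantile_fn, k):
    """Return k category rates: bin medians rescaled to average one."""
    medians = [quantile_fn((2 * i + 1) / (2 * k)) for i in range(k)]
    mean = sum(medians) / k
    return [m / mean for m in medians]


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    print("gamma, 4 categories:    ",
          discrete_rates(lambda p: gamma_quantile(p, 0.5), 4))
    print("lognormal, 4 categories:",
          discrete_rates(lambda p: lognormal_quantile(p, 1.0), 4))
    print("lognormal, 8 categories:",
          discrete_rates(lambda p: lognormal_quantile(p, 1.0), 8))
```

Comparing the four-category and eight-category lognormal output shows how additional categories extend the discrete approximation further into the tails of the continuous distribution. Each category carries equal weight when site likelihoods are averaged across categories.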