Data from: Among-character rate variation distributions in phylogenetic analysis of discrete morphological characters

Harrison, Luke B.; Larsson, Hans C. E.

Published Jan 13, 2015 on Dryad. https://doi.org/10.5061/dryad.067qg

Data files

Jan 13, 2015 version files (1.95 MB total)

HarrisonLarsson_AmongCharacterRates.zip (1.94 MB)

README_for_HarrisonLarsson_AmongCharacterRates.txt (1.85 KB)

Abstract

Likelihood-based methods are commonplace in phylogenetic systematics. Although much effort has been directed toward likelihood-based models for molecular data, comparatively less work has addressed models for discrete morphological character data. Among-character rate variation may confound phylogenetic analysis, but there have been few analyses of the magnitude and distribution of rate heterogeneity among discrete morphological characters. Using seventy-six data sets covering a range of plants, invertebrate, and vertebrate animals, we used a modified version of MrBayes to test equal, gamma-distributed and lognormally-distributed models of among-character rate variation, integrating across phylogenetic uncertainty using Bayesian model selection. We found that in approximately 80% of data sets, unequal-rates models outperformed equal-rates models, especially among larger data sets. Moreover, although most data sets were equivocal, more data sets favored the lognormal rate distribution relative to the gamma rate distribution, lending some support for more complex character correlations than in molecular data. Parsimony estimation of the underlying rate distributions in several data sets suggests that the lognormal distribution is preferred when there are many slowly evolving characters and fewer quickly evolving characters. The commonly adopted four rate category discrete approximation used for molecular data was found to be sufficient to approximate a gamma rate distribution with discrete characters. However, among the two data sets tested that favored a lognormal rate distribution, the continuous distribution was better approximated with at least eight discrete rate categories. Although the effect of rate model on the estimation of topology was difficult to assess across all data sets, it appeared relatively minor between the unequal-rates models for the one data set examined carefully. 
As in molecular analyses, we argue that researchers should test and adopt the most appropriate model of rate variation for the data set in question. As discrete characters are increasingly used in more sophisticated likelihood-based phylogenetic analyses, it is important that these studies be built on the most appropriate and carefully selected underlying models of evolution.
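
The discrete rate-category approximation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a lognormal rate distribution scaled to mean 1, k equal-probability categories, and each category represented by its mean rate. These choices, the function name lognormal_category_rates, and the value sigma = 1.0 are illustrative assumptions; they are not taken from the modified MrBayes source.

```python
from statistics import NormalDist


def lognormal_category_rates(sigma, k):
    """Mean rate in each of k equal-probability categories.

    Assumes rates r = exp(mu + sigma * z) with z standard normal and
    mu = -sigma**2 / 2, so that the continuous distribution has mean 1.
    (Illustrative parameterization, not the one used in the archive.)
    """
    std = NormalDist()
    # For a mean-1 lognormal, the partial expectation E[r; z < c] equals
    # Phi(c - sigma). Evaluate it at the k - 1 interior quantile cut points.
    cumulative = [0.0]
    for i in range(1, k):
        cut = std.inv_cdf(i / k)
        cumulative.append(std.cdf(cut - sigma))
    cumulative.append(1.0)
    # Each category has probability 1/k, so its conditional mean is
    # k times the partial expectation that falls inside it.
    return [k * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]


if __name__ == "__main__":
    for k in (4, 8):
        rates = lognormal_category_rates(sigma=1.0, k=k)
        print(k, [round(r, 3) for r in rates])
```

With more categories, the highest category rate moves further into the right tail of the distribution, which is one way to see why a skewed distribution may need more categories to be approximated well.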

HarrisonLarsson_AmongCharacterRates

This archive contains supporting materials for Harrison and Larsson (201X) Among-Character Rate Variation Distributions in Phylogenetic Analysis of Discrete Morphological Characters.

SuppTable1.xlsx: Supplementary Table 1. List of data sets and TreeBase Accession Numbers

SuppTable2.xlsx: Supplementary Table 2. Results of the main Bayesian analysis for ACRV model selection

SupplementaryReferences.docx: Full references for the studies included in Supp. Table 1.

MainAnalysis_MBControlFiles: MB Control files for the main analysis described in the text.

NumberofRateCategoriesAnalysis_MBControlFiles: MB control files for the discrete rate category analysis.

In the preceding two folders, all control files are indexed by Tree Base Accession number and by among-character rate variation model (and prior type).

RScripts: folder containing R scripts (further explanation inside the .R files)

1. CombineLogExtractMCMC.R: script to process the output of MrBayes stepping-stone analysis to extract the initial MCMC portion (i.e. from the posterior) and summarize the posterior of these samples using MrBayes.

2. mb_to_beast_trees.sh: short BASH shell script to convert MrBayes formatted trees to BEAST-readable tree (used by #1, above).

3. PAUPOutputProcessor.R: R script to call PAUP* 4b10 (using Wine on Linux) and calculate per character change counts automatically given a data block (in special R format, please see the script file for details) and tree

4. PAUPLogProcessor.R: Simpler version of #3 to extract change counts from PAUP* log text file (but not call PAUP* directly).

5. S2128.rDa: the S2128 (TreeBaseID) data set in the correct R format for #3

6. S2128_Example.tree: a tree corresponding to the S2128 data set to test #3
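
Stepping-stone analyses such as those processed by the scripts above yield marginal log-likelihood estimates, which can be compared across rate models with Bayes factors. The following is a minimal sketch of that comparison, not the authors' procedure. The function name compare_models, the threshold of 2 on the 2 ln BF scale, and the marginal log-likelihood values are all hypothetical and are not taken from SuppTable2.xlsx.

```python
def compare_models(ln_marginal_likelihoods, threshold=2.0):
    """Pick the best-supported model, or report "equivocal".

    ln_marginal_likelihoods maps a model name to its estimated marginal
    log-likelihood. The threshold on 2 ln BF is an assumed convention,
    not a value stated in the archive.
    """
    ranked = sorted(ln_marginal_likelihoods,
                    key=ln_marginal_likelihoods.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # 2 ln BF is twice the difference in marginal log-likelihoods.
    two_ln_bf = 2.0 * (ln_marginal_likelihoods[best]
                       - ln_marginal_likelihoods[runner_up])
    verdict = best if two_ln_bf > threshold else "equivocal"
    return verdict, two_ln_bf


if __name__ == "__main__":
    # Hypothetical values for one data set, for illustration only.
    example = {"equal": -1250.4, "gamma": -1231.9, "lognormal": -1230.2}
    print(compare_models(example))
```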

Link to modified source code for MrBayes 3.2.2-r512-SVN

Source code for MrBayes 3.2.2-r512-SVN (http://mrbayes.sourceforge.net/), modified to include the lognormal model of among-character rate variation. Distributed under the GPL, with full credit to the developers of MrBayes. Please consult the README file in the archive.
