
Data from: Combining data sets with different phylogenetic histories


Wiens, John J. (2008), Data from: Combining data sets with different phylogenetic histories, Dryad, Dataset,


The possibility that two data sets may have different underlying phylogenetic histories (such as gene trees that deviate from species trees) has become an important argument against combining data in phylogenetic analysis. However, two data sets sampled for a large number of taxa may differ in only part of their histories. This is a realistic scenario and one in which the relative advantages of combined, separate, and consensus analysis become much less clear. I suggest a simple methodology for dealing with this situation that involves (1) partitioning the available data to maximize detection of different histories, (2) performing separate analyses of the data sets, and (3) combining the data but considering questionable or unresolved those parts of the combined tree that are strongly contested in the separate analyses (and which therefore may have different histories), until a majority of unlinked data sets supports one resolution over another. In support of this methodology, computer simulations suggest that (1) the accuracy of combined analysis at recovering the true species phylogeny may exceed that of either of two separately analyzed data sets under some conditions, particularly when the mismatch between phylogenetic histories is small and the estimates of the underlying histories are imperfect (few characters and/or high homoplasy), and (2) combined analysis provides a poor estimate of the species tree in areas of the phylogenies with different histories but an improved estimate in regions that share the same history. Thus, when there is a localized mismatch between the histories of two data sets, separate, consensus, and combined analysis may all give unsatisfactory results in certain parts of the phylogeny. Similarly, approaches that allow data combination only after a global test of heterogeneity will suffer from the potential failings of either separate or combined analysis, depending on the outcome of the test. Excision of conflicting taxa is also problematic in that it may obfuscate the position of conflicting taxa within a larger tree, even when their placement is congruent between data sets. Application of the proposed methodology to molecular and morphological data sets for Sceloporus lizards is discussed.
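
Step (3) of the methodology described above can be illustrated with a minimal Python sketch. It is not the author's implementation, and every name in it is hypothetical: clades are represented as frozensets of made-up taxon labels, support values are treated as bootstrap-style percentages, and the cutoff of 70 for "strong" support is an assumption chosen only for illustration. The sketch marks a clade of the combined tree as questionable when it is incompatible with a clade involved in a strongly supported conflict between the two separate analyses. It does not model the later step of resolving such regions once a majority of unlinked data sets agrees.

```python
from itertools import product

def incompatible(c1, c2):
    """Two rooted clades conflict if they overlap without one nesting in the other."""
    return bool(c1 & c2) and not (c1 <= c2 or c2 <= c1)

def strongly_contested(support_a, support_b, threshold=70):
    """Return clades that take part in a conflict where both sides are strongly supported.

    support_a and support_b map clades (frozensets of taxa) to support values
    from two separately analyzed data sets. The threshold is an assumed cutoff.
    """
    strong_a = [c for c, s in support_a.items() if s >= threshold]
    strong_b = [c for c, s in support_b.items() if s >= threshold]
    contested = set()
    for ca, cb in product(strong_a, strong_b):
        if incompatible(ca, cb):
            contested.update((ca, cb))
    return contested

def flag_combined(combined_clades, support_a, support_b, threshold=70):
    """Label each combined-tree clade as 'accepted' or 'questionable'."""
    contested = strongly_contested(support_a, support_b, threshold)
    return {
        clade: "questionable"
        if any(incompatible(clade, c) for c in contested)
        else "accepted"
        for clade in combined_clades
    }

# Hypothetical example with five taxa (A-E) and two data sets.
AB, BC = frozenset("AB"), frozenset("BC")
ABC, DE = frozenset("ABC"), frozenset("DE")

molecular = {AB: 95, ABC: 60, DE: 90}      # strongly supports A+B
morphological = {BC: 85, ABC: 80, DE: 55}  # strongly supports B+C
combined_tree = [AB, ABC, DE]

result = flag_combined(combined_tree, molecular, morphological)
for clade, status in result.items():
    print(sorted(clade), status)
```

In this toy case only the A+B clade is flagged, because the two data sets strongly disagree about whether B groups with A or with C, while the larger A+B+C clade and the D+E clade are compatible with every strongly supported clade in both analyses.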

Usage notes