Data from: A Bayesian supertree model for genome-wide species tree reconstruction
Data files
Oct 10, 2014 version files (39.44 MB total):
- drosophila.txz (8.95 MB)
- Simulations_20140822.txz (28.83 MB)
- SupplFigures_20140822.pdf (1.67 MB)
Abstract
Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make the most of all available data. Most species tree methods deal with a single process of phylogenetic discordance, namely gene duplication and loss, incomplete lineage sorting, or horizontal gene transfer. In this manuscript we address for the first time the problem of species tree inference from multilocus, genome-wide data sets in the presence of both gene duplication and loss and incomplete lineage sorting, thereby avoiding the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood supertrees to a hierarchical Bayesian model in which several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu, whose input is the posterior distributions of unrooted gene tree topologies for multiple gene families and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations under complex phylogenomic models in order to evaluate the performance of our approach in comparison with other species tree approaches able to deal with multilabeled trees. Our method ranked best on both simulated and empirical data sets, in spite of ignoring branch lengths. Our results also show that, under complex simulation scenarios, gene tree parsimony is a competitive approach once its speed relative to more sophisticated models is taken into account.
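To make the link between gene tree parsimony and Maximum Likelihood supertrees more concrete, here is a minimal Python sketch. It is not guenomu's implementation. It assumes rooted binary gene trees written as nested tuples whose leaves are labelled by species, so one species may appear several times (a multilabeled tree). It uses only the duplication count from LCA reconciliation as the gene tree/species tree distance, and it scores candidate species trees with an exponential penalty P(G | S) ∝ exp(-beta * d(G, S)) under a uniform prior. All function names and the `beta` parameter are hypothetical, chosen for illustration only.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple, Union

# A tree is either a leaf label (species name) or a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clusters(species_tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set (cluster) of every node of a rooted binary species tree."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            c = frozenset([node])
        else:
            c = walk(node[0]) | walk(node[1])
        out.append(c)
        return c

    walk(species_tree)
    return out


def lca_map(species: FrozenSet[str], species_clusters: List[FrozenSet[str]]) -> FrozenSet[str]:
    """LCA in the species tree = smallest cluster containing every species given."""
    # Clusters containing a fixed set are nested, so the smallest one is unique.
    return min((c for c in species_clusters if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_clusters: List[FrozenSet[str]]) -> int:
    """Duplications implied by LCA reconciliation of a rooted gene tree.

    A gene node is a duplication when it maps to the same species node
    as at least one of its children.
    """
    dups = 0

    def walk(node: Tree) -> FrozenSet[str]:
        nonlocal dups
        if isinstance(node, str):
            return lca_map(frozenset([node]), species_clusters)
        left, right = walk(node[0]), walk(node[1])
        here = lca_map(left | right, species_clusters)
        if here == left or here == right:
            dups += 1
        return here

    walk(gene_tree)
    return dups


def posterior_over(candidates: Sequence[Tree], gene_trees: Sequence[Tree],
                   beta: float = 1.0) -> List[float]:
    """Normalised weights of candidate species trees, uniform prior,
    P(G | S) proportional to exp(-beta * duplications).

    Simplifying assumption: the normalising constant of P(G | S) is
    treated as identical for every candidate S.
    """
    log_w = []
    for s in candidates:
        cl = clusters(s)
        d = sum(count_duplications(g, cl) for g in gene_trees)
        log_w.append(-beta * d)
    m = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    return [x / z for x in w]


if __name__ == "__main__":
    S1 = (("A", "B"), "C")
    S2 = (("A", "C"), "B")
    g1 = (("A", "B"), "C")          # agrees with S1
    g2 = (("A", "B"), ("A", "B"))   # multilabeled: A and B each appear twice
    post = posterior_over([S1, S2], [g1, g2], beta=1.0)
    print([round(p, 3) for p in post])  # prints [0.731, 0.269]
```

Increasing `beta` concentrates the weight on the species tree with the fewest duplications, which is the gene tree parsimony choice. The model described above also accounts for incomplete lineage sorting and for gene tree uncertainty (posterior distributions of unrooted gene trees); this sketch attempts neither.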