
Data from: Conserved genes, sampling error, and phylogenomic inference

Cite this dataset

Betancur-R., Ricardo; Naylor, Gavin J. P.; Ortí, Guillermo (2013). Data from: Conserved genes, sampling error, and phylogenomic inference [Dataset]. Dryad.


Disagreement or conflict among phylogenetic hypotheses obtained by analysis of large, genome-wide databases has incited debate over the potential benefits, pitfalls, and best practices associated with phylogenomic approaches (Jeffroy, O., Brinkmann, H., et al. 2006, Philippe, H., Derelle, R., et al. 2009, Philippe, H., Brinkmann, H., et al. 2011). In a recent article, Salichos, L. and Rokas, A. (2013; S&R) assert that the accuracy of phylogenetic inference from genomic data can be improved by focusing on the subset of genes that have "strong" phylogenetic signals, as measured by the bootstrap support of their inferred trees. In that study, S&R compared 23 yeast genomes and observed that the genealogies obtained for 1070 orthologous genes were all different from each other and also differed from the topology obtained either by concatenating all genes or by an extended consensus phylogeny of all gene trees. They developed a new measure of incongruence ("internode certainty") to gauge the level of conflict inherent in the data supporting specific internodes of the phylogeny. Based on this measure, S&R claim that slowly evolving genes are a main source of conflict and should therefore be avoided in favor of genes with strong phylogenetic signals. Their conclusion that strong signal reduces incongruence is drawn from a comparative phylogenetic analysis of protein alignments of the yeast genomes, as well as from a reanalysis of published vertebrate and metazoan data. The notion that slowly evolving genes are a poor choice for resolving basal nodes at deep phylogenetic levels is contrary to widespread practice in recent studies (e.g., Li, C., Orti, G., et al. 2007, Jian, S., Soltis, P.S., et al. 2008, Li, C., Lu, G., et al. 2008, Regier, J.C., Shultz, J.W., et al. 2008, Zhang, N., Zeng, L., et al. 2012, Lang, J.M., Darling, A.E., et al. 2013). Here we challenge S&R's interpretations with new analyses of their yeast data.
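
As a rough illustration of the internode certainty (IC) measure, the sketch below computes IC for a single internode from two counts: the number of gene trees that contain the bipartition defined by that internode, and the number that contain its most prevalent conflicting bipartition. It assumes the formulation IC = 1 + P1*log2(P1) + P2*log2(P2), where P1 and P2 are the two counts rescaled to sum to 1; this is our reading of S&R's definition, and the function name `internode_certainty` is hypothetical. Under this formula IC is 1 when no gene tree conflicts and 0 when the two bipartitions are equally frequent.

```python
import math


def internode_certainty(n_ref: float, n_conflict: float) -> float:
    """Internode certainty for one internode (hypothetical helper).

    Assumes IC = 1 + P1*log2(P1) + P2*log2(P2), where P1 and P2 are the
    frequencies of the reference bipartition and of its most prevalent
    conflicting bipartition, rescaled so that P1 + P2 = 1.
    """
    total = n_ref + n_conflict
    if total <= 0:
        raise ValueError("at least one of the two bipartitions must be observed")
    ic = 1.0
    for n in (n_ref, n_conflict):
        p = n / total
        if p > 0:  # the term 0 * log2(0) is taken to be 0
            ic += p * math.log2(p)
    return ic


print(internode_certainty(10, 0))  # → 1.0 (no conflicting gene trees)
print(internode_certainty(5, 5))   # → 0.0 (maximal conflict)
```
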
We first demonstrate that the high phylogenetic incongruence among conserved genes observed by S&R is likely an artifact of sampling error. Second, we challenge their premise that bootstrap support is a reliable measure of the historical signal of genes, because it does not consider systematic error as an alternative explanation of the observed patterns; systematic error has previously been shown to affect phylogenetic analyses of yeast genomes (Collins, T.M., Fedrigo, O., et al. 2005). Finally, we note that S&R's recommendations make the task of choosing genes upon which to base phylogenetic inferences impossible, as measures of phylogenetic signal can only be determined after the data have been collected. Recommendations to focus on selected data partitions assessed by phylogenetic analysis may, however, become relevant in the future, once complete genomes are available for all species of interest.
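
The sampling-error argument can be illustrated with a deliberately simplified toy model (our assumption, not the analysis performed in this study): each of k independent informative sites in a gene supports one of the three possible unrooted topologies of a four-taxon tree, and the gene tree is whichever topology wins a strict plurality of sites. The hypothetical function `win_probabilities` computes the exact probability of each outcome by summing over the multinomial distribution of site counts. The per-site probabilities used below are made up for illustration. With few informative sites, as in a short, conserved gene, the true topology often loses by chance; with a per-site bias toward a wrong topology (a stand-in for systematic error), adding sites makes the wrong tree win more consistently, so high support under resampling need not indicate a correct tree.

```python
from math import comb
from typing import List, Tuple


def win_probabilities(k: int, probs: Tuple[float, float, float]) -> List[float]:
    """Probability that each of three quartet topologies (T1, T2, T3) wins a
    strict plurality of k independent informative sites (toy model).

    probs[i] is the per-site probability of supporting topology i; ties are
    counted as a win for nobody, so the returned values can sum to < 1.
    Exact enumeration, intended for small k (up to a couple of hundred).
    """
    p1, p2, p3 = probs
    wins = [0.0, 0.0, 0.0]
    for a in range(k + 1):              # sites supporting T1
        for b in range(k - a + 1):      # sites supporting T2
            c = k - a - b               # remaining sites support T3
            # Multinomial probability of the count vector (a, b, c).
            pr = comb(k, a) * comb(k - a, b) * p1**a * p2**b * p3**c
            counts = (a, b, c)
            top = max(counts)
            if counts.count(top) == 1:  # strict plurality only
                wins[counts.index(top)] += pr
    return wins


# Sampling error: the true topology T1 is favored at every site,
# but a conserved gene offers only a few informative sites.
for k in (3, 10, 100):
    print(k, round(win_probabilities(k, (0.5, 0.25, 0.25))[0], 3))

# Systematic error: a hypothetical bias makes sites favor the wrong topology T2.
for k in (3, 10, 100):
    print(k, round(win_probabilities(k, (0.3, 0.4, 0.3))[1], 3))
```

Exact enumeration is used instead of random simulation so that the numbers are reproducible. In this toy model, a bootstrap that resamples sites with replacement would roughly approximate these win probabilities, which is why it measures sampling variance but cannot by itself reveal a consistent bias.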

Usage notes