Data from: Experimental design in phylogenetics: testing predictions from expected information

Cite this dataset

San Mauro, Diego et al. (2012). Data from: Experimental design in phylogenetics: testing predictions from expected information [Dataset]. Dryad. https://doi.org/10.5061/dryad.83p1130j

Abstract

Taxon and character sampling is central to phylogenetic experimental design, yet we lack general rules. Goldman introduced a method to construct efficient sampling designs in phylogenetics, based on the calculation of expected Fisher information given a probabilistic model of sequence evolution. The considerable potential of this approach remains largely unexplored. In an earlier study, we applied Goldman’s method to a problem in the phylogenetics of caecilian amphibians and made an a priori evaluation and testable predictions of which taxon additions would increase information about a particular weakly supported branch of the caecilian phylogeny by the greatest amount. Using mitogenomic and rag1 sequences (some newly determined for this study) from additional caecilian species, we studied how information (both expected and observed) and bootstrap support vary as each new taxon is individually added, providing the first empirical test of specific predictions made using Goldman’s method for phylogenetic experimental design. Our results empirically validate the top three (more intuitive) taxon addition predictions made in our previous study, but only the information results unambiguously validate the fourth (less intuitive) prediction. This highlights a complex relationship between information and support, reflecting that each measures different things: information is related to the ability to estimate branch length accurately, and support to the ability to estimate the tree topology accurately. Thus, an increase in information may be correlated with, but does not necessitate, an increase in support. Our results also provide the first empirical validation of the widely held intuition that additional taxa that join the tree proximal to poorly supported internal branches are more informative and enhance support more than additional taxa that join the tree more distally.
Our work supports the view that adding more data for a single (well-chosen) taxon may increase phylogenetic resolution and support in weakly supported parts of the tree without adding more characters/genes, while illustrating that less well-chosen taxon additions can have the opposite effect. Altogether, our results corroborate that, although still underexplored, Goldman’s method offers a powerful tool for experimental design in molecular phylogenetic studies. However, there are still several drawbacks to overcome, and further assessment of the method is needed to make it better understood, more accessible, and able to assess additions of multiple taxa.

Usage notes