Data from: Assessing parameter identifiability in phylogenetic models using Data Cloning

Cite this dataset

Ponciano, José Miguel et al. (2012). Data from: Assessing parameter identifiability in phylogenetic models using Data Cloning [Dataset]. Dryad.


The success of model-based methods in phylogenetics has motivated much research aimed at generating new, biologically informative models. These new computer-intensive approaches to phylogenetics demand validation studies and sound measures of performance. To date such work has consisted only of simulation studies, estimation of known phylogenies and difficult mathematical analyses assessing the estimability of parameters. Little practical guidance has been available to practitioners and theoreticians alike as to when and why the parameters in a particular model can be identified reliably. Here, we illustrate how Data Cloning (DC), a recently developed methodology to compute the Maximum Likelihood estimates along with their asymptotic variance, can be used to diagnose structural parameter non-identifiability (NI) and distinguish it from other parameter estimability problems, including the case where parameters are structurally identifiable but are not estimable in a given data set (INE), and the case where parameters are identifiable and estimable, but only weakly so (WE). The application of the DC theorem uses well-known and widely used Bayesian computational techniques. With the DC approach, practitioners can use any Bayesian phylogenetics software to diagnose non-identifiability. Theoreticians and practitioners alike now have a powerful tool to detect non-identifiability while investigating complex modeling scenarios, where getting closed-form expressions in a probabilistic study is complicated. Furthermore, here we also show how DC can be used as a tool to examine and eliminate the influence of the priors, in particular if the process of prior elicitation is not straightforward.
Finally, when applied to phylogenetic inference, DC can be used to study at least two important statistical questions: assessing identifiability of discrete parameters, like the tree topology, and developing efficient sampling methods for computationally expensive posterior densities.
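
The diagnostic idea described above can be illustrated with a toy example that is not taken from the dataset or the paper. The sketch below assumes two hypothetical normal models with known unit variance and independent normal priors, chosen only because the posterior variance under K clones has a closed form. In the identifiable model the posterior variance shrinks at roughly the rate 1/K; in the non-identifiable model, where only the sum a + b enters the likelihood, the posterior variance of a levels off at a positive value set by the prior. The function names and parameter values are assumptions made for illustration, and a real phylogenetic analysis would rely on MCMC output from Bayesian software rather than on closed-form expressions.

```python
def cloned_var_identifiable(n, k, prior_var=1.0):
    """Posterior variance of mu when y_i ~ N(mu, 1), i = 1..n,
    with prior mu ~ N(0, prior_var) and the data set cloned k times."""
    # Cloning k times multiplies the likelihood precision by k.
    return 1.0 / (1.0 / prior_var + k * n)


def cloned_var_nonidentifiable(n, k, prior_var=1.0):
    """Posterior variance of a when y_i ~ N(a + b, 1), i = 1..n,
    with independent N(0, prior_var) priors on a and b, data cloned k times."""
    # Reparameterize: s = a + b is informed by the data, d = a - b is not.
    # Under the prior, s and d are independent, each with variance 2 * prior_var.
    var_s = 1.0 / (1.0 / (2.0 * prior_var) + k * n)
    var_d = 2.0 * prior_var  # untouched by the data, no matter how many clones
    # a = (s + d) / 2, and s, d stay independent in the posterior.
    return (var_s + var_d) / 4.0


if __name__ == "__main__":
    n = 10  # hypothetical sample size
    for k in (1, 10, 100, 1000):
        v_id = cloned_var_identifiable(n, k)
        v_ni = cloned_var_nonidentifiable(n, k)
        print(f"K={k:5d}  identifiable: {v_id:.6f}  non-identifiable: {v_ni:.6f}")
```

In this sketch, plotting or tabulating the posterior variance against K is the whole diagnostic: a variance that keeps falling toward zero is consistent with an identifiable parameter, while a variance that stalls suggests the data cannot separate the parameter from the prior.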

Usage notes