Skip to main content
Dryad

A Likelihood-Ratio Test for Lumpability of Phylogenetic Data: Is the Markovian Property of an Evolutionary Process Retained in Recoded DNA?

Data files

May 20, 2022 version files 1.38 MB

Abstract

Abstract In molecular phylogenetics, it is typically assumed that the evolutionary process for DNA can be approximated by independent and identically distributed Markovian processes at the variable sites and that these processes diverge over the edges of a rooted bifurcating tree. Sometimes the nucleotides are transformed from a 4-state alphabet to a 3- or 2-state alphabet by a procedure that is called recoding, lumping, or grouping of states. Here, we introduce a likelihood-ratio test for lumpability for DNA that has diverged under different Markovian conditions, which assesses the assumption that the Markovian property of the evolutionary process over each edge is retained after recoding of the nucleotides. The test is derived and validated numerically on simulated data. To demonstrate the insights that can be gained by using the test, we assessed two published data sets, one of mitochondrial DNA from a phylogenetic study of the ratites and the other of nuclear DNA from a phylogenetic study of yeast. Our analysis of these data sets revealed that recoding of the DNA eliminated some of the compositional heterogeneity detected over the sequences. However, the Markovian property of the original evolutionary process was not retained by the recoding, leading to some significant distortions of edge lengths in reconstructed trees.[Evolutionary processes; likelihood-ratio test; lumpability; Markovian processes; Markov models; phylogeny; recoding of nucleotides.]