Phylogenetic analysis of characters with dependencies under maximum likelihood
Data files
Aug 12, 2025 version files (54.50 KB)
- README.md (3.24 KB)
- SupplementaryMaterials.txt (51.26 KB)
Abstract
The dependencies between characters used in phylogenetic analysis (e.g., inapplicabilities, functional dependencies) can be taken into account by using combinations of character states as possible ancestral morphotypes, and using appropriate rates of transformation between such morphotypes. As every morphotype represents a permissible combination of the original character states, this allows for easily ruling out specific combinations of character states, and taking into account changes that are either less or more likely to co-occur, or to occur in certain contexts. For inapplicable characters, Goloboff et al. (2021) used morphotypes but proposed obtaining transition probabilities between morphotypes from products of transition probabilities of the original characters and factors to incorporate dependencies. The product of transition probabilities is shown here to be flawed (failing the time-continuity requirement of phylogenetic Markov models, essential for statistical consistency under the model). Tarasov (2023) used the same delimitation of morphotypes but proposed obtaining transition probabilities from rate matrices, synthesized in a stepwise fashion from the hierarchy of dependencies. This paper shows that the rate matrices can easily be created, instead of with a stepwise synthesis, from direct comparisons between legitimate morphotypes (as done by Goloboff and De Laet 2023 for parsimony). Based on a few simple rules, the resulting rate matrices are (for inapplicable characters) identical to those obtained by Tarasov (2023). Additionally, in the computer program TNT, biological dependencies beyond mere inapplicability can be specified by the user with a simple syntax for (combinations of) states in “parent” characters restricting the states that “child” characters can take, using AND and OR conjunctions for elaborate interactions. 
These researcher-defined rules are used to internally convert the original characters into morphotypes, discarding morphotypes made impossible by the rules. In the case of biological dependencies (where, depending on the parent characters, the states that dependent characters can take are restricted, rather than the character being inapplicable), the rates of transition between morphotypes cannot be calculated solely from comparisons of the states that differ between the two morphotypes; the conditions of dependency must be considered as well.
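The conversion of characters into morphotypes described above can be pictured with a minimal Python sketch. It is not TNT's implementation and does not use TNT's rule syntax: the characters (`tail`, `tail_color`, `tail_shape`), the `permissible` rule, and the use of `"-"` for an inapplicable state are all hypothetical, chosen only to show how state combinations ruled out by a dependency are discarded.

```python
from itertools import product

INAPPLICABLE = "-"

# Hypothetical characters, for illustration only (not from the datasets
# in this package). The two "child" characters depend on the "parent" tail.
CHARACTERS = {
    "tail": ["absent", "present"],
    "tail_color": ["red", "blue", INAPPLICABLE],
    "tail_shape": ["round", "pointed", INAPPLICABLE],
}


def permissible(morphotype):
    """Hypothetical rule: children are scored if and only if a tail is present."""
    children = (morphotype["tail_color"], morphotype["tail_shape"])
    if morphotype["tail"] == "absent":
        return all(state == INAPPLICABLE for state in children)
    return all(state != INAPPLICABLE for state in children)


def enumerate_morphotypes(characters, rule):
    """List every combination of states, keeping only those the rule allows."""
    names = list(characters)
    combos = (dict(zip(names, states))
              for states in product(*characters.values()))
    return [combo for combo in combos if rule(combo)]


morphotypes = enumerate_morphotypes(CHARACTERS, permissible)
for morphotype in morphotypes:
    print(morphotype)
```

Of the 18 raw combinations, only the tailless morphotype and the four combinations of color and shape for a tail that is present survive the rule. Each surviving morphotype would then become one state of the model.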
https://doi.org/10.5061/dryad.p8cz8wb03
This package contains the Supplementary Materials, with additional technical details and discussion on implementation, plus test datasets and scripts.
Description of the data and file structure
All the materials needed to reproduce the datasets used in the simulations reported in the paper are in a single file, called Materials.zip. The contents of this ZIP file are:
a) SupplementaryMaterials.txt, containing technical details and implementation notes (complementing the descriptions and discussion in the main text). This supplementary material explains methods for modeling dependent morphological characters in phylogenetics using maximum likelihood. It compares two approaches to calculating transition probabilities (products of probabilities versus exponentiation of a rate matrix) and supports the latter because it is consistent. It details how TNT encodes interaction rules, filters valid morphotypes, calculates transition rates (including under covariation), and optimizes branch lengths. The document also covers implementation aspects, including simulations, handling of inapplicables, and use of TNT scripting for likelihood-based analyses.
b) C scripts for calculating the probability of a FALSE logical expression becoming TRUE, by non-superfluous replacement of 0's by 1's (mulprobyes.c), and the probability of a TRUE expression becoming FALSE, by non-superfluous replacement of 1's by 0's (mulprobnot.c). The scripts can be compiled with any standard C compiler, e.g. gcc or clang.
c) A TNT script, tests.run, with several example datasets, defining different types of dependencies, to illustrate the syntax.
d) A TNT script, ratsout.run, exemplifying how to access and export the morphotype(s) corresponding to each taxon in the scripting language of TNT, and how to export the rate matrix for all transitions between morphotypes.
e) A TNT script, depsimul.run, for generating and analyzing simulated datasets, with characters having different types of dependence. The script also produces SVG files with graphs. Entering TNT and typing "depsimul;" will display a brief description of the script and how to run it.
f) A Windows batch file, dothesimul.bat, to facilitate running all the simulations reported in the paper. To use it, place a copy of **tnt.exe** and the **depsimul** script in the same folder as **dothesimul**, and double-click on the **dothesimul** icon. When all the runs finish (about 36 hours later, depending on machine speed), the 6 plates with results for the different simulations will be in the same directory, as SVG files.
g) A copy of the TNT binary, tnt.exe, for Windows (character-mode version). It requires no installation; just copy it to your machine and use it. It is included here to document the exact version used in the paper. Subsequent versions of TNT can be obtained from http://www.lillo.org.ar/phylogeny/tnt. The easiest way to run the TNT scripts included here is to place them in the same folder where TNT is running.
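The time-continuity requirement mentioned in the abstract, P(s)P(t) = P(s+t), can be checked numerically with a small Python sketch. Everything in it is an assumption made for illustration: the three morphotypes (two binary characters with the combination (0, 1) ruled out), the rate matrix `Q`, and in particular `p_product`, a toy product scheme (per-character probabilities multiplied, impossible morphotype given factor 0, rows renormalized). That toy scheme is not the formula of Goloboff et al. (2021), and none of this is taken from TNT or from the scripts in this package.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=15):
    """Approximate exp(q * t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    scaled = [[value * scale for value in row] for row in q]
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    result = [row[:] for row in term]
    for k in range(1, terms + 1):
        term = [[value / k for value in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def max_abs_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Assumed permissible morphotypes; the combination (0, 1) is ruled out.
MORPHOTYPES = [(0, 0), (1, 0), (1, 1)]

# Illustrative rate matrix: unit rate between morphotypes differing in one
# character, zero otherwise; each row sums to zero.
Q = [[-1.0, 1.0, 0.0],
     [1.0, -2.0, 1.0],
     [0.0, 1.0, -1.0]]


def p_exp(t):
    """Transition probabilities obtained by exponentiating the rate matrix."""
    return mat_exp(Q, t)


def p_product(t):
    """Toy product scheme (hypothetical): multiply per-character probabilities
    of a symmetric two-state model, then renormalize over allowed morphotypes."""
    same = (1.0 + math.exp(-2.0 * t)) / 2.0
    diff = 1.0 - same
    rows = []
    for start in MORPHOTYPES:
        raw = []
        for end in MORPHOTYPES:
            prob = 1.0
            for x, y in zip(start, end):
                prob *= same if x == y else diff
            raw.append(prob)
        total = sum(raw)
        rows.append([value / total for value in raw])
    return rows


s, t = 0.5, 0.5
gap_exp = max_abs_diff(mat_mul(p_exp(s), p_exp(t)), p_exp(s + t))
gap_product = max_abs_diff(mat_mul(p_product(s), p_product(t)),
                           p_product(s + t))
print("exponentiation gap:", gap_exp)
print("toy product gap:   ", gap_product)
```

Under these assumptions, the exponentiated matrices agree with P(s+t) up to floating-point error, while the toy product scheme leaves a visible gap. The sketch is only a way to see what the requirement means; the formal argument is in SupplementaryMaterials.txt.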
