Phylogenetic accuracy under non-stationary and non-homogeneous conditions: A simulation study
Data files
Dec 30, 2025 version (1.61 GB total)
- README.md (2.21 KB)
- simulation_alignments.zip (1.61 GB)
Abstract
Phylogenetic inference typically assumes that the data have evolved under Stationary, Reversible, and Homogeneous (SRH) conditions. Many empirical and simulation studies have shown that assuming SRH conditions can lead to significant errors in phylogenetic inference when the data violate these assumptions. However, many simulation studies have focused on extreme non-SRH conditions that represent worst-case scenarios rather than typical empirical datasets. In this study, we simulate datasets under various degrees of non-SRH conditions, using empirically derived parameters to mimic real data, and examine how incorrectly assuming SRH conditions affects phylogenetic inference. Our results show that maximum likelihood inference is generally quite robust to a wide range of SRH model violations but is inaccurate under extreme convergent evolution.
Overview
This repository contains simulation outputs and supplementary materials used in the study investigating the robustness of maximum-likelihood phylogenetic inference under violations of stationarity, reversibility, and homogeneity (SRH) assumptions.
The data include simulated sequence alignments generated under two alternative evolutionary simulation schemes, as well as supplementary figures and tables referenced in the manuscript.
Dataset DOI: 10.5061/dryad.k3j9kd582
Description of the data and file structure
simulation_alignments.zip
This archive contains all simulated sequence alignments and associated metadata used in the analyses.
Contents:
- Simulated nucleotide alignments generated under:
  - The inheritance scheme, in which substitution processes evolve along the tree
  - The two-matrix scheme, in which two monophyletic clades evolve under an alternative substitution process
- Alignments were simulated under varying parameter settings (e.g., number of sites, inheritance weights, Δ(Q) values), as described in the Methods section of the associated manuscript.
- File and directory names encode simulation conditions (e.g., scheme type, number of sites, parameter values).
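Because the naming convention is documented in the manuscript rather than here, a reasonable first step is to list what the archive contains without extracting all 1.61 GB. The following is a minimal sketch using only the Python standard library. The function name `summarize_archive` is hypothetical, and the sketch makes no assumption about the actual directory names inside the archive; it simply counts files per top-level directory.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count archive members per top-level directory without extracting them."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            parts = PurePosixPath(info.filename).parts
            # Files stored at the archive root are grouped under "."
            top = parts[0] if len(parts) > 1 else "."
            counts[top] += 1
    return dict(counts)


if __name__ == "__main__":
    import sys

    # Pass the path to a local copy of the archive on the command line,
    # e.g. simulation_alignments.zip
    if len(sys.argv) > 1:
        for directory, n_files in sorted(summarize_archive(sys.argv[1]).items()):
            print(f"{directory}\t{n_files}")
```

The per-directory counts give a quick check that the download is complete before any alignments are extracted.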
These alignments were used as input for maximum-likelihood phylogenetic inference and subsequent evaluation of topological and branch-length estimation error.
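As an illustration of how topological error can be quantified, the sketch below computes the Robinson-Foulds (RF) distance between a true tree and an inferred tree. This is an assumption made for illustration: the README does not state which metric the study used, and the function names, the nested-tuple tree representation, and the example trees are all hypothetical. The analysis scripts on GitHub remain the authoritative reference.

```python
def leaf_set(tree):
    """Return the set of taxon names in a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial splits of the tree, treated as unrooted.

    Each split is stored as the side that excludes a fixed anchor taxon,
    so the same split always has the same representation.
    """
    taxa = leaf_set(tree)
    anchor = min(taxa)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if anchor in below else below
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon example: the inferred tree swaps taxa B and C.
true_tree = (("A", "B"), ("C", ("D", "E")))
inferred_tree = (("A", "C"), ("B", ("D", "E")))
print(rf_distance(true_tree, inferred_tree))  # prints 2
```

For fully resolved unrooted trees with n taxa, the maximum RF distance is 2(n - 3), so the value 2 here corresponds to a normalized distance of 0.5.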
Reproducibility
All scripts used to generate simulations, perform analyses, and produce figures are available on GitHub:
https://github.com/suhanaser/empiricalGTRdist
The combination of the scripts (GitHub), simulated alignments (this Dryad repository), and the Appendix allows full reproduction of the analyses reported in the paper.
Licensing
All data in this repository are released under the CC0 Public Domain Dedication, allowing unrestricted reuse, modification, and redistribution.
Contact
For questions regarding the data or analyses, please contact the corresponding author via the associated publication.
- Naser-Khdour, Suha; Minh, Bui Quang; Lanfear, Robert (2021). The Influence of Model Violation on Phylogenetic Inference: A Simulation Study [Preprint]. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2021.09.22.461455
