Data from: Random-effects substitution models for phylogenetics via scalable gradient approximations
Data files
May 20, 2026 version files, 14.89 MB total
- Archive.zip: 14.49 MB
- README.md: 1.78 KB
- supp_mat.pdf: 393.31 KB
Abstract
Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.
https://doi.org/10.5068/D1709N
- The file `supp_mat.pdf` contains supplementary text for the manuscript "Random-effects substitution models for phylogenetics via scalable gradient approximations" by Magee et al.
- The file `Archive.zip` contains datasets and analysis code, also available uncompressed at https://github.com/suchard-group/approximate_substitution_gradient_supplement.
  - Usage instructions:
    - Create a folder named `approximate_substitution_gradient_supplement`
    - Download `Archive.zip` and place it in `approximate_substitution_gradient_supplement`
    - Uncompress `Archive.zip`, ensuring its contents are at the top level of `approximate_substitution_gradient_supplement`
    - Open and follow the instructions in `approximate_substitution_gradient_supplement/README.md`
  - Contents are described in detail in `README` files (main or otherwise) in the archive, but briefly it includes:
    - `README.md`: a top-level overview and instruction file, to be read first
    - `piBUSS/`: a directory containing source code to perform the posterior predictive simulations in the manuscript
    - `simulations/`: a directory containing code for creating the simulated datasets from the manuscript and for generating BEAST XMLs to analyze the simulated datasets
    - `acknowledgements_table.xlsx`: GISAID accession IDs for the SARS-CoV-2 genome sequences used in the SARS-CoV-2 analysis. Refer to the study originating this dataset, Pekar et al., 2021, for further details.
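
The usage instructions above can also be scripted. The following is a minimal Python sketch, not part of the archive: the function name `unpack_supplement` and its parameters are hypothetical, and it assumes `Archive.zip` has already been downloaded. It extracts directly into the target folder rather than first copying the zip file there, then checks that `README.md` ended up at the top level, as the instructions require.

```python
import zipfile
from pathlib import Path

# Folder name taken from the usage instructions.
FOLDER_NAME = "approximate_substitution_gradient_supplement"


def unpack_supplement(archive_path, parent_dir="."):
    """Extract the archive into FOLDER_NAME under parent_dir.

    Hypothetical helper for illustration; returns the path of the
    folder that now holds the archive contents.
    """
    target = Path(parent_dir) / FOLDER_NAME
    # Step 1: create the folder (no error if it already exists).
    target.mkdir(parents=True, exist_ok=True)

    # Steps 2-3: uncompress the archive so its contents sit in the folder.
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)

    # Step 4 expects README.md at the top level; fail loudly if the
    # archive unpacked into an extra wrapper directory instead.
    readme = target / "README.md"
    if not readme.is_file():
        raise FileNotFoundError(
            f"{readme} not found; move the archive contents to the top level of {target}"
        )
    return target
```

Calling `unpack_supplement("Archive.zip")` from the download directory would leave the contents in `./approximate_substitution_gradient_supplement`, after which the top-level `README.md` can be opened and followed.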
