TreeFlow: Probabilistic modelling and automatic differentiation for phylogenetics
Data files
Feb 23, 2023 version files 402.08 KB
- README.md (1.57 KB)
- supplementary-data.zip (400.51 KB)
Abstract
Probabilistic programming frameworks are powerful tools for statistical modelling and inference. They are not immediately generalisable to phylogenetic problems due to the particular computational properties of the phylogenetic tree object. TreeFlow is a software library for probabilistic programming and automatic differentiation with phylogenetic trees. It implements inference algorithms for phylogenetic tree times and model parameters, given a tree topology. We demonstrate how TreeFlow can be used to quickly implement and assess new models. We also show that it provides reasonable performance for gradient-based inference algorithms compared to specialized computational libraries for phylogenetics.
Christiaan Swanepoel, Mathieu Fourment, Xiang Ji, Hassan Nasif, Marc A Suchard, Frederick A Matsen IV, Alexei Drummond. "TreeFlow: probabilistic programming and automatic differentiation for phylogenetics". arXiv preprint arXiv:2211.05220 (2022).
Description of the data and file structure
There are two data analyses:
- 980 H3N2 sequences
- 62 carnivore mitochondrial DNA sequences
Sequence alignments, tree topologies, starting values, TreeFlow model definition files, and BEAST XMLs are in the respective subdirectories in the supplementary-data.zip archive.
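The layout of the archive can be inspected before unpacking it. The following is a minimal sketch using only the Python standard library; the helper name `group_archive_members` is hypothetical and is not part of TreeFlow or the pipeline. It makes no assumption about the subdirectory names and assumes only that the archive has been downloaded to the working directory as supplementary-data.zip.

```python
import zipfile
from collections import defaultdict
from pathlib import Path, PurePosixPath


def group_archive_members(zip_path):
    """Group the file names in a zip archive by their top-level directory.

    Hypothetical helper for illustration; not part of TreeFlow.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep files only
            parts = PurePosixPath(info.filename).parts
            # Files stored at the archive root are grouped under "."
            top = parts[0] if len(parts) > 1 else "."
            groups[top].append(info.filename)
    return {top: sorted(names) for top, names in groups.items()}


# Assumed download location; adjust the path as needed.
archive_path = Path("supplementary-data.zip")
if archive_path.exists():
    for top, names in sorted(group_archive_members(archive_path).items()):
        print(f"{top}/ ({len(names)} files)")
        for name in names:
            print(f"  {name}")
```

Each top-level group should correspond to one of the two analyses, with the alignment, topology, starting values, model definition, and BEAST XML files listed beneath it.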
Sharing/Access information
The sequence alignment for the H3N2 analysis has been removed because of license conflicts. This can be obtained from the supplementary material of the original publication:
Vaughan TG, Kühnert D, Popinga A, Welch D, Drummond AJ. Efficient Bayesian inference under the structured coalescent. Bioinformatics. 2014 Aug 15;30(16):2272-9. doi: 10.1093/bioinformatics/btu201
The sequence alignment for the carnivores analysis was taken from the BEAST examples.
Code/Software
This data was generated by the pipeline at https://github.com/christiaanjs/treeflow-paper.
Requirements
- Carnivores sequence alignment accessed from benchmark in BEAST examples
- H3N2 sequence alignment taken from Vaughan TG, Kühnert D, Popinga A, Welch D, Drummond AJ. Efficient Bayesian inference under the structured coalescent. Bioinformatics. 2014 Aug 15;30(16):2272-9. doi: 10.1093/bioinformatics/btu201
- Data processing pipeline can be found at https://github.com/christiaanjs/treeflow-paper
- Tree topologies inferred using RAxML 8.2.12
- Tree topologies are rooted using LSD 0.2
- BEAST analyses are performed using BEAST 2.6.7
- Variational inference analyses are performed using TreeFlow 0.0.1beta
Sequences have been removed from the H3N2 BEAST XML as a result of license conflicts. The complete version of this file is generated by the above pipeline.
