Data from: LoRaD: Marginal likelihood estimation with haste (but no waste)
Data files
Feb 27, 2023 version files 102.48 KB
Abstract
The Lowest Radial Distance (LoRaD) method is a modification of the recently-introduced Partition-Weighted Kernel method for estimating the marginal likelihood of a model, a quantity important for Bayesian model selection. For analyses involving a fixed tree topology, LoRaD improves upon the Steppingstone or Thermodynamic Integration (Path Sampling) approaches now in common use in phylogenetics because it requires sampling only from the posterior distribution, avoiding the need to sample from a series of ad hoc power posterior distributions, and yet is more accurate than other fast methods such as the Generalized Harmonic Mean (GHM) method. We show that the method performs well in comparison to the Generalized Steppingstone method on an empirical fixed-topology example from molecular phylogenetics involving 180 parameters. The LoRaD method can also be used to obtain the marginal likelihood in the variable-topology case if at least one tree topology occurs with sufficient frequency in the posterior sample to allow accurate estimation of the marginal likelihood conditional on that topology.
Methods
loradML.zip
loradML.zip is a snapshot of the git repository at https://github.com/plewis/loradML
loradML is a stand-alone program that estimates the marginal likelihood from the standard output of diverse Bayesian MCMC software. It requires only the sampled parameter vectors and the log of the unnormalized posterior for each parameter vector. Documentation is available in the Readme.md file (which is easier to view at the above repository web address).
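To illustrate how a marginal likelihood can be estimated from nothing more than sampled parameter vectors and their log unnormalized posterior values, here is a minimal Python sketch of a LoRaD-style estimator. It is not code from loradML and makes simplifying assumptions: the parameters are assumed to be already transformed to an unconstrained scale, the same sample is used both to standardize and to estimate, and the names `chi2_cdf`, `lorad_style_log_marginal`, and `coverage` are hypothetical. The check at the end uses a synthetic Gaussian kernel with a known normalizing constant, not data from the paper.

```python
import math
import numpy as np

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), computed from the
    series for the regularized lower incomplete gamma function."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 0.0
    total = 0.0
    for k in range(2000):
        term = math.exp((a + k) * math.log(z) - z - math.lgamma(a + k + 1.0))
        total += term
        if k > z and term < 1e-16 * total:
            break
    return min(total, 1.0)

def lorad_style_log_marginal(samples, log_kernel, coverage=0.5):
    """samples: (n, p) unconstrained parameter vectors from the posterior.
    log_kernel: (n,) log unnormalized posterior at each sampled vector.
    Returns an estimate of the log marginal likelihood."""
    samples = np.asarray(samples, dtype=float)
    log_kernel = np.asarray(log_kernel, dtype=float)
    n, p = samples.shape

    # Standardize with the sample mean and covariance: psi = L^{-1}(theta - mean).
    mean = samples.mean(axis=0)
    chol = np.linalg.cholesky(np.atleast_2d(np.cov(samples, rowvar=False)))
    psi = np.linalg.solve(chol, (samples - mean).T).T
    log_jac = np.log(np.diag(chol)).sum()  # log |d theta / d psi|

    # Keep the fraction `coverage` of points with the lowest radial distance.
    radius = np.linalg.norm(psi, axis=1)
    r_max = np.quantile(radius, coverage)
    inside = radius <= r_max

    # Reference function: standard multivariate normal density on the ball.
    log_ref = -0.5 * p * math.log(2.0 * math.pi) - 0.5 * radius[inside] ** 2
    log_q = log_kernel[inside] + log_jac  # kernel on the standardized scale
    log_ratio = log_ref - log_q

    # Average reference/kernel over ALL n draws (zero outside the ball).
    m = log_ratio.max()
    log_mean_ratio = m + math.log(np.exp(log_ratio - m).sum() / n)

    # Delta = normal probability mass inside the ball of radius r_max.
    log_delta = math.log(chi2_cdf(r_max ** 2, p))
    return log_delta - log_mean_ratio

# Synthetic check: kernel = exp(true_log_c) * Normal(mu, diag(sd^2)).
rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])
sd = np.array([0.5, 2.0, 1.0])
true_log_c = -12.5
theta = rng.normal(mu, sd, size=(20000, 3))
z = (theta - mu) / sd
log_kernel = true_log_c + (
    -0.5 * math.log(2.0 * math.pi) - np.log(sd) - 0.5 * z ** 2
).sum(axis=1)
estimate = lorad_style_log_marginal(theta, log_kernel, coverage=0.5)
print(estimate, true_log_c)
```

The estimator relies on the identity that the posterior expectation of (reference density / unnormalized posterior), restricted to the retained region, equals the reference mass in that region divided by the marginal likelihood. For real analyses, refer to the loradML documentation for the actual input format and options.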
lorad.zip
lorad.zip is a snapshot of the git repository at https://github.com/plewis/lorad
The software and data sets contained here may be used to recreate all analyses presented in the paper. Please see the _readme.txt files in each directory for specific instructions on how to compile software and use the scripts.
FRT2000rbcL.nex data set
The data set FRT2000rbcL.nex is located in the directory deploy-protosiphon, which will be created upon unzipping lorad.zip. This data set was originally used in the paper below and was obtained from one of the authors (Louise A. Lewis) who gave permission to include it here:
LA Lewis and FR Trainor. 2012. Survival of Protosiphon botryoides (Chlorophyceae, Chlorophyta) from a Connecticut soil dried for 43 years. Phycologia 51:662–665.
These data were used in a previous study in which marginal likelihoods were estimated for all possible tree topologies and the results combined to obtain the total marginal likelihood (total meaning that all possible topologies were taken into account). This "brute force" approach served as a point of comparison for a variant of the steppingstone method for marginal likelihood estimation in which a new tree topology reference distribution was proposed. This study was published as the following book chapter:
MT Holder, PO Lewis, DL Swofford, and D Bryant. 2014. Variable tree topology stepping-stone marginal likelihood estimation. Pp. 95–112 in: M-H Chen, L Kuo, and PO Lewis (eds.). Bayesian phylogenetics: methods, algorithms, and applications. Chapman & Hall/CRC Mathematical and Computational Biology.
In the current study, we re-estimated the marginal likelihood for many of the same models using the new LoRaD method.
S1679.nex data set
The data set S1679.nex is located in the directory deploy-kikihia, which will be created upon unzipping lorad.zip. This data set was originally compiled for the following study:
DC Marshall, C Simon, and TR Buckley. 2006. Accurate branch length estimation in partitioned Bayesian analyses requires accommodation of among-partition rate variation and attention to branch length priors. Systematic Biology 55:993–1003.
Our group previously used these data to test the steppingstone marginal likelihood estimation method on partitioned data in this paper:
Y Fan, M-H Chen, L Kuo, and PO Lewis. 2011. Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution 28:523–532.
We use the same data set again in the current paper to compare the new LoRaD method to the steppingstone approach used by Fan et al. (2011).
Sharing/Access information
The two zip files listed above (lorad.zip and loradML.zip) represent snapshots of the GitHub repositories below:
- loradML: https://github.com/plewis/loradML
- lorad: https://github.com/plewis/lorad
The FRT2000rbcL.nex data set represents an alignment of sequences deposited in GenBank by the original authors (LA Lewis and FR Trainor) under accession numbers JN880457–JN880466.
The data set S1679.nex was downloaded from https://treebase.org (study ID 1679).
Code/Software
Please refer to documentation provided in the directories created by unpacking the zip files for detailed instructions on how to compile the software and run the scripts.
Usage notes
All files are plain text files that can be opened with any text editor.