Skip to main content
Dryad

Data from: Are skyline plot-based demographic estimates overly dependent on smoothing prior assumptions?

Cite this dataset

Parag, Kris Varun; Pybus, Oliver; Wu, Chieh-Hsi (2021). Data from: Are skyline plot-based demographic estimates overly dependent on smoothing prior assumptions? [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqjs2

Abstract

In Bayesian phylogenetics, the coalescent process provides an informative framework for inferring changes in the effective size of a population from a phylogeny (or tree) of sequences sampled from that population. Popular coalescent inference approaches such as the Bayesian Skyline Plot, Skyride and Skygrid all model these population size changes with a discontinuous, piecewise-constant function but then apply a smoothing prior to ensure that their posterior population size estimates transition gradually with time. These prior distributions implicitly encode extra population size information that is not available from the observed coalescent data i.e. the tree. Here we present a novel statistic, Ω, to quantify and disaggregate the relative contributions of the coalescent data and prior assumptions to the resulting posterior estimate precision. Our statistic also measures the additional mutual information introduced by such priors. Using Ω we show that, because it is surprisingly easy to over-parametrise piecewise-constant population models, common smoothing priors can lead to overconfident and potentially misleading inference, even under robust experimental designs. We propose Ω as a useful tool for detecting when posterior estimate precision is overly reliant on prior choices.

Usage notes

This Matlab code was used to produce figures and results for the manuscript: "Are skyline plot-based demographic estimates overly dependent on smoothing prior assumptions?"

The main code can be used to compute the omega statistic presented in this paper. Key functions/scripts are: 

robustDecayTry.m: computes a p-correction for GMRF priors to omega

compLowerBndsDecay.m: computes the bound on omega^2, used for model rejection

modSelCurve.m, modSel_via_m.m, maxpModSel.m and omegaModelSelect.m: use to test model rejection algorithms under various conditions

robustDecay.m and testEllip.m: looks at how Fisher information from prior influences the omega statistic and looks at uncertainty ellipses

All files with the word 'King': computes omega statistic for Kingman coalescent

*Update: now includes xmls for the empirical-based simulations for the bison and HCV examples in the revised Fig 5 and 6 of the main text.

Funding

Medical Research Council, Award: MR/R015600/1