Dryad

Data from: Estimating diversification rates on incompletely-sampled phylogenies: theoretical concerns and practical solutions

Data files

Dec 09, 2019 version, files: 4.14 GB

Abstract

Molecular phylogenies are a key source of information about the tempo and mode of species diversification. However, most empirical phylogenies do not contain representatives of all species, such that diversification rates are typically estimated from incompletely sampled data. Most researchers recognize that incomplete sampling can lead to biased rate estimates, but the statistical properties of methods for accommodating incomplete sampling remain poorly known. In this point of view, we demonstrate theoretical concerns with the widespread use of analytical sampling corrections for sparsely sampled phylogenies of higher taxonomic groups. In particular, corrections based on “sampling fractions” can lead to low statistical power to infer rate variation when it is present, depending on the likelihood function used for inference. In the extreme, the sampling fraction correction can lead to spurious patterns of diversification that are driven solely by unbalanced sampling across the tree in concert with low overall power to infer shifts. Stochastic polytomy resolution provides an alternative to sampling fraction approaches that avoids some of these biases. We show that stochastic polytomy resolvers can greatly improve the power of common analyses to estimate shifts in diversification rates. We introduce a new stochastic polytomy resolution method (TACT: Taxonomic Addition for Complete Trees) that uses birth-death-sampling estimators across an ultrametric phylogeny to estimate branching times for unsampled taxa, with taxonomic information to compatibly place new taxa onto a backbone phylogeny. We close with practical recommendations for diversification inference under several common scenarios of incomplete sampling.