Data from: A new method for handling missing species in diversification analysis applicable to randomly or non-randomly sampled phylogenies
Data files
Jan 06, 2012 version files 20.03 KB
Abstract
Chronograms from molecular dating are increasingly being used to infer rates of diversification and their change over time. A major limitation in such analyses is incomplete species sampling, which moreover is usually non-random. While the widely used γ statistic with the MCCR test and the birth-death likelihood analysis with the ∆AICrc test statistic are appropriate for comparing the fit of different diversification models in phylogenies with random species sampling, no objective, automated method has been developed for fitting diversification models to non-randomly sampled phylogenies. Here we introduce a novel approach, CorSiM, which involves simulating missing splits under a constant-rate birth-death model and allows the user to specify whether species sampling in the phylogeny being analyzed is random or non-random. The completed trees can be used in subsequent model-fitting analyses. This is fundamentally different from previous diversification rate estimation methods, which were based on null distributions derived from the incomplete trees. CorSiM is automated in an R package and can easily be applied to large data sets. We illustrate the approach in two Araceae clades, one with a random species sampling of 52% and one with a non-random sampling of 55%. In the latter clade, the CorSiM approach detects and quantifies an increase in diversification rate while classic approaches prefer a constant-rate model, whereas in the former clade, results do not differ among methods (as expected, since the classic approaches are valid only for randomly sampled phylogenies). The CorSiM method greatly reduces the type I error in diversification analysis, but type II error remains a methodological problem.
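For readers unfamiliar with the γ statistic mentioned in the abstract, the following is a minimal Python sketch of its standard definition, computed from the node ages (branching times) of an ultrametric tree. It is not the CorSiM method and not code from the authors' R package; the function name `gamma_statistic` and the input convention (a list of internal node ages measured back from the present) are assumptions made for illustration only.

```python
import math


def gamma_statistic(branching_times):
    """Gamma statistic of an ultrametric tree (illustrative sketch).

    branching_times: ages of all internal nodes, measured back from the
    present (hypothetical input convention). A tree with n tips has
    n - 1 such ages.
    """
    ages = sorted(branching_times, reverse=True)  # root age first
    n = len(ages) + 1  # number of tips
    if n < 3:
        raise ValueError("gamma is undefined for trees with fewer than 3 tips")

    # g[k - 2] is the length of the interval during which k lineages exist,
    # for k = 2..n; the final interval runs from the youngest node to the present.
    bounds = ages + [0.0]
    g = [bounds[i] - bounds[i + 1] for i in range(n - 1)]

    # Weight each interval by its lineage count: k * g_k.
    weighted = [k * gk for k, gk in zip(range(2, n + 1), g)]
    total = sum(weighted)  # T = sum over k = 2..n of k * g_k

    # Sum the partial sums of k * g_k for i = 2..n-1 (the last interval is excluded).
    running = 0.0
    partial_sum_total = 0.0
    for w in weighted[:-1]:
        running += w
        partial_sum_total += running

    mean_term = partial_sum_total / (n - 2)
    return (mean_term - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


# Splits clustered near the root give a negative value,
# splits clustered near the present give a positive value.
early = gamma_statistic([10, 9, 8, 7])
late = gamma_statistic([10, 3, 2, 1])
```

Negative values indicate that splits are concentrated toward the root, which is the pattern usually read as a slowdown in diversification. Because unsampled species are missing splits, incomplete sampling alters the distribution of node ages that enters this calculation, which is the sampling problem the abstract addresses.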