Branching patterns in phylogenies cannot distinguish diversity-dependent diversification from time-dependent diversification
Data files
Version of Oct 20, 2020 (700.13 MB)
Abstract
Methods
For more details please refer to the Methods section in the main text.
Simulation procedure
We simulated sets of 1,000 diversity-dependent (DD) and time-dependent (TD) phylogenetic trees using the functions dd_sim and td_sim, respectively, from the R package DDD 4.3 (Etienne et al. 2012), which is available from CRAN.
The two models share the same set of 3 parameters:
- λ0 (initial speciation rate)
- μ0 (constant extinction rate)
- K (carrying capacity)
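To make the role of the three parameters concrete, the sketch below writes out the per-lineage speciation rate of the linear DD model in Python. It assumes the linear parameterization we understand DDD to use for this model, λ(N) = max(0, λ0 - (λ0 - μ0) N / K). Under this form, speciation equals extinction when N = K, and speciation drops to zero at K' = λ0 K / (λ0 - μ0). The function names are hypothetical and are not part of DDD.

```python
def dd_speciation_rate(n: float, lam0: float, mu0: float, K: float) -> float:
    """Per-lineage speciation rate with n lineages under linear diversity dependence.

    ASSUMPTION: lambda(n) = max(0, lam0 - (lam0 - mu0) * n / K), so that
    speciation equals the constant extinction rate mu0 when n = K.
    """
    return max(0.0, lam0 - (lam0 - mu0) * n / K)


def k_prime(lam0: float, mu0: float, K: float) -> float:
    """Diversity at which the speciation rate reaches zero: K' = lam0 * K / (lam0 - mu0)."""
    return lam0 * K / (lam0 - mu0)


if __name__ == "__main__":
    # Settings used for the simulations: lam0 = 0.8, K = 40, here with mu0 = 0.2
    print(dd_speciation_rate(2, 0.8, 0.2, 40))   # rate near the crown
    print(dd_speciation_rate(40, 0.8, 0.2, 40))  # approximately mu0 at n = K
    print(k_prime(0.8, 0.2, 40))                 # about 53.3 lineages
```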
We set λ0 = 0.8 and K = 40, and considered four levels of extinction (μ0 = 0.1, 0.2, 0.3 or 0.4) and four crown ages, or simulation times: 5, 10, 15 and 60 myr.
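The following is a minimal Gillespie-style sketch of how diversity grows under the linear DD model with these settings. It is an illustration written for this description, not a reimplementation of dd_sim. It tracks only the number of lineages, starting from the two crown lineages, and it neither records the tree nor conditions on survival. It also assumes μ0 > 0 so that the total event rate never drops to zero. The function name is hypothetical.

```python
import random


def simulate_dd_diversity(lam0, mu0, K, age, seed=None):
    """Gillespie simulation of lineage number under linear diversity dependence.

    Starts from the two crown lineages and returns the number of extant
    lineages after `age` time units (0 if the clade goes extinct).
    Illustrative sketch only: not DDD::dd_sim, no tree is recorded and the
    process is not conditioned on survival. Assumes mu0 > 0.
    """
    rng = random.Random(seed)
    n, t = 2, 0.0
    while n > 0:
        # Linear DD speciation rate (assumed form), constant extinction rate mu0
        lam = max(0.0, lam0 - (lam0 - mu0) * n / K)
        total_rate = n * (lam + mu0)
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > age:
            break
        # Choose speciation or extinction in proportion to their rates
        if rng.random() < lam / (lam + mu0):
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    # lam0 = 0.8, mu0 = 0.1, K = 40, crown age 15 myr
    tips = [simulate_dd_diversity(0.8, 0.1, 40, 15, seed=i) for i in range(1000)]
    print(sum(tips) / len(tips))  # mean number of extant lineages
```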
We then added a further set with K = 80 and a crown age of 15 myr.
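The simulation design can be enumerated as follows. The main grid crosses the four extinction rates with the four crown ages at K = 40, giving 16 parameter sets. The text does not state which extinction rates the additional K = 80 set uses, so the sketch assumes all four for illustration. The variable names are hypothetical.

```python
from itertools import product

LAMBDA0 = 0.8
EXTINCTION_RATES = (0.1, 0.2, 0.3, 0.4)
CROWN_AGES = (5, 10, 15, 60)  # myr

# Main design: K = 40 crossed with every extinction rate and crown age
main_grid = [
    {"lambda0": LAMBDA0, "mu0": mu0, "K": 40, "age": age}
    for mu0, age in product(EXTINCTION_RATES, CROWN_AGES)
]

# Additional set with K = 80 at crown age 15 myr.
# ASSUMPTION: the extinction rates of this set are not stated in the text;
# all four are used here for illustration only.
extra_grid = [
    {"lambda0": LAMBDA0, "mu0": mu0, "K": 80, "age": 15}
    for mu0 in EXTINCTION_RATES
]

print(len(main_grid), len(extra_grid))  # 16 4
```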
Model selection
We fitted both the DD and the TD model to each simulated tree, whether generated under DD or TD. This let us test whether trees generated by either model are indeed best fit by the model that generated them, or whether both models fit them equally well.
We used a maximum likelihood method to obtain the log-likelihood ratio (LLR) of DD versus TD for each tree.
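The LLR is the difference between the maximized log-likelihoods of the two models. Because DD and TD share the same three parameters, the LLR maps directly onto an AIC comparison: AIC_TD - AIC_DD = 2 LLR. Below is a minimal Python sketch. The function names are hypothetical, and the log-likelihood values in the usage example are made up.

```python
def log_likelihood_ratio(loglik_dd: float, loglik_td: float) -> float:
    """LLR of DD versus TD: positive values favour DD, negative values favour TD."""
    return loglik_dd - loglik_td


def aic(loglik: float, n_params: int = 3) -> float:
    """Akaike information criterion; both DD and TD have three parameters."""
    return 2 * n_params - 2 * loglik


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for one tree
    loglik_dd, loglik_td = -120.5, -122.0
    llr = log_likelihood_ratio(loglik_dd, loglik_td)
    print(llr)                                 # 1.5
    print(aic(loglik_td) - aic(loglik_dd))     # 3.0, i.e. 2 * LLR
```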
Both likelihoods are computed by functions in R package DDD 4.3: dd_loglik (DD) and bd_loglik (TD).
We maximized them with the optimization routines implemented in the functions dd_ML (DD) and bd_ML (TD) of the same package.
Initial parameter values were set to the true values so that the optimization would converge relatively quickly. Convergence nevertheless sometimes proved difficult, for two reasons. First, for large trees (more than a hundred tips), the TD likelihood becomes computationally demanding. Second, the likelihood landscape contains local optima. In these cases, we initialized the optimization with a different value of K, the parameter with the strongest influence on the likelihood. For large trees, the problem was that TD trees were often larger than the carrying capacity would allow under DD. When the number of tips N exceeds K' = λ0 K / (λ0 - μ0), the diversity at which the speciation rate drops to zero, the likelihood of either model is 0. In those instances we instead set the initial value of K to N' = N (λ0 - μ0) / λ0, the value of K for which K' = N. To avoid local optima, we started the optimization at K = N, which we had observed to often be close to the maximum likelihood estimate for other trees.
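The rule for choosing the initial K can be sketched as follows. The function name and the avoid_local_optima flag are hypothetical. The sketch assumes that K' = λ0 K / (λ0 - μ0) is the diversity at which the speciation rate reaches zero, and that N is the number of tips of the tree. Starting from N' = N (λ0 - μ0) / λ0 makes K' equal to N exactly.

```python
def initial_k(n_tips, lam0, mu0, K, avoid_local_optima=False):
    """Choose a starting value of K for the likelihood optimization (sketch).

    - By default, start from the true K.
    - If the tree has more tips than K' = lam0 * K / (lam0 - mu0) (assumed to be
      the diversity at which speciation stops), start from
      N' = N * (lam0 - mu0) / lam0, i.e. the K for which K' = N.
    - To escape a local optimum, restart from K = N.
    """
    if avoid_local_optima:
        return float(n_tips)
    k_prime = lam0 * K / (lam0 - mu0)
    if n_tips > k_prime:
        return n_tips * (lam0 - mu0) / lam0
    return float(K)


if __name__ == "__main__":
    # A 100-tip tree with true parameters lam0 = 0.8, mu0 = 0.2, K = 40:
    # K' is about 53.3 < 100, so the optimization starts from N' (about 75)
    print(initial_k(100, 0.8, 0.2, 40))
    print(initial_k(100, 0.8, 0.2, 40, avoid_local_optima=True))  # 100.0
```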
Empirical phylogenies
We applied the simulation and optimization procedure described above to a set of empirical, reconstructed phylogenetic trees.
We took the set of tetrapod family-level phylogenies compiled from the published literature by Condamine et al. (2019). From these, we selected five groups for which the linear diversity-dependent model with constant extinction (that is, the DD model we used for simulations) fitted best out of 26 birth-death models. The five groups are three bird families (Parulidae, Bucerotidae and Indicatoridae) and two mammal families (Canidae and Pseudocheiridae). Condamine et al. assembled the bird phylogenies from the bird phylogeny published by Jetz et al. (2012). The mammal phylogenies were pruned from the mammalian tree of Rolland et al. (2014), itself built from the tree of Bininda-Emonds et al. (2007).
For each group, we extracted the DD parameter estimates reported by Condamine et al. (2019) and used them as starting values for fitting the TD model to the phylogeny. We then obtained the LLR distribution under each model by simulating 1,000 DD and TD trees from the corresponding parameter estimates and fitting both models to each simulated tree. We computed the decision thresholds as described in Section 2.4 of the main text. We then compared them with the LLR of the original phylogeny to decide whether DD, TD, or neither model could be selected.
Usage notes
As described in Methods, the data archived here were produced from functions in R package DDD.
Rather than calling the DDD functions mentioned above directly, the first author developed a utility R package for the project, DDvTDtools, which calls the DDD functions and manages their input and output in a standard way. The package is available on GitHub (https://github.com/TheoPannetier/DDvTDtools). It can be cloned from there, or installed from within R with remotes::install_github("TheoPannetier/DDvTDtools"). For instructions on running the simulations and obtaining the maximum likelihood parameter estimates, please refer to the README.md file of the package.
Simulated trees are stored in files with the pattern simXX-PARA.RData, where XX is the model name (DD or TD) and PARA is a four-digit code specifying the parameter values (see ReadMe.txt for details).
The results of the maximum likelihood procedure are stored in data frames, in files with the pattern simXX_optimYY_PARA_INITK.rds. Here YY is the model fitted to the tree (DD or TD, which may be the same as XX or differ from it), and INITK specifies the initial parameter values (see ReadMe.txt for details).
Data for empirical phylogenies follow the same naming scheme. Simulated trees are stored in files with the pattern simXX_FAMILY_FROM.RData, and the results of the maximum likelihood procedure in files with the pattern FAMILY_simXX_optimYY_FROM.rds. Here FAMILY is the name of a bird or mammal family, and FROM specifies the parameter values used to simulate the trees (see ReadMe.txt for details).
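A small Python helper can classify the archived files by type from their names. It is a sketch and not part of DDvTDtools. The model codes (DD/TD) and the four-digit PARA code come from the patterns above, but the formats of INITK, FAMILY and FROM are only documented in ReadMe.txt. The regular expressions therefore assume that FAMILY is purely alphabetic and that INITK and FROM contain no underscore or dot. The function name and the example file names are hypothetical.

```python
import re

# Naming patterns described above. ASSUMPTION: FAMILY is alphabetic, and
# INITK / FROM contain no underscore or dot (their real format is in ReadMe.txt).
PATTERNS = {
    "sim_tree": re.compile(r"^sim(?P<sim>DD|TD)-(?P<para>\d{4})\.RData$"),
    "sim_optim": re.compile(
        r"^sim(?P<sim>DD|TD)_optim(?P<optim>DD|TD)_(?P<para>\d{4})_(?P<initk>[^_.]+)\.rds$"
    ),
    "emp_tree": re.compile(
        r"^sim(?P<sim>DD|TD)_(?P<family>[A-Za-z]+)_(?P<from_>[^_.]+)\.RData$"
    ),
    "emp_optim": re.compile(
        r"^(?P<family>[A-Za-z]+)_sim(?P<sim>DD|TD)_optim(?P<optim>DD|TD)_(?P<from_>[^_.]+)\.rds$"
    ),
}


def parse_filename(name):
    """Return the file type and the fields encoded in its name, or None if no pattern matches."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(name)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


if __name__ == "__main__":
    # Hypothetical file names following the documented patterns
    for name in ["simDD-1234.RData", "simDD_optimTD_1234_k40.rds",
                 "simTD_Canidae_dd.RData", "Parulidae_simTD_optimDD_dd.rds"]:
        print(name, parse_filename(name))
```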