Dryad

Branching patterns in phylogenies cannot distinguish diversity-dependent diversification from time-dependent diversification

Cite this dataset

Pannetier, Theo; Martinez, César; Bunnefeld, Lynsey; Etienne, Rampal S. (2020). Branching patterns in phylogenies cannot distinguish diversity-dependent diversification from time-dependent diversification [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqjsx

Abstract

One of the primary goals of macroevolutionary biology has been to explain general trends in long-term diversity patterns, including whether such patterns correspond to an up-scaling of processes occurring at lower scales. Reconstructed phylogenies often show decelerated lineage accumulation over time. This pattern has often been interpreted as the result of diversity-dependent diversification, where the accumulation of species causes diversification to decrease through niche filling. However, other processes can also produce such a slowdown, including time-dependence without diversity-dependence. To test whether phylogenetic branching patterns can be used to distinguish these two mechanisms, we formulated a time-dependent, but diversity-independent model that matches the expected diversity through time of a diversity-dependent model. We simulated phylogenies under each model and studied how well likelihood methods could recover the true diversification mode. Standard model selection criteria always recovered diversity-dependence, even when it was not present. We correct for this bias by using a bootstrap method and find that neither model is decisively supported. This implies that the branching pattern of reconstructed trees contains insufficient information to detect the presence or absence of diversity-dependence. We advocate that tests encompassing additional data, e.g., traits or range distributions, are needed to evaluate how diversity drives macroevolutionary trends.

Methods

For more details please refer to the Methods section in the main text.

Simulation procedure

We simulated sets of 1,000 diversity-dependent (DD) and time-dependent (TD) phylogenetic trees using the functions dd_sim and td_sim, respectively, from the R package DDD 4.3 (Etienne et al. 2012), available from CRAN.

The two models share the same set of 3 parameters:

- λ0 (initial speciation rate)

- μ0 (constant extinction rate)

- K (carrying capacity)

We set λ0=0.8 and K=40, and considered four levels of extinction (μ0=0.1, 0.2, 0.3 or 0.4), and four different crown ages, or simulation times: 5, 10, 15, and 60 myr.

We then included an additional set with K = 80, and age = 15 myr.
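The parameter combinations listed above can be laid out as a grid. The following is a minimal sketch in base R, not the authors' script: the object names are illustrative, and it assumes that the additional K = 80 set was crossed with all four extinction levels, which the text does not state. The optional last step calls dd_sim from DDD with the parameters in the order (λ0, μ0, K) and runs only if the package is installed.

```r
# Sketch of the simulation parameter grid (object names are illustrative).
lambda0 <- 0.8
mu0_levels <- c(0.1, 0.2, 0.3, 0.4)

# Main design: K = 40, four extinction levels, four crown ages (myr).
main_grid <- expand.grid(mu0 = mu0_levels, age = c(5, 10, 15, 60), K = 40)

# Additional set: K = 80, age = 15 myr.
# ASSUMPTION: crossed with all four extinction levels.
extra_grid <- expand.grid(mu0 = mu0_levels, age = 15, K = 80)

param_grid <- rbind(main_grid, extra_grid)
param_grid$lambda0 <- lambda0
nrow(param_grid)  # 16 main combinations + 4 additional ones

# Optional: simulate a single DD tree for the first combination.
if (requireNamespace("DDD", quietly = TRUE)) {
  p <- param_grid[1, ]
  sim <- DDD::dd_sim(pars = c(p$lambda0, p$mu0, p$K), age = p$age)
  brts <- sim$brts  # branching times of the reconstructed tree
}
```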

Model selection

We fitted the DD and TD models on each DD or TD simulated tree to study whether phylogenetic trees generated by either model are indeed best fit by the model that generated them, or whether both models fit the data.

We used a maximum likelihood method to obtain the log-likelihood ratio (LLR) of DD versus TD for each tree.

The computation of the DD and TD likelihoods is implemented in the functions dd_loglik and bd_loglik, respectively, in the R package DDD 4.3.

We used the optimization routines implemented in the R functions dd_ML (for DD) and bd_ML (for TD) of the same package.
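As a small illustration of the quantity being compared, the sketch below computes the LLR from two maximised log-likelihoods. It assumes the sign convention LLR = loglik(DD) - loglik(TD); the function names and the numerical values are hypothetical placeholders, not results from the archived data. Because both models have three parameters, the AIC difference reduces to twice the LLR, so an AIC comparison simply follows the sign of the LLR.

```r
# LLR of DD versus TD (ASSUMED convention: DD minus TD).
llr_dd_vs_td <- function(loglik_dd, loglik_td) {
  loglik_dd - loglik_td
}

# AIC for a model with n_pars free parameters (3 for both DD and TD).
aic <- function(loglik, n_pars = 3) {
  2 * n_pars - 2 * loglik
}

# Hypothetical maximised log-likelihoods for one tree (placeholders).
loglik_dd <- -105.2
loglik_td <- -106.0

llr <- llr_dd_vs_td(loglik_dd, loglik_td)
delta_aic <- aic(loglik_td) - aic(loglik_dd)

# With equal numbers of parameters, delta_aic equals 2 * llr.
isTRUE(all.equal(delta_aic, 2 * llr))
```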

Initial parameter values were set to the true values to ensure relatively fast convergence of the likelihood optimization. However, convergence sometimes proved difficult, for example for large trees (i.e. more than a hundred tips), because the computation of the TD likelihood became challenging for trees of this size, and because of the presence of local optima in the likelihood landscape. In these cases, we initialized the optimization with a different value of K (the most influential parameter for the likelihood). First, TD trees were often larger than the carrying capacity would allow in DD. In instances where the number of tips N exceeded K', the likelihood of either model becomes 0, and we instead set the initial value of K to N' = N (λ0 - μ0) / λ0. Second, to avoid local optima, we started the optimization at K = N, which we had observed to often be close to the maximum likelihood estimate for other trees.
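The first fallback can be sketched as follows. This is an illustration under a stated assumption, not the authors' code: it takes K' to be the diversity at which the speciation rate of the linear DD model reaches zero, K' = λ0 K / (λ0 - μ0), which makes N' the value of K for which K' equals N. The function names are hypothetical.

```r
# ASSUMPTION: K' is the diversity at which the linear DD speciation
# rate reaches zero, i.e. K' = lambda0 * K / (lambda0 - mu0).
k_prime <- function(K, lambda0, mu0) {
  lambda0 * K / (lambda0 - mu0)
}

# Initial K for the optimisation: keep the true K unless the tree has
# more tips (N) than K' allows, in which case use N' instead.
initial_K <- function(N, K, lambda0, mu0) {
  if (N > k_prime(K, lambda0, mu0)) {
    N * (lambda0 - mu0) / lambda0  # N' as given in the text
  } else {
    K
  }
}

initial_K(N = 60, K = 40, lambda0 = 0.8, mu0 = 0.4)   # N below K': keeps K
initial_K(N = 100, K = 40, lambda0 = 0.8, mu0 = 0.4)  # N above K': uses N'
```

The second fallback described above needs no helper: the optimization is simply restarted with the initial K set to N.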

Empirical phylogenies

We applied the simulation-optimisation procedure described above to a set of empirical, reconstructed phylogenetic trees.

We took the set of tetrapod family-level phylogenies compiled from the published literature by Condamine et al. (2019) and selected five groups for which the linear diversity-dependent model with constant extinction (that is, the DD model we used for simulations) fitted best out of 26 birth-death models. The five groups comprised three bird families (Parulidae, Bucerotidae and Indicatoridae) and two mammal families (Canidae and Pseudocheiridae). Bird phylogenies were assembled by Condamine et al. from the bird phylogeny published by Jetz et al. (2012), and mammal phylogenies were pruned from the mammalian tree of Rolland et al. (2014), itself built from the tree of Bininda-Emonds et al. (2007).

For each group, we extracted the estimated parameter values for the DD model reported in Condamine et al. (2019) and used these as a starting point for fitting the TD model to each phylogeny. We then obtained the LLR distribution for each model by simulating 1,000 DD and 1,000 TD trees from the corresponding parameter estimates and fitting both models to each simulated tree. We computed the decision thresholds as described in Section 2.4 of the main text and compared them with the LLR obtained for the original phylogeny to decide whether DD, TD, or neither could be selected.
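The comparison step can be sketched as below. The exact threshold rule is the one given in the main text; this sketch merely assumes percentile thresholds (DD is selected if the observed LLR exceeds the 95th percentile of the LLRs of TD-simulated trees, TD if it falls below the 5th percentile of the LLRs of DD-simulated trees). The function names are hypothetical and the LLR vectors are evenly spaced placeholders, not values from the archived data.

```r
# ASSUMED rule: percentile thresholds from the two bootstrap distributions.
llr_thresholds <- function(llr_dd_trees, llr_td_trees, alpha = 0.05) {
  c(
    select_dd_above = unname(quantile(llr_td_trees, 1 - alpha)),
    select_td_below = unname(quantile(llr_dd_trees, alpha))
  )
}

# Select a model only when exactly one threshold is crossed.
decide_model <- function(llr_obs, thr) {
  favours_dd <- llr_obs > thr[["select_dd_above"]]
  favours_td <- llr_obs < thr[["select_td_below"]]
  if (favours_dd && !favours_td) {
    "DD"
  } else if (favours_td && !favours_dd) {
    "TD"
  } else {
    "neither"
  }
}

# Placeholder LLR distributions (synthetic, for illustration only).
llr_td_trees <- seq(-2, 2, length.out = 101)
llr_dd_trees <- seq(0, 4, length.out = 101)

thr <- llr_thresholds(llr_dd_trees, llr_td_trees)
decide_model(1.0, thr)  # falls between the two thresholds
```

When the two bootstrap distributions overlap strongly, most observed LLR values fall between the thresholds and neither model is selected.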

Usage notes

As described in Methods, the data archived here were produced from functions in R package DDD.

Instead of calling the DDD functions mentioned above directly, the first author developed and used a utility R package for the project, called "DDvTDtools", that calls DDD functions and manages their input and output in a standard way. The package is available on GitHub (https://github.com/TheoPannetier/DDvTDtools), and can be cloned from there or installed from within R (remotes::install_github("TheoPannetier/DDvTDtools")). For instructions on how to run the simulations and obtain the maximum likelihood parameter estimates, please refer to the README.md file of this package.

Simulated trees are stored in files with the pattern simXX-PARA.RData, where XX is the model name (DD or TD) and PARA is a four-digit code specifying parameter values (see ReadMe.txt for more details).

The results of the maximum likelihood procedure are stored in data frames, in files with the pattern simXX_optimYY_PARA_INITK.rds, where YY is the name of the model fitted to a tree (DD or TD, which can be the same as or different from XX), and INITK specifies initial parameter values (see ReadMe.txt for more details).

Data for empirical phylogenies follow the same naming pattern; simulated trees are stored in files with pattern simXX_FAMILY_FROM.RData, and the results of maximum likelihood are stored in files with pattern FAMILY_simXX_optimYY_FROM.rds, where FAMILY is the name of a family of birds or mammals, and FROM specifies which values the trees were simulated with (see ReadMe.txt for more details).
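The naming patterns for the simulated data sets can be handled programmatically. The sketch below uses base R only; the helper names are hypothetical, and "1234" and "INITK" are placeholders standing in for a real four-digit code and initial-value label (see ReadMe.txt for the actual values). Files ending in .rds can be read with readRDS(), and .RData files with load().

```r
# Build the file name of a set of simulated trees: simXX-PARA.RData
tree_file <- function(sim_model, para) {
  sprintf("sim%s-%s.RData", sim_model, para)
}

# Split a results file name (simXX_optimYY_PARA_INITK.rds) into its parts.
# Returns NULL if the name does not follow the pattern.
parse_optim_file <- function(file_name) {
  pattern <- "^sim(DD|TD)_optim(DD|TD)_([0-9]{4})_(.+)\\.rds$"
  m <- regmatches(file_name, regexec(pattern, file_name))[[1]]
  if (length(m) == 0) {
    return(NULL)
  }
  list(sim_model = m[2], fit_model = m[3], para = m[4], init_k = m[5])
}

tree_file("DD", "1234")
parts <- parse_optim_file("simDD_optimTD_1234_INITK.rds")
parts$fit_model  # model fitted to the simulated trees
```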

Funding

Dutch Research Council

Faculty of Natural Sciences, University of Stirling