Tens of thousands of phylogenetic trees, describing the evolutionary relationships between hundreds of thousands of taxa, are readily obtainable from various databases. From such trees, inferences can be made about the underlying macroevolutionary processes, yet, remarkably, these processes are still poorly understood. Simple and widely used evolutionary null models are problematic: the imbalance between the sizes of the daughter clades of ancestral taxa in empirical trees differs greatly from what these models predict. Obtaining a simple evolutionary model that is both biologically plausible and reproduces the imbalance seen in empirical trees is a challenging problem, and none of the existing models provides a satisfying answer. Here we propose a simple, biologically plausible macroevolutionary model in which the rate of speciation decreases with species age, while extinction rates can vary quite generally. We show that this model provides a remarkably good fit to the thousands of trees stored in the online database TreeBase. The biological motivation for the identified age-dependent speciation process may be that recently evolved taxa often colonise new regions or niches and may initially experience little competition. These new taxa are thus more likely to give rise to further new taxa than a taxon that has remained largely unchanged and is therefore well adapted to its niche. We show that age-dependent speciation may also result from different populations within a species following the same laws of lineage splitting to produce new species. As the fit of our model to the tree database shows, this simple biological motivation provides an explanation for a long-standing problem in macroevolution.
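The central assumption of the model, a speciation rate that declines with species age, can be illustrated with a short sketch. Purely for illustration, it assumes that the waiting time until a lineage speciates follows a Weibull distribution with a shape and a scale parameter; the lineage's speciation rate at age t is then the Weibull hazard. A shape below 1 gives a rate that declines with age, while a shape of exactly 1 (exponential waiting times) gives the constant, age-independent rate of standard null models. The function name `weibull_hazard` and all parameter values are hypothetical.

```python
def weibull_hazard(age: float, shape: float, scale: float) -> float:
    """Instantaneous speciation rate of a lineage of the given age.

    Hypothetical sketch: assumes Weibull-distributed waiting times to
    speciation, whose hazard is h(t) = (shape / scale) * (t / scale) ** (shape - 1).
    """
    if age <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("age, shape and scale must be positive")
    return (shape / scale) * (age / scale) ** (shape - 1)


ages = [0.1, 1.0, 10.0]
# Shape < 1: young lineages speciate faster than old ones.
declining = [weibull_hazard(t, shape=0.5, scale=1.0) for t in ages]
# Shape == 1: exponential waiting times, i.e. a rate independent of age.
constant = [weibull_hazard(t, shape=1.0, scale=1.0) for t in ages]

print([round(r, 3) for r in declining])  # [1.581, 0.5, 0.158]
print(constant)                          # [1.0, 1.0, 1.0]
```

Varying only the shape parameter moves the model between age-dependent and age-independent speciation, which makes it a convenient single knob for comparison with empirical trees.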
Data-set empirical trees
The input file “treebase_input.rda” contains the trees cached from TreeBase (Sanderson, M.J., Donoghue, M.J., et al. 1994) with the R package treebase. These trees were then processed to compute specific statistics and compared with simulated trees, as described in our methodology. This was done with the R script “processing_treebase.R”, whose final output is the file “treebaselist_output.RData”. We would like to stress that this is not a complete catalogue of all R scripts used in our study, especially regarding plotting and the investigation of model behavior, since much of the work was simulated on a computer cluster and involved multiple outputs, code files and CSV parameter tables that changed over time. If you plan to run similar experiments, we strongly encourage reading the help material of our R package TreeSimGM.
data-set_empirical_trees.zip
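To make "imbalance between the sizes of daughter clades" concrete, the sketch below computes the Colless index, one common tree-imbalance statistic: the sum, over all internal nodes, of the absolute difference between the number of tips in the two daughter clades. It is an illustration only and not necessarily the statistic computed by “processing_treebase.R”. Trees are represented as nested Python tuples, a hypothetical format chosen for brevity (the actual data are stored as R objects).

```python
from typing import Tuple, Union

# Hypothetical representation of a rooted binary tree:
# a tip is a string label, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def tips_and_colless(tree: Tree) -> Tuple[int, int]:
    """Return (number of tips, Colless index) of a rooted binary tree."""
    if isinstance(tree, str):
        return 1, 0
    left, right = tree
    n_left, c_left = tips_and_colless(left)
    n_right, c_right = tips_and_colless(right)
    # Imbalance at this node: size difference between the two daughter clades.
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def colless(tree: Tree) -> int:
    return tips_and_colless(tree)[1]


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(colless(balanced), colless(caterpillar))  # 0 3
```

A fully balanced tree scores 0, whereas a caterpillar (ladder-like) tree with n tips reaches the maximum value (n - 1)(n - 2) / 2, which is 3 for n = 4.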
Empirical trees parameter estimation based on simulated trees
“SBFUSED_input.RData” is the main input; it was generated from simulations run with the R package TreeSimGM (Hagen, O., Stadler, T. 2013) on Brutus, the central high-performance computer cluster of ETH Zurich. Because the fitting is done on empirical trees, “treebase_output.RData” (found in “data-set_empirical_trees.zip”) is also used as input. For more details, please consult the respective R package tutorials. All other files are by-products of the “procedure_rscript.R” script; more details can be found in the methodology section. Figure 3 is derived mainly from “SBep_output.RData”, “SBleast.square_output.RData” and “plot_script.R”.
empirical_trees_parameter_estimation_based_on_simulated_trees.zip
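The parameter estimation compares empirical trees with simulated ones, and the file name “SBleast.square_output.RData” suggests a least-squares criterion. As a hedged sketch only, assume that each candidate parameter value (for example a shape value) has a simulated summary vector of some tree statistic. A grid search then picks the candidate whose vector has the smallest sum of squared differences to the empirical vector. The function `least_squares_fit`, the dictionary layout and all numbers are hypothetical toy inputs, not taken from “procedure_rscript.R” or from our results.

```python
from typing import Dict, Sequence


def least_squares_fit(empirical: Sequence[float],
                      simulated: Dict[float, Sequence[float]]) -> float:
    """Return the candidate parameter whose simulated summary vector is closest
    to the empirical one in the sum-of-squares sense (hypothetical sketch)."""

    def sse(candidate: Sequence[float]) -> float:
        if len(candidate) != len(empirical):
            raise ValueError("summary vectors must have equal length")
        return sum((e - s) ** 2 for e, s in zip(empirical, candidate))

    return min(simulated, key=lambda param: sse(simulated[param]))


# Toy inputs: one summary statistic evaluated at three tree sizes.
empirical = [0.30, 0.45, 0.60]
simulated = {
    0.5: [0.28, 0.47, 0.61],  # hypothetical candidate shape 0.5
    1.0: [0.10, 0.15, 0.20],  # hypothetical candidate shape 1.0
    2.0: [0.05, 0.07, 0.09],  # hypothetical candidate shape 2.0
}
print(least_squares_fit(empirical, simulated))  # 0.5
```

A plain grid search keeps the sketch transparent; with densely simulated parameter grids, the same criterion could equally be minimised by interpolation between grid points.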
Reliable estimates of shape on simulated phylogenies
Data and scripts used to test our parameter-estimation procedure on the three sets of simulated phylogenies (Supplementary Figures 6, 7 and 8).
reliable_estimates_of_shape_on_simulated_phylogenies.zip
Supplementary Figures 1-10
SupplementaryFigures1-10.pdf
Supplementary Analytic Results
SupplementaryAnalyticResults.pdf