Bayesian molecular dating is widely used to study evolutionary timescales. This procedure usually involves phylogenetic analysis of nucleotide sequence data, with fossil-based calibrations applied as age constraints on internal nodes of the tree. An alternative approach is tip-dating, which explicitly includes fossil data in the analysis. This can be done, for example, through the joint analysis of molecular data from present-day taxa and morphological data from both extant and fossil taxa. In the context of tip-dating, an important development has been the fossilized birth-death process, which allows non-contemporaneous tips and sampled ancestors while providing a model of lineage diversification for the prior on the tree topology and internal node times. However, tip-dating with fossils faces a number of considerable challenges, especially those associated with fossil sampling and evolutionary models for morphological characters. We conducted a simulation study to evaluate the performance of tip-dating using the fossilized birth-death model. We simulated fossil occurrences and the evolution of nucleotide sequences and morphological characters under a wide range of conditions. Our analyses of these data show that the number and the maximum age of fossil occurrences have a greater influence than the degree of among-lineage rate variation or the number of morphological characters on estimates of node times and the tree topology. Tip-dating with the fossilized birth-death model generally performs well in recovering the relationships among extant taxa, but has difficulties in correctly placing fossil taxa in the tree and identifying the number of sampled ancestors. The method yields accurate estimates of the ages of the root and crown group, although the precision of these estimates varies with the probability of fossil occurrence. 
The exclusion of morphological characters results in a slight overestimation of node times, whereas the exclusion of nucleotide sequences has a negative impact on inference of the tree topology. Our results provide an overview of the performance of tip-dating using the fossilized birth-death model, which will inform further development of the method and its application to key questions in evolutionary biology.
SUPPLEMENTARY APPENDIX 1
Details of the fossil occurrences sampled under each fossil occurrence probability P and fossil recovery rate psi on the 20 birth-death species trees.
Supp_1_new.xlsx
SUPPLEMENTARY APPENDIX 2
Data generated by simulation and XML files used for all BEAST analyses.
Supp_2_new.tar.gz
SUPPLEMENTARY APPENDIX 3. Details of coverage probabilities from the core analyses for estimates of (a) origin time (tor); (b) root age (tmrca); and (c) crown age (tc).
Each column of panels shows the results from a different model of among-lineage rate variation for the molecular and morphological data: strict clock and strict clock (SS); strict clock and moderate rate variation (SM); and moderate rate variation and high rate variation (MH). Each row of panels shows the results from a different model of fossil occurrence probability (P = 0.01, 0.02, 0.05, and non-uniform). Within each panel, the y-axis represents the number of cases, classified by whether the 95% credibility interval contains the true value (dark grey) or not (light grey), while the x-axis represents the three different numbers of morphological characters.
Supp_3.pdf
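The coverage classification described for Supplementary Appendix 3 can be illustrated with a minimal sketch. The function name `coverage_counts` and the example intervals and ages are hypothetical and are not taken from the simulation output; the sketch only assumes that each analysis yields a lower and upper bound of the 95% credibility interval together with a known true value.

```python
from typing import Sequence, Tuple


def coverage_counts(
    intervals: Sequence[Tuple[float, float]],
    true_values: Sequence[float],
) -> Tuple[int, int]:
    """Count analyses whose 95% credibility interval contains the true value.

    Returns (covered, missed), matching the dark-grey and light-grey
    categories described in the caption.
    """
    if len(intervals) != len(true_values):
        raise ValueError("need one interval per true value")
    covered = sum(
        1 for (lower, upper), truth in zip(intervals, true_values)
        if lower <= truth <= upper
    )
    return covered, len(true_values) - covered


# Hypothetical intervals and true ages, for illustration only.
example_intervals = [(80.0, 120.0), (50.0, 70.0), (95.0, 140.0)]
example_true_ages = [100.0, 75.0, 96.0]

covered, missed = coverage_counts(example_intervals, example_true_ages)
coverage_probability = covered / (covered + missed)
```

Dividing the covered count by the total number of analyses gives the coverage probability for one combination of clock model, fossil occurrence probability, and number of characters.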
SUPPLEMENTARY APPENDIX 4. Estimates of node times from total-evidence dating in our core analyses.
‘Slope’ denotes the slope of the line of best fit and ‘R’ denotes Pearson’s correlation coefficient. (a) Scatterplots showing the gamma statistics for the maximum-clade-credibility trees compared with those of the true trees. (b) Scatterplots showing the stemminess ranks for the maximum-clade-credibility trees compared with those of the true trees. In (a) and (b), each panel shows the results from a different model of among-lineage rate variation for the molecular and morphological data: strict clock and strict clock (SS); strict clock and moderate rate variation (SM); and moderate rate variation and high rate variation (MH). (c) Posterior medians in each maximum-clade-credibility tree for the youngest and median nodes defined in the corresponding true topology, plotted against the true values. Dates in (c) correspond to trees including fossil taxa, whereas those in (d) correspond to trees excluding fossil taxa.
Supp_4.pdf
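The two summary statistics reported in Supplementary Appendix 4, the slope of the line of best fit and Pearson's correlation coefficient, can be computed as in the sketch below. It assumes an ordinary least-squares fit of estimated values on true values; the function name `slope_and_pearson_r` and the example values are hypothetical and do not come from the analyses.

```python
import math
from typing import Sequence, Tuple


def slope_and_pearson_r(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Return the least-squares slope of y on x and Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x and y must each vary")
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical true and estimated node ages, for illustration only.
true_ages = [10.0, 20.0, 30.0, 40.0]
estimated_ages = [12.0, 22.0, 32.0, 42.0]
slope, r = slope_and_pearson_r(true_ages, estimated_ages)
```

A slope near 1 together with R near 1 indicates that the estimates track the true values closely. In the example, the constant offset of 2 time units affects neither statistic, so they should be read alongside a measure of bias.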
SUPPLEMENTARY APPENDIX 5. Estimates of sampled ancestors in the core analyses.
(a) Absolute numbers of sampled ancestors (SA) in the maximum-clade-credibility trees for the fossils without extant descendants. (b) Ratios of the numbers of sampled ancestors to the true numbers of sampled ancestors. Each panel shows the results from a different model of among-lineage rate variation for the molecular and morphological data: strict clock and strict clock (SS); strict clock and moderate rate variation (SM); and moderate rate variation and high rate variation (MH). Within each panel, boxplot summaries are shown for the 20 FBD trees under each model of fossil occurrence probability (P = 0.01, 0.02, 0.05, and non-uniform). For each fossil occurrence probability, results are shown for three different numbers of morphological characters (l = 100, 200, 1000 from left to right, in increasingly dark shades of grey).
Supp_5.pdf
SUPPLEMENTARY APPENDIX 6. Accuracy and precision of date estimates for different species/FBD trees.
Estimates are reported for origin time (tor), root age (tmrca), and crown age (tc) in the core analyses. (a) Accuracy is measured by relative bias (distance between posterior median and true value, divided by the true value). (b) Precision is measured by relative 95% credibility interval (CI) width (posterior 95% CI width divided by the true value). Under each of the four fossil occurrence probabilities (P = 0.01, 0.02, 0.05, and non-uniform), boxplots are shown for each of the 20 species trees, ordered from left to right by increasing tor. Each boxplot summarizes the results from nine analyses for a single species tree.
Supp_6.pdf
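The accuracy and precision measures defined for Supplementary Appendix 6 can be written out as a short sketch. The function names and example numbers are hypothetical. The sketch assumes a signed difference for relative bias, so that positive values indicate overestimation; if "distance" is meant as an absolute value, the result can be wrapped in `abs()`.

```python
def relative_bias(posterior_median: float, true_value: float) -> float:
    """(posterior median - true value) / true value.

    Assumption: the difference is signed, so positive values indicate
    overestimation and negative values indicate underestimation.
    """
    return (posterior_median - true_value) / true_value


def relative_ci_width(ci_lower: float, ci_upper: float, true_value: float) -> float:
    """Width of the 95% credibility interval divided by the true value."""
    return (ci_upper - ci_lower) / true_value


# Hypothetical node age estimate, for illustration only.
true_age = 100.0
bias = relative_bias(110.0, true_age)             # median is 10% too old
width = relative_ci_width(90.0, 135.0, true_age)  # interval spans 45% of the true age
```

Scaling both measures by the true value makes them comparable across the 20 species trees, which differ in their origin times.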
SUPPLEMENTARY APPENDIX 7. Additional results for topological inferences and time estimates from analyses in which morphological data were excluded.
Supp_7.pdf
SUPPLEMENTARY APPENDIX 8. Additional results for topological inferences and time estimates from analyses in which molecular data were excluded.
Supp_8.pdf
SUPPLEMENTARY APPENDIX 9. Additional results for topological inferences and time estimates from analyses with fixed tree topologies.
Supp_9.pdf
SUPPLEMENTARY APPENDIX 10. Additional results from the posterior medians of the FBD model parameters net diversification rate (d), turnover rate (r), and fossil sampling proportion (s).
Supp_10.pdf
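The three parameters named for Supplementary Appendix 10 are not defined in this listing. The sketch below assumes the parameterization commonly used for the fossilized birth-death model, in which net diversification is speciation minus extinction, turnover is extinction divided by speciation, and sampling proportion is the fossil recovery rate divided by the sum of the extinction and fossil recovery rates. The function name, argument names, and rate values are hypothetical.

```python
from typing import Tuple


def fbd_summary_parameters(
    speciation: float, extinction: float, psi: float
) -> Tuple[float, float, float]:
    """Convert rates to (d, r, s) under the assumed parameterization.

    d = speciation - extinction      (net diversification rate)
    r = extinction / speciation      (turnover rate)
    s = psi / (extinction + psi)     (fossil sampling proportion)
    """
    if speciation <= 0.0 or extinction < 0.0 or psi < 0.0:
        raise ValueError("speciation must be positive; other rates non-negative")
    if extinction + psi == 0.0:
        raise ValueError("sampling proportion undefined when extinction + psi is 0")
    d = speciation - extinction
    r = extinction / speciation
    s = psi / (extinction + psi)
    return d, r, s


# Hypothetical rates per lineage per unit time, for illustration only.
d, r, s = fbd_summary_parameters(speciation=0.1, extinction=0.05, psi=0.05)
```

Under this assumption, posterior medians of d, r, and s can be compared directly with the values implied by the rates used in simulation.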