Data and code from: Evaluating the impact and detectability of mass extinctions on total-evidence dating
Data files (version of Apr 01, 2026; 244.22 MB)
- CodeData.zip (244.21 MB)
- README.md (6.22 KB)
Abstract
Fossils are crucial for accurately dating phylogenetic trees because their ages provide vital constraints on the timing of macroevolutionary events, and their morphological characters offer key information on evolutionary rates and phylogenetic positions. The fossilized birth-death (FBD) process is a diversification model that incorporates both extant and extinct species, serving as a tree prior that seamlessly integrates fossils into phylogenetic inference. While the FBD model can account for mass extinctions, which caused rapid, widespread organismal loss, few studies have utilized FBD models incorporating these events in phylogenetic inference. This is likely because the detectability of mass extinctions and their impact on phylogenetic inference remain unclear. Through simulations, we assessed the influence of mass extinctions on divergence time and topology inference and evaluated the detectability of mass extinction signals in total-evidence dating. We examined three FBD tree priors: without mass extinction, with known mass extinction time and survival probability, and with known mass extinction time but unknown survival probability. Our results show that the FBD model with known mass extinction time and unknown survival probability was able to reliably detect mass extinctions when they occurred, and correctly refrained from detecting mass extinctions when they were absent. Moreover, the three FBD models produced similar divergence time and tree topology errors. Even when the FBD model used for tree inference did not explicitly account for mass extinction events, signals of mass extinction were still detectable on the resulting MCC trees. The accuracy of the detection was similar to that obtained from MCC trees inferred using an FBD model that includes mass extinction parameters. We also reduced the fossilization rate and the number of morphological characters, obtaining results consistent with the aforementioned findings.
However, reducing the fossilization rate decreased the accuracy of detecting mass extinctions when they occurred, and reducing the number of morphological characters decreased the accuracy of divergence time inference. Furthermore, we adjusted the priors for the existence of mass extinction and the survival probability of mass extinction. We found that the prior for the existence of mass extinction had no effect on inference, whereas the prior for the survival probability of mass extinction significantly influenced both the detection of mass extinctions and the estimation of survival probabilities. Finally, we applied these models to empirical datasets of tetraodontiform fishes and crinoids and found that, consistent with our simulation results, the inclusion of a mass extinction event in the tree prior had a negligible impact on the inferred topologies and divergence times.
Scripts
simulation
Scripts for simulating phylogenetic trees and morphological and molecular characters.
- Simu_ME.R: Simulates phylogenetic trees along with morphological and molecular characters.
- sample_morph.R: Randomly samples 25 morphological characters from an initial set of 250.
- SimuRelaxedClock.R: Simulates characters under a relaxed clock and a complex substitution model.
- Substitution.R: Summarizes the expected number of substitutions per site and the number of invariant characters in the simulations.
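Two of the steps above, drawing a random subset of characters (as sample_morph.R does for 25 of 250) and counting invariant characters (as summarized by Substitution.R), can be sketched in Python. This is a minimal sketch, assuming the character matrix is a dict mapping taxon names to equal-length state strings with `?` for missing data; the function names are hypothetical and are not taken from the R scripts.

```python
import random

def sample_characters(matrix, k=25, seed=None):
    """Keep k randomly chosen character columns (without replacement).

    matrix: dict taxon -> state string; all strings have the same length.
    Returns the reduced matrix and the sorted column indices that were kept.
    (Hypothetical helper; sample_morph.R may differ in detail.)
    """
    n_chars = len(next(iter(matrix.values())))
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(n_chars), k))
    reduced = {taxon: "".join(seq[i] for i in cols) for taxon, seq in matrix.items()}
    return reduced, cols

def count_invariant(matrix, missing="?"):
    """Count columns whose observed (non-missing) states are all identical."""
    seqs = list(matrix.values())
    count = 0
    for i in range(len(seqs[0])):
        states = {s[i] for s in seqs if s[i] != missing}
        if len(states) <= 1:  # all-missing columns are counted as invariant here
            count += 1
    return count
```

Whether all-missing columns count as invariant is a convention choice; this sketch counts them, and the R script may not.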
compare-ME-WME-Tree
Scripts for simulating trees with and without mass extinction, and comparing summary statistics.
- Compare-ME-WME-Tree.py: Calculates and compares summary statistics of the simulated trees.
- xml: Simulated XML files.
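The exact summary statistics computed by Compare-ME-WME-Tree.py are not listed here. As one illustrative statistic (an assumption, not necessarily one the script computes), a lineage-through-time count shows the drop in standing diversity that a mass extinction leaves on a complete tree. This minimal sketch assumes each branch is a hypothetical `(start_age, end_age)` pair in time before present.

```python
def lineages_at(branches, age):
    """Count branches alive at `age` (time before present).

    Each branch is (start_age, end_age) with start_age > end_age. A branch is
    counted when end_age < age < start_age, so query ages should avoid node ages.
    """
    return sum(1 for start, end in branches if end < age < start)

def ltt(branches, ages):
    """Lineage-through-time counts at each query age."""
    return [lineages_at(branches, a) for a in ages]

def lineages_around_event(branches, event_age, eps=1e-6):
    """Lineage counts just before (older than) and just after (younger than) an event age."""
    return lineages_at(branches, event_age + eps), lineages_at(branches, event_age - eps)
```

For a tree with a mass extinction, the first count returned by `lineages_around_event` should be noticeably larger than the second; without one, the two should be similar apart from background extinctions.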
inference/
Scripts for performing inference using RevBayes, BEAST2, and MrBayes under FBD models.
Rev/simulation
- KnownTime: Inference using the FBD model with known mass extinction time (including scenarios without mass extinction).
- UnknownTime: Inference using the FBD model with both mass extinction time and survival probability unknown.
Rev/fixed
Inference of mass extinction and diversification rates on the MCC trees.
Rev/empirical
- crinoid: Inference using the FBD model with known mass extinction time on crinoid data (including scenarios without mass extinction).
plot
Scripts for data visualization and analysis.
- convergence.R: Computes ESS for total-evidence dating.
- BayesFactor.R: Calculates Bayes factors and survival probabilities of mass extinction for total-evidence dating, generating corresponding figures (Figures 1, 2).
- InferenceAccuracy.R: Calculates RF distances between inferred and true phylogenetic trees from total-evidence dating and compares inferred divergence times and diversification rates with their true values, generating corresponding figures (Figures 4, 5).
- mcc_convergence.R: Computes ESS for MCC inference.
- MCC_plot.R: Calculates Bayes factors, survival probabilities, and time of mass extinction using MCC trees, generating corresponding figures (Figures 6, 7).
- TED_MCC_plot.R: Compares differences in survival probabilities inferred from total-evidence dating and MCC trees, generating corresponding figures (Figure 3).
- TimeProb.R: Calculates Bayes factors, survival probabilities, and times of mass extinction from total-evidence dating under the FBD model with unknown mass extinction time and survival probability, generating corresponding figures.
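InferenceAccuracy.R reports Robinson-Foulds (RF) distances between inferred and true trees. As an illustration of the metric only (the script likely uses an R package, whose definition and normalization may differ), this minimal sketch computes a rooted, clade-based RF distance for trees written as hypothetical nested tuples of leaf names. Unrooted variants compare bipartitions instead and can give different values.

```python
def _collect(node, acc):
    """Return the leaf set below `node`, adding every internal node's leaf set to `acc`."""
    if isinstance(node, str):
        return frozenset([node])
    leaves = frozenset().union(*(_collect(child, acc) for child in node))
    acc.add(leaves)
    return leaves

def clades(tree):
    """Non-root clades of a nested-tuple tree, each as a frozenset of leaf names."""
    acc = set()
    root = _collect(tree, acc)
    acc.discard(root)  # the root clade is shared by any two trees on the same taxa
    return acc

def rf_distance(t1, t2):
    """Rooted RF distance: number of clades found in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))
```

For example, `((("A", "B"), "C"), "D")` and `((("A", "C"), "B"), "D")` share the clade {A, B, C} but differ in {A, B} versus {A, C}, giving a distance of 2.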
empirical-plot
Scripts for plotting and analyzing the empirical data.
- convergence.R: Performs convergence diagnostics (calculating R-hat, ESS, ASDSF, etc.).
- CompareTree.R: Compares MCC trees inferred by models with and without mass extinction, as well as across different chains.
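One of the diagnostics named for convergence.R is R-hat. The script's exact variant is not stated; packages such as R's coda (`gelman.diag`) provide refined versions. A minimal sketch of the original Gelman-Rubin potential scale reduction factor for one parameter, assuming each chain is a hypothetical list of post-burn-in samples of equal length:

```python
import math
from statistics import mean, variance

def rhat(chains):
    """Original Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of m >= 2 equal-length lists of post-burn-in samples.
    Values close to 1 suggest the chains sample the same distribution.
    """
    if len(chains) < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length >= 2")
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])  # mean within-chain variance
    b = n * variance(chain_means)            # between-chain variance
    if w == 0:
        raise ValueError("zero within-chain variance")
    var_hat = (n - 1) / n * w + b / n        # pooled estimate of the posterior variance
    return math.sqrt(var_hat / w)
```

Chains stuck in different regions inflate the between-chain term and push R-hat well above 1; the commonly used threshold (for example 1.01 or 1.1) is a convention, not part of the formula.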
utlis
- used_funs.R: Helper functions used by the other scripts.
Data
crinoid
Source data for crinoid analysis.
output
simu_trees_CFBD_ME
Simulated phylogenetic trees with mass extinction.
- ori: Complete phylogenetic trees (including unsampled species).
- samp_0.03: Phylogenetic trees obtained with a sampling rate of 0.03 (excluding unsampled species), including morphological and molecular characters and fossil ages.
- extant_samp_0.03: Extant species generated by samp_0.03.
- samp_0.03_25: Same data as samp_0.03 but with morphological characters reduced to 25.
- samp_0.015: Phylogenetic trees obtained with a sampling rate of 0.015 (excluding unsampled species), including morphological and molecular characters and fossil ages.
- extant_samp_0.015: Extant species generated by samp_0.015.
- extant: Phylogenetic trees containing only extant species (fossils excluded).
- samp_0.03_relaxed: Phylogenetic trees obtained with a sampling rate of 0.03, relaxed clock, and complex substitution model (excluding unsampled species), including morphological and molecular characters and fossil ages.
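The sampling rates in samp_0.03 and samp_0.015 appear to be fossil-sampling (fossilization) rates, the quantity the abstract describes as reduced. Under the standard FBD assumption that fossils are recovered along each lineage as a Poisson process with rate ψ, fossil ages on a branch can be drawn from exponential waiting times. This is a minimal sketch, assuming branches are hypothetical `(start_age, end_age)` pairs in Myr before present and that 0.03 is a per-lineage rate per Myr; Simu_ME.R may implement sampling differently.

```python
import random

def sample_fossils(branches, psi, seed=None):
    """Simulate fossil ages as a Poisson process with rate psi along each branch.

    branches: list of (start_age, end_age) with start_age > end_age.
    Returns (branch_index, fossil_age) pairs; the expected number of fossils
    is psi times the total branch length.
    """
    rng = random.Random(seed)
    fossils = []
    for idx, (start, end) in enumerate(branches):
        age = start
        while True:
            age -= rng.expovariate(psi)  # exponential waiting time to the next fossil
            if age <= end:
                break
            fossils.append((idx, age))
    return fossils
```

Halving ψ from 0.03 to 0.015 halves the expected number of fossils for the same complete tree, which is why samp_0.015 carries less information about rates before and after a mass extinction.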
simu_trees_CFBD_WME
Simulated phylogenetic trees without mass extinction. Structure identical to simu_trees_CFBD_ME.
ESS
- ess_list.RData: ESS values for the parameter with the lowest ESS from total-evidence dating.
- max_iter_list.RData: Number of MCMC iterations performed for total-evidence dating.
- ess_list_mcc.RData: ESS values for the parameter with the lowest ESS from MCC-based inference.
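These files store, per analysis, the ESS of the worst-mixing parameter. The estimator used by convergence.R and mcc_convergence.R is not stated here, and tools differ (R's coda `effectiveSize`, for instance, uses a spectral estimate). A minimal sketch of an autocorrelation-based ESS that sums lags until the first non-positive autocorrelation, plus a hypothetical helper that picks the parameter with the lowest ESS:

```python
def ess(trace):
    """ESS = n / (1 + 2 * sum of positive-lag autocorrelations).

    Lags are summed until the first non-positive autocorrelation (a simple
    truncation rule). O(n * max_lag), which is fine for a sketch.
    """
    n = len(trace)
    mu = sum(trace) / n
    dev = [x - mu for x in trace]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0:
        raise ValueError("constant trace: ESS undefined")
    tau = 1.0
    for lag in range(1, n):
        ck = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / n
        rho = ck / c0
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau

def lowest_ess(traces):
    """Return (parameter, ESS) for the parameter with the smallest ESS.

    traces: dict mapping parameter names to lists of post-burn-in samples.
    """
    values = {name: ess(t) for name, t in traces.items()}
    name = min(values, key=values.get)
    return name, values[name]
```

Reporting only the minimum ESS across parameters, as these files do, is a conservative check: if the worst-mixing parameter passes a threshold, all others do too.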
BayesFactor
- BayesFactor.RData: Bayes factors calculated from total-evidence dating.
- ME_Probability.RData: Mass extinction probabilities (1 − survival probability) inferred from total-evidence dating.
- ME_Probability_TED.RData: Mass extinction probabilities used for plotting Figure 3 and Figures S5–S7, inferred from total-evidence dating.
- ME_Probability_MCC.RData: Mass extinction probabilities used for plotting Figure 3 and Figures S5–S7, inferred from MCC-based inference.
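A common way to obtain a Bayes factor for the existence of a mass extinction is to divide the posterior odds of its presence by the prior odds, with the posterior probability estimated from MCMC samples of a presence indicator. This is a minimal sketch, assuming such a 0/1 indicator is sampled (for example under reversible-jump MCMC) with prior probability `prior_prob`; it may not match BayesFactor.R exactly, and the function names are hypothetical. The second function applies the definition used above: mass extinction probability = 1 − survival probability.

```python
import math

def bayes_factor(indicator_samples, prior_prob):
    """BF for 'mass extinction present' vs 'absent' = posterior odds / prior odds.

    indicator_samples: MCMC samples of a hypothetical 0/1 indicator (1 = present).
    prior_prob: prior probability that a mass extinction is present, in (0, 1).
    """
    if not 0 < prior_prob < 1:
        raise ValueError("prior_prob must be in (0, 1)")
    n = len(indicator_samples)
    post = sum(1 for s in indicator_samples if s) / n
    if post == 1.0:
        return math.inf  # every sample includes the event; BF unbounded for this chain
    posterior_odds = post / (1.0 - post)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds

def mass_extinction_probability(survival_samples):
    """Mass extinction probability = 1 - survival probability, per sample."""
    return [1.0 - s for s in survival_samples]
```

Dividing by the prior odds is what makes the Bayes factor insensitive to the prior on the existence of a mass extinction, consistent with the abstract's finding; the prior on the survival probability, by contrast, changes the likelihood comparison itself.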
InferenceAccuracy
Parameters of phylogenetic trees inferred through total-evidence dating.
- Rate_error_cov.RData: Coverage of diversification rates.
- coverage_div_time.RData: Coverage of divergence time.
- coverage_div_time_full.RData: Coverage of divergence time for the complete trees.
- rel_err_div_time.RData: Relative error of divergence time.
- rel_err_div_time_full.RData: Relative error of divergence time for the complete trees.
- RF_distance.RData: RF distance.
- RF_distance_full.RData: RF distance for the complete trees.
- net_rate.RData: Net diversification rates.
- sampling_rate.RData: Sampling rates.
- death_rate.RData: Extinction rates.
- birth_rate.RData: Speciation rates.
- n_before.RData: Number of species before the mass extinction.
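The accuracy summaries above rest on two quantities: the relative error of an estimate and the coverage of its credible interval. This is a minimal sketch, assuming relative error is the absolute error divided by the true value and coverage is the fraction of true values inside a 95% highest posterior density (HPD) interval; InferenceAccuracy.R may use signed errors, a different point estimate, or a different interval, and matching inferred nodes to true nodes is not shown. Function names are hypothetical.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples (assumes unimodality)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)
    return xs[i], xs[i + k - 1]

def relative_error(estimate, true_value):
    """Absolute relative error |estimate - true| / true."""
    return abs(estimate - true_value) / true_value

def coverage(true_values, intervals):
    """Fraction of true values that fall inside their (lower, upper) interval."""
    hits = sum(1 for t, (lo, hi) in zip(true_values, intervals) if lo <= t <= hi)
    return hits / len(true_values)
```

For a well-calibrated analysis, coverage of 95% HPD intervals across simulated replicates should be close to 0.95.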
MCC
Parameters inferred using MCC trees.
- bf.RData: Bayes factors calculated using MCC trees.
- log.RData: Log files from MCC-based MCMC inference.
- Rate_error_cov.RData: Coverage of diversification rates from MCC-based inference.
substitution
The expected number of substitutions per site and the number of invariant characters in the simulations.
empirical
MCC trees of the empirical data (Summary.tree is the MCC tree of the combined chains).
crinoid
- ME_443.8_1_fix_CFBD: MCC tree generated using the FBD model with fixed mass extinction.
- WME_CFBD: MCC tree generated using the FBD model with no mass extinction.
