Data for: Integrating fossils samples with heterogeneous diversification rates: a combined Multi-Type Fossilized Birth-Death model
Data files
Mar 05, 2026 version, 224.86 MB
- MTFBD-SI.zip (224.86 MB)
- README.md (2.35 KB)
Abstract
Birth-death models are widely used to describe the diversification process which leads to the observed species and phylogenies. When integrated into Bayesian phylogenetic inference, birth-death models allow the joint inference of the phylogeny and the diversification parameters from molecular information. Two major classes of extensions of the birth-death process are considered in this article. The first extends the phylogenetic tree to include fossil samples alongside extant species, allowing the inference to integrate information about the past diversity. This type of inference uses either morphological or taxonomic information to place fossils in the phylogeny. The second extension models diversification rates which can vary between lineages, and is thus able to infer patterns of variation in speciation or extinction rates. In this work, we combine these two types of extension into a multi-type fossilized birth-death (MTFBD) process, which can perform the joint inference of a phylogeny including extinct and extant samples, and lineage-specific diversification and fossil sampling rates in a Bayesian framework. The MTFBD model is implemented as part of the phylogenetic inference framework BEAST2. Using simulated and empirical datasets, we demonstrate the performance and accuracy of the new model compared to a model with rate heterogeneity that uses only extant samples, and compared to a model that includes fossils but no rate variation. We demonstrate that including fossils improves the accuracy of the phylogeny and diversification rates, especially extinction rates, provided that the inference includes detailed morphological information to accurately place the fossil samples. When this information is not available, however, MTFBD estimates are strongly driven by the priors and are thus no better, or even worse, than estimates obtained only with extant samples. With informative fossil characters, the MTFBD model provides accurate phylogenies, and precisely characterizes how speciation, extinction and fossil sampling rates vary as diversification proceeds.
Supplementary materials for the MTFBD model in BEAST2
Authors: Joëlle Barido-Sottani (contact at joelle.barido-sottani@m4x.org) and Hélène Morlon
This dataset contains code and data files associated with the manuscript "Integrating fossils samples with heterogeneous diversification rates: a combined Multi-Type Fossilized Birth-Death model", published in Systematic Biology, 2026. This manuscript introduces a new birth-death model, which can be used in a Bayesian phylogenetic inference framework to estimate phylogenies containing both extant and extinct samples, along with lineage-specific birth, death and fossil sampling rates.
This dataset provides the code and data for the three main analyses presented in the manuscript: 1) the validation analysis, which shows that the model is correctly implemented; 2) the analyses on simulated data, which show the accuracy of the new model in different conditions; and 3) the analysis of an empirical dataset (Cetacea clade).
The RData files and the simulation and post-processing code are designed for use in R (version >= 4.4). The XML files are input files for the phylogenetic inference software BEAST2, version 2.6. The analyses used three models: FBD (Fossilized Birth-Death, with fossil samples and constant birth, death and sampling rates through the phylogeny), MTBD (Multi-Type Birth-Death, with lineage-specific birth, death and sampling rates) and MTFBD (Multi-Type Fossilized Birth-Death, with fossil samples and lineage-specific birth, death and sampling rates).
This dataset is provided as a complete archive, MTFBD-SI.zip, with the following structure (see the README inside each folder for more details):
- empirical_datasets: code, data and results related to the MTFBD, MTBD and FBD inferences on the empirical Cetacea dataset
- simulated_datasets: code, data and results related to the MTFBD, MTBD and FBD inferences on simulated datasets
- validation: code, data and results used to validate the MTFBD implementation in BEAST2
- software: copy of the BEAST2 software and packages used to run the inferences. Note that these libraries are included purely for replication purposes. New analyses should use the latest versions of BEAST2 and its packages (available at https://www.beast2.org/).
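
As a quick sanity check after downloading, the folder layout listed above can be verified without extracting the archive. The following is only a minimal sketch in Python (not part of the dataset's own R code); the helper name `missing_folders` is hypothetical, and it assumes the four folders sit either at the archive root or one level below it, which this README does not state.

```python
import zipfile
from pathlib import PurePosixPath

# Folder names as listed in this README.
EXPECTED_FOLDERS = {"empirical_datasets", "simulated_datasets", "validation", "software"}


def missing_folders(archive):
    """Return the expected folders that do not appear in the archive.

    `archive` is a path to the zip file or an open binary file object.
    Assumption (not confirmed by the README): the folders are located at
    the archive root or directly under a single wrapping directory.
    """
    found = set()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Only inspect the first two path components of each entry.
            parts = PurePosixPath(name).parts[:2]
            found.update(part for part in parts if part in EXPECTED_FOLDERS)
    return EXPECTED_FOLDERS - found
```

Calling `missing_folders("MTFBD-SI.zip")` from the download directory returns an empty set when all four folders are present under the assumed layout, and otherwise the names of the folders that were not found.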
