Simulations modelling the effect of budding speciation on phylogenies
Data files
Feb 27, 2025 version files (6.82 KB total)
- completeness.csv (1.49 KB)
- foote_sepkoski_freqrats.csv (359 B)
- README.md (4.97 KB)
Abstract
Paleobiologists have long sought to explain how alternative modes of speciation, including budding and bifurcating cladogenesis, shape patterns of evolution. Methods introduced over the past decade have paved the way for a renewed enthusiasm for exploring modes of speciation in the fossil record. However, the field does not yet have a strong intuition for how ancestor-descendant relationships, especially those that arise from budding speciation, might influence the shape of trees reconstructed for fossil or living clades. We developed a simulation approach based on classic paleobiological theory to ask what proportion of ancestral nodes in paleontological phylogenies are expected to correspond to sampled lineages under a range of preservational regimes. We compared our simulated results to empirical estimates of absolute fossil record completeness gathered from the literature and found that many fossilized clades of marine invertebrates are likely to display upwards of 80% sampled ancestors. Under a primarily budding model, phylogenies where 100% of the internal nodes correspond to named species are very possible for well-sampled clades at local and regional scales. We also leveraged our simulation approach to ask how budding might shape extant clades. We found that the ancestral signature of budding causes rampant hard polytomies in extant clades. Our results highlight how budding can yield dramatic and unrecognized effects on phylogenetic reconstruction of clades of both living and extinct organisms.
https://doi.org/10.5061/dryad.dr7sqvb6n
Description of the data and file structure
completeness.csv - completeness estimates for each fossil clade examined.
Columns are:
Taxon: Name of the clade
Completeness: Estimated completeness, expressed as the percentage of all taxa inferred to have existed over the history of the clade that are inferred to have been sampled in the dataset.
Level: Taxonomic level at which the clade was sampled. All clades examined here were sampled either at the species or genus level.
Scale: Approximate geographic scale at which the clade was sampled. Some clades were sampled globally, while others were sampled locally, at a single locality or a small handful of nearby localities.
Epoch: Geological interval for which the clade was sampled. "All" means that the clade was sampled over its entire existence.
Reference: Literature reference from which the completeness estimate (or FreqRat used to estimate completeness) was harvested.
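As a minimal sketch of how a file with these columns might be read, the snippet below parses rows with Python's standard csv module and averages completeness within each geographic scale. The two inline rows are made-up placeholders, not values from the dataset. The helper name mean_completeness_by_scale is hypothetical, as are the scale labels "local" and "global".

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; these are NOT values from completeness.csv.
EXAMPLE_CSV = """Taxon,Completeness,Level,Scale,Epoch,Reference
ExampleCladeA,80,species,local,All,Placeholder 2000
ExampleCladeB,40,genus,global,All,Placeholder 2001
"""

def mean_completeness_by_scale(handle):
    """Average the Completeness column (a percentage) within each Scale value."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["Scale"]].append(float(row["Completeness"]))
    return {scale: sum(vals) / len(vals) for scale, vals in grouped.items()}

result = mean_completeness_by_scale(io.StringIO(EXAMPLE_CSV))
print(result)
```

To apply the same function to the real file, the in-memory string could be replaced with open("completeness.csv", newline=""), assuming the header row matches the column names listed above exactly.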
foote_sepkoski_freqrats.csv - range-frequency ratios (FreqRats) reported by Foote and Sepkoski (1999) and interpreted as preservation probabilities. These are used to estimate completeness in the same terms as the other datasets by integrating over a range of possible extinction rates.
Columns are:
Taxon: Name of the clade
Upper: Upper bound of completeness estimates derived and reported by Foote and Sepkoski (1999).
Lower: Lower bound of completeness estimates derived and reported by Foote and Sepkoski (1999).
Level: Taxonomic level at which the clade was sampled. All clades examined here were sampled either at the species or genus level.
Scale: Approximate geographic scale at which the clade was sampled. Some clades were sampled globally, while others were sampled locally, at a single locality or a small handful of nearby localities.
Epoch: Geological interval for which the clade was sampled. "All" means that the clade was sampled over its entire existence.
Reference: Literature reference from which the completeness estimate (or FreqRat used to estimate completeness) was harvested.
Code/Software
The included Python scripts can be used to rerun the simulations and generate plots for budding and bifurcating speciation, to run the extant polytomy analyses, and to estimate completeness metrics empirically from the Foote and Sepkoski FreqRat data.
Description of scripts
sim_range_completeness.py -- simulates phylogenies and calculates the proportion of sampled ancestors after stochastically sampling the simulated trees using a Poisson model of fossil preservation.
calc_completeness.py -- functions to calculate the completeness of fossil clades using several approaches, including the Foote and Raup (1996) 'FreqRat' and the Solow and Smith (1997) equations.
calc_foote_sepkoski_completeness.py -- script to marginalize over extinction rates to estimate completeness using values from Foote and Sepkoski (1999).
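The marginalization over extinction rates can be sketched as follows. This is not the code in calc_foote_sepkoski_completeness.py. It is a minimal illustration that assumes taxon durations are geometrically distributed with per-interval extinction probability q, that preservation occurs independently in each interval with probability R, and that q is spread uniformly over an assumed range. Under those assumptions the probability that a taxon is preserved at least once is R / (1 - (1 - q)(1 - R)). The function names, the range of q, and the grid size are all hypothetical.

```python
def completeness(pres_prob, ext_prob):
    """Probability that a taxon is preserved at least once.

    Assumes geometric durations (per-interval extinction probability ext_prob)
    and independent per-interval preservation with probability pres_prob.
    """
    return pres_prob / (1.0 - (1.0 - ext_prob) * (1.0 - pres_prob))

def marginal_completeness(pres_prob, q_min=0.05, q_max=0.95, n_grid=1000):
    """Average completeness over a uniform grid of extinction probabilities.

    q_min, q_max and n_grid are illustrative choices, not values from the study.
    """
    step = (q_max - q_min) / (n_grid - 1)
    values = [completeness(pres_prob, q_min + i * step) for i in range(n_grid)]
    return sum(values) / n_grid

print(completeness(0.5, 0.5))
print(marginal_completeness(0.5))
```

Averaging over a grid is the simplest way to integrate out an unknown extinction rate. A different prior on q, or a continuous-time formulation, would give different numbers.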
Simulate phylogenetic tree replicates and produce the Figure 3 plots
budding
python sim_range_completeness.py bud
bifurcating
python sim_range_completeness.py bif
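The Poisson preservation step that these runs rely on can be illustrated on its own. Under a Poisson model with rate psi (fossil occurrences per lineage per unit time; the name is hypothetical), a lineage of duration d leaves at least one fossil with probability 1 - exp(-psi * d). The sketch below is not taken from bd_fast.py, and the durations and rate are arbitrary illustrative values.

```python
import math
import random

def is_sampled(duration, psi, rng):
    """Return True if a lineage leaves at least one fossil.

    Under a Poisson process with rate psi, the waiting time to the first
    fossil is exponential, so the lineage is sampled when that waiting
    time falls within its duration.
    """
    return rng.expovariate(psi) <= duration

def sampled_fraction(durations, psi, seed=0):
    """Fraction of lineages that leave at least one fossil."""
    rng = random.Random(seed)
    hits = sum(is_sampled(d, psi, rng) for d in durations)
    return hits / len(durations)

durations = [2.0] * 20000  # arbitrary illustrative durations
psi = 0.5                  # arbitrary illustrative preservation rate
simulated = sampled_fraction(durations, psi)
expected = 1.0 - math.exp(-psi * 2.0)
print(simulated, expected)
```

The simulated fraction should fall close to the analytical expectation, which gives a quick sanity check on any sampling routine of this kind.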
Transform Foote and Sepkoski (1999) completeness values
python calc_foote_sepkoski_completeness.py foote_sepkoski_freqrats.csv
This prints the estimated completeness values to the terminal. These values, plus the other estimates taken directly from the literature, are compiled in the CSV file 'completeness.csv', which also includes a reference for each completeness estimate. The values can then be plotted in R using the script plot_completeness.R.
Estimate polytomies in simulated "extant" clades and plot the results (this produces Figure 5)
python get_timeslice_polytomies.py
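To illustrate what counting polytomies involves, here is a minimal sketch that tallies internal nodes with more than two children. It represents a tree as nested tuples with string tip labels, an assumption made for brevity. The repository's own tree object in node.py is not used here, and count_polytomies is a hypothetical name.

```python
def count_polytomies(tree):
    """Count internal nodes with more than two children.

    A tree is a nested tuple; tips are plain strings.
    """
    if isinstance(tree, str):
        return 0
    here = 1 if len(tree) > 2 else 0
    return here + sum(count_polytomies(child) for child in tree)

bifurcating = (("A", "B"), ("C", "D"))
with_polytomy = (("A", "B", "C"), "D")
print(count_polytomies(bifurcating), count_polytomies(with_polytomy))
```

In a budding scenario, an ancestral species that persists while giving rise to several surviving descendants would plausibly show up as a single node with several children, which is the kind of node this function counts.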
Additional scripts/modules (all are dependencies)
node.py -- Python module containing the basis of the tree object used for simulations
tree_reader.py -- Python module for reading and writing phylogenetic trees to files
distributions.py -- Python module implementing several statistical distributions. These are used for the stochastic birth-death simulations.
bd_fast.py -- implementation of the birth-death and sampling model. Functions in this module are called from sim_range_completeness.py and get_timeslice_polytomies.py
supp.pdf -- supplemental information PDF containing the supplemental figures (Figures S1-S6), a supplemental table (Table S1), and references.
References
Foote, M. and Raup, D.M., 1996. Fossil preservation and the stratigraphic ranges of taxa. Paleobiology, 22(2), pp.121-140.
Foote, M. and Sepkoski Jr, J.J., 1999. Absolute measures of the completeness of the fossil record. Nature, 398(6726), pp.415-417.
Solow, A.R. and Smith, W., 1997. On fossil preservation and the stratigraphic ranges of taxa. Paleobiology, 23(3), pp.271-277.
- Parins-Fukuchi, Charles Tomomi; Saulsbury, James G. (2025). Simulations modelling the effect of budding speciation on phylogenies. Zenodo. https://doi.org/10.5281/zenodo.14057806
- Parins-Fukuchi, Charles Tomomi; Saulsbury, James G. (2025). Simulations modelling the effect of budding speciation on phylogenies. Zenodo. https://doi.org/10.5281/zenodo.14057805
- Parins-Fukuchi, Charles Tomomi; Saulsbury, James G. (2025). The Consequences of Budding Speciation on Trees. Systematic Biology. https://doi.org/10.1093/sysbio/syaf012
