Data from: How to date a molecular phylogeny: Comparison of effective priors between node calibration and FBD
Data files (version of Mar 25, 2026; 423.51 KB in total):
- README.md (7.98 KB)
- SupplementaryMaterial.zip (415.53 KB)
Abstract
Time-calibrating a phylogenetic tree is a fundamental step in phylogenetic inference, as it allows the study of macroevolutionary processes such as lineage diversification, trait evolution, and historical biogeography. To this end, the fossilized-birth-death (FBD) process, a stochastic process that coherently integrates fossils into phylogenies, is increasingly used as an alternative to traditional ad hoc node-calibration densities. However, the effective prior distribution on node ages induced by the FBD process has not previously been investigated, hindering an informed choice between the two approaches. Here, we analyze two empirical datasets (crocodylians and fireflies) under several time-calibration models, including traditional node calibrations and the FBD process. We show that, in the absence of morphological data, the effective node age priors induced by the FBD process are comparable to those induced by uniform node calibrations whose minimum equals the age of the calibrating fossil and whose maximum equals the maximum age of the tree. Our exploration sheds light on how paleontological information is translated into node ages by the FBD process, and suggests that node calibration approaches remain an important alternative when the fossil record of the studied group is scarce and other prior information can be used to devise informative calibration densities.
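Why an effective prior can differ from the calibration densities that were specified can be seen in a toy example. The sketch below is hypothetical and much simpler than the RevBayes analyses in this archive (it ignores the tree prior entirely). It places uniform calibrations on two nested nodes, each with a minimum at an invented fossil age and a maximum at an invented maximum tree age. It then keeps only the draws in which the parent node is older than its child. The function name and all ages are illustrative assumptions.

```python
import random
import statistics


def effective_prior_means(child_min, parent_min, max_age, n=100_000, seed=1):
    """Toy rejection sampler for two nested nodes with uniform calibrations.

    Each node age is drawn from U(min, max_age); draws in which the parent
    is not older than its child are rejected. Returns the means of the
    accepted child and parent ages, i.e. the means of the effective prior.
    """
    rng = random.Random(seed)
    child_ages, parent_ages = [], []
    while len(child_ages) < n:
        child = rng.uniform(child_min, max_age)
        parent = rng.uniform(parent_min, max_age)
        if parent > child:  # ordering constraint: an ancestor predates its descendant
            child_ages.append(child)
            parent_ages.append(parent)
    return statistics.fmean(child_ages), statistics.fmean(parent_ages)


# Hypothetical ages in Ma: calibrating fossils at 20 Ma (child) and 40 Ma (parent),
# maximum tree age 100 Ma. The specified uniform densities have means 60 and 70 Ma.
child_mean, parent_mean = effective_prior_means(20.0, 40.0, 100.0)
print(round(child_mean, 1), round(parent_mean, 1))  # close to the analytical 48 and 76
```

The ordering constraint pulls the child's effective prior younger and pushes the parent's older than the specified uniform densities. This is the kind of discrepancy that running the analyses without data (the "effective prior" runs) is designed to reveal.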
Dataset DOI: 10.5061/dryad.0rxwdbsfj
Description of the data and file structure
This compressed file archive contains all data and scripts used in the study, as well as scripts to plot figures and tables with results (effective priors and posterior node ages for the Crocodylia and Lampyridae analyses).
Files and variables
File: SupplementaryMaterial.zip
Description:
data folder: contains data files used for the study
- crocs subfolder: Contains all aligned molecular sequences used for the phylogenetic analysis of Crocodylia, in FASTA format.
- crocs_extant.tre: Tree topology of extant Crocodylia in NEXUS format, used as a constraint for the node calibration analyses.
- crocs_fossil.tre: Tree topology of extant and calibrating fossil Crocodylia in NEXUS format, used as a constraint for the FBD analyses.
- crocs_taxa.csv: CSV table with ages, given as minimum ages in million years ago (Ma), used for the FBD analysis with all fossils.
- crocs_taxa_Nfossils.csv: CSV tables with ages, given as minimum ages in million years ago (Ma), used for the FBD analyses with subsamples of fossils (N = 10, 20, 50).
- fireflies subfolder: Contains all aligned molecular sequences used for the phylogenetic analysis of Lampyridae, in FASTA format.
- fireflies_extant.tre: Tree topology of extant Lampyridae in NEXUS format, used as a constraint for the node calibration analyses.
- fireflies_fossil.tre: Tree topology of extant and calibrating fossil Lampyridae in NEXUS format, used as a constraint for the FBD analyses.
- startingTree_FBD_Multifossil.tre: Starting tree used to initialize the FBD analysis of Crocodylia with all fossils included.
- startingTree_FBD_Multifossil_Nfossils.tre: Starting trees used to initialize the FBD analyses of Crocodylia with subsamples of fossils (N = 10, 20, 50).
- Tip_dates_table.xlsx: Excel spreadsheet listing all extinct Crocodylia species used in the FBD analysis of Crocodylia with all fossils included, together with their ages. The "Taxon" column lists species names, "Age_older" lists the maximum age in million years ago, "Age_younger" lists the minimum age in million years ago, and "CalibratedClade" indicates which of the calibrated crown clades the extinct species belongs to.
- NodeAges_Crocodylia.csv: CSV table with effective prior and posterior ages of the calibrated nodes for all Crocodylia analyses. The first column indicates the model used for the analysis (see main manuscript); each remaining column corresponds to a different node in the tree and gives the mean and 95% highest density interval of that node's age in million years ago.
- NodeAges_Lampyridae.csv: CSV table with effective prior and posterior ages of the calibrated nodes for all Lampyridae analyses. The first column indicates the model used for the analysis (see main manuscript); each remaining column corresponds to a different node in the tree and gives the mean and 95% highest density interval of that node's age in million years ago.
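The NodeAges tables summarize each node by its mean and 95% highest density interval (HDI). The archive's ExtractNodeAgeTable_*.R scripts do the actual extraction. Purely as an illustration of what an HDI is, the sketch below computes one from a list of MCMC samples using the common "shortest interval over sorted samples" approach. The function name `hdi` and the sample values are hypothetical.

```python
import math


def hdi(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the samples.

    Sorts the samples and slides a window of fixed size over them, keeping
    the narrowest window. Appropriate for unimodal distributions.
    """
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(prob * n)  # number of samples inside the interval
    if k < 1 or k > n:
        raise ValueError("prob must leave at least one sample in the interval")
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return xs[best], xs[best + k - 1]


# Hypothetical, right-skewed node age samples (Ma).
ages = [10, 11, 11.5, 12, 12.2, 12.5, 13, 14, 20, 35]
print(hdi(ages, prob=0.8))  # → (10, 14)
```

Unlike an equal-tailed interval, the HDI drops the long right tail (20 and 35 Ma here), which is why it is often preferred for skewed node age distributions.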
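Tip_dates_table.xlsx pairs each fossil's age range (Age_older, Age_younger) with the crown clade it calibrates. When fossils are turned into node calibrations rather than FBD tips, one widely used convention is to take, for each clade, the largest minimum age among its fossils as the hard lower bound on the clade's age. The sketch below applies that convention to invented rows that mirror the spreadsheet columns. It is an illustration only and not necessarily how the study's calibration scripts were derived.

```python
from dataclasses import dataclass


@dataclass
class FossilTip:
    taxon: str
    age_older: float  # maximum age of the fossil, Ma
    age_younger: float  # minimum age of the fossil, Ma
    calibrated_clade: str


def clade_minimum_ages(tips):
    """Return {clade: (taxon, age)} using the oldest minimum age per clade."""
    best = {}
    for tip in tips:
        if tip.age_younger > tip.age_older:
            raise ValueError(f"{tip.taxon}: Age_younger exceeds Age_older")
        current = best.get(tip.calibrated_clade)
        if current is None or tip.age_younger > current[1]:
            best[tip.calibrated_clade] = (tip.taxon, tip.age_younger)
    return best


# Hypothetical rows; taxon names, clades, and ages are invented.
rows = [
    FossilTip("Fossil_A", 72.1, 66.0, "CladeX"),
    FossilTip("Fossil_B", 56.0, 47.8, "CladeX"),
    FossilTip("Fossil_C", 33.9, 28.1, "CladeY"),
]
print(clade_minimum_ages(rows))
# → {'CladeX': ('Fossil_A', 66.0), 'CladeY': ('Fossil_C', 28.1)}
```

With pandas installed, the real spreadsheet could be loaded with `pandas.read_excel` (which needs the openpyxl engine for .xlsx files) and converted into rows like these.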
scripts folder: contains scripts for running phylogenetic analyses and plotting results. All data necessary to run these scripts is included in the data folder. RevBayes version used: 1.3.2. R version used: 4.5.2.
- 1_run_mcmc_prior.sh: Bash script to run analyses without data (effective prior) on a high-performance computing cluster.
- 2_run_mcmc_posterior.sh: Bash script to run analyses with data (posterior) on a high-performance computing cluster.
- CladeDate_crocs.R: R script using CladeDate to generate node calibrations for Crocodylia based on the temporal distribution of fossil species.
- clock_model_relaxed.Rev: RevBayes script for a mixture of relaxed clocks on molecular branch rates.
- crocs_fossil_constraints_FBD_Multifossil.Rev: RevBayes script to set up topological constraints for the fossil tips used in the FBD analysis with all fossils included.
- crocs_fossil_constraints_FBD_Multifossil_Nfossils.Rev: RevBayes scripts to set up topological constraints for the fossil tips used in the FBD analyses with subsamples of fossils (N = 10, 20, 50).
- expand_fossil_data_partitioned.Rev: RevBayes script to add missing sequence data for fossil tips, which lack molecular sequences.
- ExtractNodeAgeTable_Crocodylia.R: R script to extract node ages from the output of the Crocodylia analyses.
- ExtractNodeAgeTable_Lampyridae.R: R script to extract node ages from the output of the Lampyridae analyses.
- mcmc_prior.Rev: RevBayes main script to run analyses under the prior only (without data).
- mcmc_posterior.Rev: RevBayes main script to run analyses with data.
- node_calibrations_crocs_CladeDate.Rev: RevBayes script to set up calibrations for the Crocodylia analyses with CladeDate-derived node calibrations.
- node_calibrations_crocs_narrow.Rev: RevBayes script to set up calibrations for the Crocodylia analyses with narrow node calibrations.
- node_calibrations_crocs_wide.Rev: RevBayes script to set up calibrations for the Crocodylia analyses with wide node calibrations.
- node_calibrations_fireflies_narrow.Rev: RevBayes script to set up calibrations for the Lampyridae analyses with narrow node calibrations.
- node_calibrations_fireflies_wide.Rev: RevBayes script to set up calibrations for the Lampyridae analyses with wide node calibrations.
- node_tmrca_crocs.Rev: RevBayes script to specify calibrated nodes in the Crocodylia analyses.
- node_tmrca_crocs_FBD_Multifossil.Rev: RevBayes script to specify calibrated nodes in the Crocodylia analysis under the FBD process with all fossils included.
- node_tmrca_fireflies.Rev: RevBayes script to specify calibrated nodes in the Lampyridae analyses.
- partition_rates_Dirichlet_flat.Rev: RevBayes script to specify partition rate multipliers.
- phyloCTMC_nuc_part.Rev: RevBayes script that specifies a partitioned PhyloCTMC (phylogenetic continuous-time Markov chain) for multiple data subsets.
- PlotNodeAgesComparison_Crocodylia.R: R script to plot all node ages of one analysis against those of another analysis, for the Crocodylia dataset.
- PlotNodeAgesComparison_Fireflies.R: R script to plot all node ages of one analysis against those of another analysis, for the Lampyridae dataset.
- read_data_nuc_concatenated_codon.Rev: RevBayes script to read single-gene data and concatenate each codon position across all genes.
- read_data_nuc_concatenated_combined.Rev: RevBayes script to read single-gene data and concatenate the genes together.
- root_prior_crocs_narrow.Rev: RevBayes script specifying the narrow root age prior for the Crocodylia analyses.
- root_prior_crocs_wide.Rev: RevBayes script specifying the wide root age prior for the Crocodylia analyses.
- root_prior_fireflies_narrow.Rev: RevBayes script specifying the narrow root age prior for the Lampyridae analyses.
- root_prior_fireflies_wide.Rev: RevBayes script specifying the wide root age prior for the Lampyridae analyses.
- sampling_fraction_crocs.Rev: RevBayes script specifying the extant species sampling fraction for the Crocodylia analyses.
- sampling_fraction_fireflies.Rev: RevBayes script specifying the extant species sampling fraction for the Lampyridae analyses.
- substitution_model_nuc_part.Rev: RevBayes script for the substitution model applied to each data subset.
- tree_prior_BD.Rev: RevBayes script for the birth-death tree prior.
- tree_prior_FBD.Rev: RevBayes script for the fossilized-birth-death tree prior.
- tree_prior_FBD_Multifossil.Rev: RevBayes script for the fossilized-birth-death tree prior in the analyses using multiple fossil tips per calibrated clade.
- ViolinPlotsPriorPosterior_Crocodylia.R: R script to plot half-violin plots of both effective prior and posterior node ages for all the Crocodylia analyses.
- ViolinPlotsPriorPosterior_Fireflies.R: R script to plot half-violin plots of both effective prior and posterior node ages for all the Lampyridae analyses.
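CladeDate_crocs.R derives node calibrations from the temporal distribution of fossil species. A simpler, classic example of that kind of reasoning is the Strauss and Sadler (1989) confidence interval on a stratigraphic range. It is shown here only for illustration and is not necessarily a method the script uses. Assuming fossil horizons are a uniform random sample from the clade's true range, H horizons spanning an observed range R give an extension below the oldest fossil of R * ((1 - C)^(-1/(H - 1)) - 1) at confidence C. The function name and fossil ages below are hypothetical.

```python
def strauss_sadler_max_age(fossil_ages, confidence=0.95):
    """Upper confidence bound on clade age from fossil horizon ages (Ma).

    Assumes fossil horizons are uniformly distributed within the true
    stratigraphic range. The observed range R runs from the youngest to the
    oldest fossil; the bound is oldest + alpha * R, with
    alpha = (1 - confidence) ** (-1 / (H - 1)) - 1 for H horizons.
    """
    h = len(fossil_ages)
    if h < 2:
        raise ValueError("need at least two fossil horizons")
    oldest, youngest = max(fossil_ages), min(fossil_ages)
    alpha = (1.0 - confidence) ** (-1.0 / (h - 1)) - 1.0
    return oldest + alpha * (oldest - youngest)


ages = [66.0, 60.2, 55.8, 48.6, 41.3, 33.9]  # hypothetical fossil ages, Ma
print(round(strauss_sadler_max_age(ages), 1))
```

The oldest fossil supplies the minimum and the confidence bound a candidate maximum, which together could define a calibration density. Methods such as those in CladeDate go further by modelling the shape of the density between these bounds.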
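expand_fossil_data_partitioned.Rev adds missing sequence data for fossil tips. Fossils have no molecular sequences but still need a row in every data subset. Outside RevBayes, the same idea amounts to padding each alignment with an all-"?" row for every fossil taxon. The sketch below works on hypothetical in-memory alignments (dicts mapping taxon to aligned sequence). The names and sequences are invented.

```python
def add_missing_fossil_rows(alignment, fossil_taxa, missing="?"):
    """Return a copy of `alignment` with an all-missing row for each fossil taxon."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences in an alignment must have equal length")
    n_sites = lengths.pop()
    expanded = dict(alignment)
    for taxon in fossil_taxa:
        expanded.setdefault(taxon, missing * n_sites)  # keep any existing row
    return expanded


gene = {"Extant_1": "ACGTAC", "Extant_2": "ACGTTC"}  # hypothetical alignment
print(add_missing_fossil_rows(gene, ["Fossil_A"]))
# → {'Extant_1': 'ACGTAC', 'Extant_2': 'ACGTTC', 'Fossil_A': '??????'}
```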
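partition_rates_Dirichlet_flat.Rev specifies partition rate multipliers. A common parameterization, assumed here from the file name rather than confirmed from the script, draws proportions from a flat Dirichlet(1, ..., 1) and scales them by the number of partitions. The multipliers then average 1, so the overall clock rate keeps its meaning. The standard-library sketch below obtains the Dirichlet draw by normalizing independent Gamma(1, 1) variates. The function name is hypothetical.

```python
import random


def flat_dirichlet_multipliers(n_partitions, rng):
    """Draw partition rate multipliers that average 1.

    A flat Dirichlet(1, ..., 1) draw is obtained by normalizing independent
    Gamma(1, 1) variates; scaling by the number of partitions makes the
    multipliers sum to n_partitions (mean 1).
    """
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(n_partitions)]
    total = sum(gammas)
    return [n_partitions * g / total for g in gammas]


rng = random.Random(42)
mults = flat_dirichlet_multipliers(3, rng)  # e.g. one multiplier per codon position
print([round(m, 3) for m in mults], round(sum(mults) / len(mults), 6))
```

Some parameterizations instead weight the multipliers by partition length so that the site-weighted mean is 1. Which one the script uses would need to be checked in the file itself.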
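read_data_nuc_concatenated_codon.Rev concatenates each codon position across all genes, producing one data subset per codon position. The sketch below shows that rearrangement on hypothetical in-memory alignments. It assumes every gene alignment is in reading frame starting at its first site and that all genes contain the same taxa. The function name and sequences are illustrative.

```python
def concatenate_by_codon_position(genes):
    """Build three alignments (1st, 2nd, 3rd positions) concatenated across genes.

    `genes` is a list of dicts mapping taxon -> aligned sequence. Each gene is
    assumed to start at codon position 1 and to contain the same taxa.
    """
    taxa = sorted(genes[0])
    partitions = [{t: "" for t in taxa} for _ in range(3)]
    for gene in genes:
        if sorted(gene) != taxa:
            raise ValueError("all genes must contain the same taxa")
        for taxon in taxa:
            seq = gene[taxon]
            for pos in range(3):
                partitions[pos][taxon] += seq[pos::3]  # every third site from pos
    return partitions


gene1 = {"taxon_1": "ATGAAA", "taxon_2": "ATGAAG"}  # hypothetical gene alignments
gene2 = {"taxon_1": "CCC", "taxon_2": "CCT"}
first, second, third = concatenate_by_codon_position([gene1, gene2])
print(third)
# → {'taxon_1': 'GAC', 'taxon_2': 'GGT'}
```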
