Data from: Distinguishing punctuated and continuous-time models of character evolution for discrete characters and its implications for macroevolutionary theory
Data files
May 04, 2026 version; files total 205.50 MB
- forPub_Final_2.zip (205.49 MB)
- README.md (11.07 KB)
Abstract
The recently proliferating quantitative models for assessing anatomical character evolution all assume that character change happens continuously through time. However, the punctuated equilibrium model posits that character change should coincide with cladogenetic events, and thus should be tied to origination rates. Rates of cladogenesis are important to quantitative phylogenetics, but typically only for establishing prior probabilities in the tree model component of phylogenetic analyses. Here, we modify existing character-likelihood models to use the local cladogenesis rates from Bayesian analyses to generate amounts of character change over time dependent on origination rates, as expected under the punctuated equilibrium model. In the case of strophomenoid brachiopods from the Ordovician, Bayesian analyses strongly favor punctuated models over continuous-time models, with elevated rates of cladogenesis early in the clade’s history inducing elevated frequencies of change despite constant rates of change per speciation event. This corroborates prior work proposing that the early burst in strophomenoid disparity simply reflects elevated speciation rates, which in turn has implications for seemingly unrelated macroevolutionary theory about whether early bursts reflect shifts in intrinsic constraints or empty ecospace. Future development of punctuated character evolution models should account for the full durations of species, which will provide a test of continuous change rates. Ultimately, continuous change versus punctuated change should become part of phylogenetic paleobiology in the same way that we currently test other models of character evolution.
Dataset DOI: 10.5061/dryad.dfn2z35f3
Description of the data and file structure
RevBayes code plus example data sets for conducting Bayesian phylogenetic analyses with either punctuated (= pulsed, speciational, etc.) or continuous (gradual, Darwinian, phyletic, etc.) character change.
Files and folders
The zip file (forPub_Final_2.zip) contains two folders.
RevBayes
The RevBayes folder contains the code relevant to the analyses of the paper. Note that the scripts and datasets are partitioned into folders. The main script calls upon scripts and datasets within these folders; if the directory is set to the RevBayes folder when running RevBayes and this folder structure is maintained, then it should not be necessary to alter the directory commands pointing to the relevant files and scripts.
scripts. This folder contains two "main" programs. They are essentially the same, but are initialized to run either continuous-time or punctuated analyses based on the variable "punctuated". If punctuated is set to FALSE (pps_Mk_Model.Rev), then the likelihood component of the analysis uses the Mk model outlined by Lewis (2001). This represents continuous-time change (i.e., "phyletic gradualism" sensu Eldredge & Gould 1972). If punctuated is set to TRUE (pps_punc_Model.Rev), then the main script calls upon scripts/branch_rates/Accersi_Expected_Branchings.Rev, which uses the speciation rates to determine the expected amount of change along a branch.
scripts/standard_routines. This folder contains two scripts that provide more general parameterizations and initializations for RevBayes analyses. Milgram_Default_Settings.Rev provides basic initializations for RevBayes. Most of these will not be called upon in any one analysis, but these scripts are designed to be flexible in order to allow for a variety of clock and diversification dynamic studies beyond the scope of this particular study. Accersi_Parameters_for_Analysis_Partitioned_by_States_and_Ordering_and_Class.Rev is used to set up M2, M3, M4, etc., models for characters with 2, 3, 4, etc. states. Note that this allows for both unordered and ordered Q-matrices, although this study only uses unordered (standard) Mk matrices. Note also that the Q-matrices can be used by either the continuous-time or punctuated routines.
scripts/FBD_scripts. The sole script here, Milgram_Skyline_N_Interval.Rev, initializes the FBD model to have varying rates of origination, extinction and sampling over different intervals of time. This is referred to as a "skyline model" in epidemiological studies and is typical of FBD studies that allow for variation in diversification & sampling over time. Note that we initialize these rates using "traditional" birth-death-sampling analyses of rhynchonelliform (~ "articulate") brachiopods from the Ordovician. However, the program can vary these to maximize topology probabilities over the course of the MCMC analyses.
scripts/branch_rates. Because the punctuated model is not yet hard-coded into RevBayes, we provide Accersi_Expected_Branchings.Rev. This calculates the expected number of branching events over any interval of time, allowing for rate shifts when branches span 2+ chronostratigraphic intervals. This is then used to rescale expected change in each interval to be proportional to the speciation rate in that interval.
data. This folder contains the standard files required for FBD analyses with RevBayes. Strophomenida_fossil_intervals_w_outgroup.tsv provides the lower and upper bounds for the first occurrences of the oldest species in each analyzed genus, based on the timescale of Gradstein et al. (2020). Note that these usually are resolved to particular conodont or graptolite zones and thus are much more restricted than just the first stages in which the taxa occur. Note also that this restricts only the first occurrence of the genus, i.e., the latest time by which the analyzed combination of anatomical character states evolved. Nearly all of the genera persist long after the latest time by which this morphotype evolved. Because we adapted scripts from older versions of RevBayes for these analyses (originally conducted in 2022), we divided the character matrix of Congreve et al. (2016) into matrices for the 2-state, 3-state and 4-state characters. More recent RevBayes scripts obviate the need for this, as they can partition the entire matrix based on criteria such as the number of states. However, we include the version that we used for the sake of replication.
output. These are the compressed output files from our analyses.
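The rescaling described for scripts/branch_rates can be illustrated with a minimal Python sketch. It is not a translation of Accersi_Expected_Branchings.Rev; it only assumes that origination rates are constant within each chronostratigraphic bin, so that the expected number of branching events along a branch is the sum of rate times duration over the bins the branch crosses. The function name, the timeline, and every rate below are hypothetical values chosen for illustration.

```python
def expected_branchings(branch_start, branch_end, bin_bounds, rates):
    """Expected number of branching events along one branch.

    Ages are in Myr before present, so branch_start > branch_end.
    bin_bounds lists bin boundaries from oldest to youngest and has
    one more entry than rates; rates[i] is the origination rate
    (events per lineage per Myr) assumed constant within bin i.
    """
    if len(bin_bounds) != len(rates) + 1:
        raise ValueError("need one more bin boundary than rates")
    total = 0.0
    for i, rate in enumerate(rates):
        older, younger = bin_bounds[i], bin_bounds[i + 1]
        # Time the branch spends inside this bin (no overlap gives <= 0).
        overlap = min(branch_start, older) - max(branch_end, younger)
        if overlap > 0:
            total += rate * overlap
    return total


# Hypothetical timeline with three bins and made-up origination rates.
bin_bounds = [470.0, 460.0, 450.0, 440.0]
rates = [0.30, 0.10, 0.05]

# A branch running from 465 Ma to 455 Ma spans the first two bins:
# 5 Myr at 0.30 plus 5 Myr at 0.10.
n_events = expected_branchings(465.0, 455.0, bin_bounds, rates)

# Punctuated sketch: expected change scales with expected branchings.
# Continuous-time sketch: expected change scales with elapsed time only.
change_per_event = 0.2  # hypothetical rate of change per branching event
change_per_myr = 0.04   # hypothetical rate of change per Myr
punctuated_change = change_per_event * n_events
continuous_change = change_per_myr * (465.0 - 455.0)
```

Two branches of equal duration receive the same expected change under the continuous-time sketch, but under the punctuated sketch the branch that sits in a high-origination bin receives more.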
Adapting this code for your data:
Note that if you copy the entire RevBayes_Projects folder, then you can set the working directory to RevBayes_Projects, and your analyses will call upon the correct folders within your own directory. Thus, put your Nexus and fossil_interval files into RevBayes_Projects/data. If you use our code as it is, then you will need to put in separate matrices for each character partition (e.g., by number of states, ordering, etc.).
Within the main script, make the following changes:
Line 15: analysis_name <- "MY_ANALYSIS";
Line 18: taxa <- readTaxonData("data/MYDATA_fossil_intervals.tsv");
Lines 38-40:
partition_states <- v(2,3,4,...X); with X being the maximum number of states
partition_ordering <- v("unordered"…); one entry per partition, giving "ordered" or "unordered" for that partition
coding_bias <- v("variable","all","informative"…); one entry per partition, using one of the following options:
“all”: autapomorphies and some invariant characters are present
“variable”: autapomorphies present but invariant characters must be estimated
“informative”: autapomorphies excluded so both autapomorphic and invariant characters must be estimated
Lines 55-56: These must be rewritten to denote the outgroup and ingroup members of your analysis.
Line 57: timeline must be rewritten to be appropriate for your data.
Line 58: seed_origination must be rewritten to be appropriate for your data. There are a wide variety of birth-death-sampling methods that allow you to get preliminary rates for the chronostratigraphic bins designated in timeline: any of these should suffice.
Line 59: seed_sampling also must be rewritten to be appropriate for your data. Again, a wide variety of methods exist for getting these initial estimates.
Lines 60-61: seed_sampling_lb and seed_sampling_ub. RevBayes currently does not allow for more sophisticated sampling rate estimates that take into account all the finds for a fossil taxon after its first occurrence. As a result, it can settle upon very unrealistic numbers. We recommend putting upper and lower bounds on the possible sampling rates based on empirical analyses until such time as sampling rates are actively included in the analyses.
Line 63: rho must be replaced with a value appropriate for your analyses. Note that this estimates the proportion of taxa sampled from the final interval, not the sampling rates in that interval. This can be estimated from sampling rates or from numbers of occurrences using estimators such as Chao-2.
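As one way to obtain a starting value for rho from numbers of occurrences, the Python sketch below applies the classic Chao-2 richness estimator to per-taxon incidence counts from the final interval and divides observed richness by estimated richness. The function names and the example counts are hypothetical, and the fallback used when no taxon occurs in exactly two sampling units is a simplified bias-corrected form that ignores the small-sample factor. Treat the result as a rough seed value rather than a definitive estimate.

```python
from collections import Counter


def chao2_richness(incidence_counts):
    """Classic Chao-2 richness estimate from per-taxon incidence counts.

    incidence_counts[i] is the number of sampling units (e.g., collections
    or localities) in which observed taxon i occurs in the final interval.
    """
    counts = [c for c in incidence_counts if c > 0]
    s_obs = len(counts)
    if s_obs == 0:
        raise ValueError("no observed taxa")
    freq = Counter(counts)
    q1, q2 = freq[1], freq[2]  # taxa found in exactly one / two units
    if q2 > 0:
        return s_obs + q1 * q1 / (2.0 * q2)
    # Simplified bias-corrected form; avoids dividing by zero when q2 == 0.
    return s_obs + q1 * (q1 - 1) / 2.0


def rho_from_incidences(incidence_counts):
    """Proportion of estimated taxa that were actually observed."""
    s_obs = sum(1 for c in incidence_counts if c > 0)
    return s_obs / chao2_richness(incidence_counts)


# Hypothetical final-interval data: ten observed taxa, four of them
# known from a single sampling unit and two from exactly two units.
example_counts = [1, 1, 1, 1, 2, 2, 3, 5, 8, 4]
estimated_richness = chao2_richness(example_counts)
rho_seed = rho_from_incidences(example_counts)
```

With these made-up counts the estimator adds four unseen taxa to the ten observed, so the seed value for rho is 10/14.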
R_projects
We also include a folder, “R_Projects”, which contains the R scripts that we use to summarize and illustrate the output from RevBayes.
RevBayes_Output_Summaries. Two scripts, Plotting_Results_for_GSA.r and RevBayes_Output_Summaries.r, allow you to generate publication- and talk-quality illustrations of the output in the form of histograms and convergence scattergrams. This is little different from what can be done with programs such as Tracer, but the graphs are prettier.
Tree_Drawing. This includes programs for plotting “pretty” phylogenies against time scales. Draw_Me_a_Tree_for_Posting.r is the one used for our paper, but we include other versions that allow you to directly access data from the Paleobiology Database or offer more generalized tree plotting. These programs look into the folder tree for Newick files to plot. We include the two maximum credibility trees from our continuous and punctuated analyses in this folder. The programs get information about the stratigraphic ranges of the illustrated taxa from the folder taxon_info. We include Strophomenoidea_Fuzzy_Ranges.csv. Note that this differs from the taxon_info.tsv files used by RevBayes in that it includes the lower and upper bounds of the last occurrence as well as the first occurrence. This also allows you to use different colors/shades to illustrate different groups of analyzed taxa (e.g., different families, different biogeographic distributions, different ecological types, etc.).
Data_for_R. This includes two RData databases compiled by one of us (PJW) that are used to illustrate the trees. Gradstein_2020_Augmented.RData includes information from Gradstein et al. (2020) to illustrate chronostratigraphic scales. Note that it is heavily augmented and includes estimated dates for many more biostratigraphic zone taxa and regional stages than does the original Gradstein et al. work. We also include Rock_Unit_Database.RData, which we use in part to provide the most exact possible ranges of first and last occurrence dates for taxa, both for analyses and illustration. This folder is called by Tree_Drawing.r; however, it can also be used to estimate seed origination rates and seed sampling rates.
Common_R_Source. This includes numerous source files written by one of us (PJW) to summarize and illustrate the data.
Nexus_File_Routines.r. This includes a wide variety of routines to read not just Nexus files, but many types of Newick files, including the ones output by RevBayes. This is used extensively in Draw_Me_a_Tree_for_Posting.R.
General_Plot_Templates.r. A wide range of specified plotting programs are included here, including ones to generate chronostratigraphic scales and plot phylogenies against those scales.
paleophylogeny_routines.r. Among other routines, it includes some for reading and summarizing the large log files output by RevBayes.
Historical_Diversity_Metrics.r. This includes many methods for estimating origination, extinction and sampling rates given PBDB data.
The other files in this folder are necessary to compile the four described above.
For questions concerning the R-scripts, contact Peter Wagner. For questions concerning the RevBayes scripts, contact either Peter Wagner or April Wright.
Code/Software
These scripts require R (https://www.r-project.org) and RevBayes (https://revbayes.github.io). Both can be downloaded for free.
Access information
Data was derived from the following sources: Paleobiology Database (https://paleobiodb.org/#/)
Simply use these scripts as you would any RevBayes script. Do note that the primary code calls upon several external scripts and datasets, which are segregated into particular folders. The code assumes that these folders are in the "main" directory that RevBayes is using at that time.
