Data from: Interaction of sequence data and paleogeographic priors in biogeographic dating: How could biological data inform time-constrained geological models?
Data files
Dec 17, 2025 version files 759.25 MB
input_data.zip (119.11 KB)
README.md (3.17 KB)
results_gene_flow.zip (446.60 MB)
results_no_migration.zip (312.53 MB)
Abstract
Biological and geographic patterns and processes are linked such that biological data hold information on paleogeographical patterns and processes. Here, we aimed to test Biogeographic dating as a methodological framework for the integration of biological, paleontological, and geological data to test paleogeographic hypotheses. In Biogeographic dating, both forms of data are specified and analyzed simultaneously. We evaluate how uncertainty and accuracy in both types of datasets affect the inference of divergence times used as a proxy for generating or testing geological models. We used data simulation to generate a paleogeographic scenario and a nuclear sequence dataset for lineages whose evolution is correlated with geological patterns. Then, gene flow was simulated across landscape units, such that biological patterns inferred from sequence data would deviate from simulated times of paleogeographic change. Under those two scenarios, we specified broad, incorrect, and accurate geological priors. These various scenarios were analyzed through Biogeographic dating analyses run in RevBayes and compared with our simulations. In doing so, we test the potential for well-calibrated phylogenies and the impact of accuracy and uncertainty in geological priors for estimating paleogeographic events.
This DRYAD repository contains data, scripts, and results for "Interaction of biological and paleogeographic priors in Biogeographic dating: How can biological data inform time-constrained geological models" published in the Journal of Biogeography. Descriptions of the content available in each folder and its generation are listed below.
Scripts (generation of data, Biogeographic dating analyses)
We link to Zenodo, where the code used to generate simulated sequence data with Python and the msprime package is provided. Scripts and results for the Biogeographic dating analyses run in RevBayes are also provided.
demographic_simulation.py = run with msprime to generate sequence data without gene flow.
demographic_simulation_gene_flow.py = run with msprime to generate sequence data with gene flow.
fill_in_invariant_sites.py = Python script used to add randomly generated base pairs to invariant sites among lineages.
model_.rb = five unique scripts, run in RevBayes; the words following model_ in each file name describe the accuracy and correctness of the data specified in that script. Further information on each model is given in the publication.
combine.sh = Unix shell script used to combine the output files of the two MCMC runs after applying a burn-in.
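The invariant-site fill-in step listed above can be illustrated with a minimal sketch. This is not the repository's fill_in_invariant_sites.py. The function name, the input layout (a dictionary of variable-site columns keyed by position), the toy taxa, and the seed are all assumptions made for illustration. The sketch assumes that every position without a recorded variable site receives one randomly drawn base shared by all lineages, which is one plausible reading of the description.

```python
import random

BASES = "ACGT"


def fill_invariant_sites(variable_sites, sequence_length, seed=None):
    """Expand variable-site columns into full-length sequences.

    variable_sites maps a 0-based position to a dict {taxon: base}.
    Every position that is not listed gets one randomly drawn base that
    is identical across taxa, so the site remains invariant.
    (Hypothetical helper, not the authors' implementation.)
    """
    rng = random.Random(seed)
    taxa = sorted({t for column in variable_sites.values() for t in column})
    sequences = {taxon: [] for taxon in taxa}
    for position in range(sequence_length):
        column = variable_sites.get(position)
        if column is None:
            base = rng.choice(BASES)  # one shared base for all taxa
            for taxon in taxa:
                sequences[taxon].append(base)
        else:
            for taxon in taxa:
                sequences[taxon].append(column[taxon])
    return {taxon: "".join(chars) for taxon, chars in sequences.items()}


# Hypothetical toy input: three taxa, two variable sites in a 10 bp locus.
variable = {
    2: {"sp_A": "A", "sp_B": "G", "sp_C": "A"},
    7: {"sp_A": "C", "sp_B": "C", "sp_C": "T"},
}
filled = fill_invariant_sites(variable, sequence_length=10, seed=1)
for taxon, seq in filled.items():
    print(taxon, seq)
```

Filling invariant positions this way leaves the variable sites untouched while giving each lineage a sequence of the full locus length.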
Input Data
Simulated sequence data from the previous steps and the resulting phylogenies, along with files containing parameters used in the Biogeographic dating analyses, are provided (input_data.zip).
simulated_range.nex = lineage ranges (presence/absence)
simulated_sequences.nex = previously generated sequences from demographic simulations
modified_sequences_filled.nex = simulated_sequences.nex with invariant sites filled
simulated.connectivity.txt = connectivity between ranges over epoch times
simulated.distances.txt = distances between ranges
_times.txt = varying epoch times, used to establish priors in Biogeographic dating
Information regarding the selection of varying parameters is described in detail in the publication, and all parameter input can be modified in the analysis scripts available on Zenodo.
Results (no migration and gene flow)
The output of each individual MCMC run is provided in separate folders based on the presence (results_gene_flow.zip) or absence (results_no_migration.zip) of gene flow in the demographic history generated by msprime. Results include the dated phylogenies and ancestral state estimations referred to in the publication. Within folders, result files ending in FR indicate an enforced root age for the phylogeny, while runs lacking this suffix did not have a strict root age provided in the analyses. geo_unknown, incorrect_normal, incorrect_uniform, informed_normal, and informed_uniform refer to the paleogeographic prior used in the Biogeographic dating analysis. Each zip file also contains a combined_results folder, which holds the combined result files of the 2 MCMC runs that used the same model, after applying burn-in.
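As a rough illustration of how two MCMC runs can be pooled after burn-in, here is a minimal Python sketch. It is not the combine.sh script from the repository. The function name, the 25% burn-in fraction, the root_age column, and the toy trace values are hypothetical, and the sketch assumes each trace is a tab-separated table with one header row and one sample per line.

```python
import csv
import io


def combine_runs(run_logs, burnin_fraction=0.25):
    """Drop the first burnin_fraction of samples per run, then pool the rest.

    run_logs is an iterable of text streams, each a tab-separated trace
    with a header row. Returns (header, rows).
    (Hypothetical helper; the burn-in fraction is an assumed value.)
    """
    header = None
    combined = []
    for log in run_logs:
        reader = csv.reader(log, delimiter="\t")
        run_header = next(reader)
        if header is None:
            header = run_header
        elif run_header != header:
            raise ValueError("runs do not share the same columns")
        samples = [row for row in reader if row]  # skip blank lines
        n_burnin = int(len(samples) * burnin_fraction)
        combined.extend(samples[n_burnin:])
    return header, combined


# Hypothetical toy traces standing in for two runs of the same model.
run_1 = io.StringIO("Iteration\troot_age\n0\t10.0\n10\t9.5\n20\t9.1\n30\t9.0\n")
run_2 = io.StringIO("Iteration\troot_age\n0\t12.0\n10\t9.8\n20\t9.2\n30\t8.9\n")

header, rows = combine_runs([run_1, run_2], burnin_fraction=0.25)
print(header)
print(len(rows), "samples kept")
```

Discarding burn-in separately for each run before pooling keeps the early, unconverged samples of both chains out of the combined file.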
