Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal
Data files
Jul 10, 2023 version files (513.36 MB)
- data.zip (4.44 MB)
- output.zip (508.91 MB)
- README.md (11.33 KB)
Jul 23, 2024 version files (328.67 MB)
- code.zip (33.41 KB)
- data.zip (8.07 MB)
- output.zip (320.55 MB)
- README.md (12.69 KB)
Oct 17, 2024 version files (329.42 MB)
- code.zip (33.53 KB)
- data.zip (8.07 MB)
- output.zip (321.31 MB)
- README.md (12.80 KB)
Abstract
The spatial and environmental features of regions where clades are evolving are expected to impact biogeographic processes such as speciation, extinction, and dispersal. Any number of regional features (such as altitude, distance, area, etc.) may be directly or indirectly related to these processes. For example, it may be that distances or differences in altitude or both may limit dispersal rates. However, it is difficult to disentangle which features are most strongly related to rates of different processes. Here, we present an extensible Multi-feature Feature-Informed GeoSSE (MultiFIG) model that allows for the simultaneous investigation of any number of regional features. MultiFIG provides a conceptual framework for incorporating large numbers of features of different types, including categorical, quantitative, within-region, and between-region features, along with a mathematical framework for translating those features into biogeographic rates for statistical hypothesis testing. Using traditional Bayesian parameter estimation and reversible-jump Markov chain Monte Carlo, MultiFIG allows for the exploration of models with different numbers and combinations of feature-effect parameters, and generates estimates for the strengths of relationships between each regional feature and core process. We validate this model with a simulation study covering a range of scenarios with different numbers of regions, tree sizes, and feature values. We also demonstrate the application of MultiFIG with an empirical case study of the South American lizard genus Liolaemus, investigating sixteen regional features related to area, distance, and altitude. Our results show two important feature-process relationships: a negative distance/dispersal relationship, and a negative area/extinction relationship. Interestingly, although speciation rates were found to be higher in Andean versus non-Andean regions, the model did not assign significance to Andean- or altitude-related parameters. 
These results highlight the need to consider multiple regional features in biogeographic hypothesis testing.
This dataset is associated with Swiston & Landis 2023 (https://doi.org/10.1101/2023.06.19.545613). It contains two major elements. First is a simulation study performed using R and RevBayes associated with the validation of the MultiFIG model. Here, we provide simulated regional feature values, simulated phylogenetic trees and present-day species ranges, and output from analyses with the MultiFIG model. Second is an empirical analysis of the Liolaemus genus under the MultiFIG model. We also provide figures and supplemental figures associated with the manuscript.
Description of the data and file structure
Overview: The code .zip file contains all of the scripts necessary for generating and analyzing simulations for the simulation study (sim), running the empirical analysis of Liolaemus (emp), and plotting the results of both analyses (plotting). The data .zip file contains all of the simulated datasets for the simulation study (sim) and the empirical dataset for the analysis of Liolaemus (emp). The output .zip file contains all of the unprocessed and processed output from the simulation analyses (sim) and empirical analysis (emp), as well as all plots and supplemental figures from the associated manuscript (plots).
code: Contains R and RevBayes code for the MultiFIG simulation study and empirical analysis of Liolaemus, as well as plotting scripts.
code/sim: Contains scripts for generating and analyzing simulated datasets under the MultiFIG model.
- batchsim.sh: Shell script for submitting simulations using an LSF job scheduler (not required if simulating locally)
- sim.sh: Shell script that calls separate scripts for simulating geographies and trees under the MultiFIG model
- geosim.R: R script for simulating "geographies", generating .csv data files in data/sim/geo representing simulated regional features and feature summaries
- sim.Rev: RevBayes script for generating phylogenetic trees, tip states, and model parameters in data/sim/history according to the MultiFIG model (based on simulated geographies)
- batchinf.sh: Shell script for submitting inference jobs on simulated datasets using an LSF job scheduler (not required if performing inference locally)
- inf.sh: Shell script that calls inf.Rev on simulated datasets
- inf.Rev: RevBayes script for performing inference on simulated datasets under the MultiFIG model
code/emp: Contains RevBayes scripts for performing an analysis of Liolaemus under the MultiFIG model.
- batchinf.sh: Shell script for submitting inference jobs on the Liolaemus dataset using an LSF job scheduler (not required if performing analysis locally)
- inf.sh: Shell script that calls inf.Rev on the Liolaemus dataset
- inf.Rev: RevBayes script for performing inference on the Liolaemus dataset under the MultiFIG model
code/plotting: Contains R scripts for plotting results of the simulation study and empirical analysis of Liolaemus.
- cov_plots.R: R script for generating coverage plots and the coverage table associated with the simulation study
- joint_plots.R: R script for generating joint posterior plots associated with the analysis of Liolaemus
- posterior_plots.R: R script for generating plots of Bayesian posteriors associated with the analysis of Liolaemus
- rj_sim_plots.R: R script for plotting reversible jump results associated with the simulation study
- state_plots.R: R script for plotting the ancestral state reconstruction for Liolaemus
- variance_plots.R: R script for plotting the variance of the empirical features against the variances of the simulated features
data: Contains data for the simulation study and empirical analysis of Liolaemus.
data/sim: Contains data associated with the simulation study.
data/sim/geo: Contains data files for regional features over a set of simulated geographies, as well as summary files which describe the features used in each analysis (some analyses use different features in simulation versus inference).
- The first element of the filename represents whether the analysis will be performed using reversible-jump (RJ) or without (NONRJ)
- The second element of the filename represents the number of regions
- The third element represents the experimental condition of the analysis (all 12 features are generated, but some are not used)
  - LESS: area & distance during simulation; area, distance, & altitude during inference
  - FULL: area, distance, & altitude during simulation; area, distance, & altitude during inference
  - MORE: area, distance, altitude, & temperature during simulation; area, distance, & altitude during inference
  - NOISY: area, distance, & taltitude (true altitude) during simulation; area, distance, & altitude during inference
- The fourth element represents the tree size category: 25-49 = XSMALL, 50-99 = SMALL, 100-199 = MEDIUM, 200-349 = LARGE (this information is not used in simulating geographies, but will be used in simulating trees)
- The fifth element represents the index of the simulated geography (randomly assigned)
- SIM_feature_summary.csv files contain a list of features to be used in the associated simulation
- INF_feature_summary.csv files contain a list of features to be used in the associated inference
- For all feature files:
  - The fifth element represents the type of data (c=categorical, q=quantitative, w=within-region, b=between-regions)
  - The sixth element is the geographical feature that the data represents: area, distance, altitude, true altitude, or temperature
  - E.g. 3.FULL.MEDIUM.1.cb_distance.csv uses 3 regions, will use the feature set "FULL" when simulating a "MEDIUM" tree, is the simulated geography indexed 1, and contains a matrix of categorical distances between regions (an adjacency matrix)
data/sim/history: Contains data files for simulated trees, tip states, and model parameters.
- E.g. RJ.3.FULL.MEDIUM.1.tree.tre contains a Newick-string representation of a dated phylogeny, simulated based on the 3-region geography indexed 1 using the feature set "FULL", and targeting a medium tree size, to be analyzed using reversible jump
- E.g. RJ.3.FULL.MEDIUM.1.data.tsv contains tip states corresponding to the RJ.3.FULL.MEDIUM.1 simulation
- E.g. RJ.3.FULL.MEDIUM.1.param.txt contains MultiFIG model parameters that produced the RJ.3.FULL.MEDIUM.1 simulation
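The naming convention above can be unpacked programmatically. The following is a minimal Python sketch, not part of the dataset's own scripts: the `HistoryFile` class and `parse_history_filename` function are hypothetical names introduced here for illustration. It assumes the `ANALYSIS.REGIONS.CONDITION.TREESIZE.INDEX.content` layout shown in the data/sim/history examples. Note that the geography feature example listed above has no leading RJ/NONRJ element, so this sketch is only meant for history filenames.

```python
from dataclasses import dataclass

# Tree-size categories (number of tips) as listed in this README.
TREE_SIZES = {"XSMALL": (25, 49), "SMALL": (50, 99),
              "MEDIUM": (100, 199), "LARGE": (200, 349)}

# Feature sets per experimental condition: (simulation, inference).
# "taltitude" is the README's name for true altitude.
INFERENCE = ("area", "distance", "altitude")
CONDITIONS = {
    "LESS": (("area", "distance"), INFERENCE),
    "FULL": (("area", "distance", "altitude"), INFERENCE),
    "MORE": (("area", "distance", "altitude", "temperature"), INFERENCE),
    "NOISY": (("area", "distance", "taltitude"), INFERENCE),
}


@dataclass(frozen=True)
class HistoryFile:
    """Hypothetical container for the elements of a history filename."""
    analysis: str    # "RJ" or "NONRJ"
    n_regions: int   # number of regions
    condition: str   # LESS, FULL, MORE, or NOISY
    tree_size: str   # XSMALL, SMALL, MEDIUM, or LARGE
    index: int       # index of the simulated geography
    content: str     # remainder of the name, e.g. "tree.tre"


def parse_history_filename(name):
    """Split a data/sim/history filename into its dot-separated elements."""
    parts = name.split(".")
    if len(parts) < 6:
        raise ValueError(f"expected at least 6 dot-separated elements: {name!r}")
    analysis, n_regions, condition, tree_size, index = parts[:5]
    if analysis not in ("RJ", "NONRJ"):
        raise ValueError(f"unknown analysis type: {analysis!r}")
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    if tree_size not in TREE_SIZES:
        raise ValueError(f"unknown tree size: {tree_size!r}")
    return HistoryFile(analysis, int(n_regions), condition, tree_size,
                       int(index), ".".join(parts[5:]))


info = parse_history_filename("RJ.3.FULL.MEDIUM.1.tree.tre")
print(info)
print("simulation features:", CONDITIONS[info.condition][0])
print("target tip range:", TREE_SIZES[info.tree_size])
```

Keeping the conditions and tree-size ranges in lookup tables makes it easy to reject a misnamed file early, before it is handed to an analysis script.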
data/emp: Contains the empirical dataset associated with Liolaemus and 6 South American regions (AA = Altiplanic Andes, CA = Central Andes, PA = Patagonia, CC = Central Chile, AD = Atacama Desert, EL = Eastern Lowland).
data/emp/history: Contains the empirical dataset relating to Liolaemus.
- liolaemidae.data.full.csv: Contains information about each species in the family Liolaemidae (including the genus Liolaemus), such as presence/absence in different regions, as well as other species traits that are not used for the MultiFIG analysis (data from Esquerré et al. 2019)
- make_dat.py: A Python script for translating the data from liolaemidae.data.full.csv into data that is usable for the MultiFIG model (liolaemidae.data.table.tsv and ranges.data.tsv)
- state_labels_n6.txt: Relates binary presence/absence data to integer state numbers used by RevBayes for the MultiFIG analysis; used by the make_dat.py script
- liolaemidae.data.table.tsv: Contains present-day ranges of Liolaemus, representing presence in a region with (1) and absence with (0)
- ranges.data.tsv: Contains state numbers for present-day Liolaemus species
- tree.mcc.tre: Time-calibrated phylogeny of Liolaemus
data/emp/geo: Contains the empirical dataset relating to South America.
- shapefiles: Contains shapefiles associated with the 6 regions used in the Liolaemus analysis; these are not required for running the analysis
- RJ.6.HL.LIOLAEMUS.feature_summary.csv: Contains information about which features to use for the empirical analysis using highland/lowland classification
- altitudes: Contains regional features associated with altitude
  - andean_classification.csv: Classifies regions as Andean or non-Andean
  - andean_sameness.csv: Matrix describing whether regions share (1) or do not share (0) Andean classification
  - classification.csv: Altitude classification of regions (1=high, 0=low)
  - mean.csv: Mean altitudes of regions (m)
  - mean_diff.csv: Differences in mean altitude between regions (m)
  - sameness.csv: Matrix describing whether regions share (1) or do not share (0) altitude classification
  - sd.csv: Standard deviation of altitudes of regions
- areas: Contains regional features associated with area
  - areas.csv: Sizes of regions (km^2)
  - classification.csv: Size classification of regions (1=large, 0=small)
- distances: Contains regional features associated with distance
  - mean.csv: Mean distances between regions (km)
  - adjacency.csv: Matrix of region adjacency
- equal: Contains vectors/matrices of equal feature values for simplified analyses (removing feature effects)
  - E.g. q_equal_vector.csv: A vector of equal feature values for simplified analysis; in this case, a quantitative one-dimensional vector
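Several of the between-region files listed above (sameness.csv, andean_sameness.csv, mean_diff.csv) are described as pairwise summaries of within-region values. The Python sketch below illustrates one way such matrices could be derived. It is an illustration only, not the procedure used to build the dataset: the function names and the three-region toy values are hypothetical, and it assumes absolute differences, since the README does not say whether mean_diff.csv stores signed or absolute values.

```python
def sameness_matrix(classification):
    """Return 1 where two regions share a class label and 0 otherwise."""
    return [[1 if a == b else 0 for b in classification]
            for a in classification]


def pairwise_abs_difference(values):
    """Return absolute pairwise differences (assumption: sign is dropped)."""
    return [[abs(a - b) for b in values] for a in values]


# Hypothetical toy values for three regions; NOT taken from the dataset.
regions = ["R1", "R2", "R3"]
high_low = [1, 1, 0]                  # 1 = high, 0 = low
mean_altitude_m = [3000.0, 2500.0, 400.0]

same = sameness_matrix(high_low)
diff = pairwise_abs_difference(mean_altitude_m)

for name, s_row, d_row in zip(regions, same, diff):
    print(name, s_row, d_row)
```

Both matrices are symmetric by construction, with ones (sameness) or zeros (difference) on the diagonal.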
output: Contains output of simulated and empirical analyses.
output/sim: Contains output of the simulation study.
- output: Output of simulation analyses (logfiles for model parameters)
- data: Summaries of output for each analysis (estimates and HPD intervals)
- processed_data: Contains the large output file of the coverage analysis, coverages.csv
output/emp: Contains output of the empirical analysis of Liolaemus, including 6 file types: .ase.tre (ancestral state tree), .states.log (ancestral state trace), .stoch.log (stochastic mapping), .events.tsv (list of state transitions and cladogenetic events), .model_extras.log (by-region rates of biogeographic processes), and .model.log (logfile of model parameters).
- concatenated.model.log: Logfile of model parameters used for results in the manuscript; created by concatenating the output of analyses 8 and 9 (numbering was arbitrary) after removing burn-in, to ensure a sufficient number of generations
- output: Contains other output files associated with analyses 8 and 9, including the ancestral state reconstruction generated from analysis 9
output/plots: Contains plots associated with Swiston & Landis 2023 (https://doi.org/10.1101/2023.06.19.545613).
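The concatenated.model.log file described above combines two runs after discarding burn-in. A minimal Python sketch of that kind of operation follows. It is not the script used to produce the file: the function name is hypothetical, the 20% burn-in fraction is an arbitrary placeholder (the README does not state the value used), and the logs are assumed to be plain text with a single header row.

```python
def concatenate_logs(log_texts, burnin_fraction=0.2):
    """Drop the first `burnin_fraction` of rows from each log, then stack them.

    `log_texts` is a list of strings, each holding one logfile's contents
    (a header line followed by one row per sampled generation).
    """
    if not log_texts:
        raise ValueError("no logs supplied")
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burnin_fraction must be in [0, 1)")

    header = None
    kept_rows = []
    for text in log_texts:
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if not lines:
            raise ValueError("empty log")
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError("logs have different headers")
        rows = lines[1:]
        n_drop = int(len(rows) * burnin_fraction)  # rows treated as burn-in
        kept_rows.extend(rows[n_drop:])
    return "\n".join([header] + kept_rows) + "\n"


# Toy logs with a hypothetical parameter column; NOT real output.
run_a = "Iteration\tparam\n" + "".join(f"{i}\t{i * 0.1:.1f}\n" for i in range(10))
run_b = "Iteration\tparam\n" + "".join(f"{i}\t{i * 0.2:.1f}\n" for i in range(10))

combined = concatenate_logs([run_a, run_b], burnin_fraction=0.2)
print(combined)
```

To apply this to files on disk, read each logfile into a string first, for example with `pathlib.Path(path).read_text()`. Checking that the headers match guards against stacking runs that logged different parameters.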
Sharing/Access information
Data was derived from the following sources:
- Esquerré, D., Brennan, I. G., Catullo, R. A., Torres-Pérez, F., & Keogh, J. S. (2019). How mountains shape biodiversity: The role of the Andes in biogeography, diversification, and reproductive biology in South America’s most species-rich lizard radiation (Squamata: Liolaemidae). Evolution; International Journal of Organic Evolution, 73(2), 214–230.
- NASA. (2013). Shuttle Radar Topography Mission (SRTM) Global [Data set]. https://doi.org/10.5069/G9445JDF
- Swiston, S. K., & Landis, M. J. (2023). Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal. BioRxiv. https://doi.org/10.1101/2023.06.19.545613
Code/Software
- The dataset contains .R files for generating simulated geographies and plotting output. These files were designed to be run using R version 4.4.0.
- The dataset also contains .Rev files for performing analyses in RevBayes. Correct versions of RevBayes and TensorPhylo can be found in the Docker image sswiston/rb_tp:7 (https://hub.docker.com/r/sswiston/rb_tp).
For a tutorial explaining the details of the MultiFIG analysis, visit https://revbayes.github.io/tutorials/multifig/.
Version Changes:
- 2024/07/22: Due to changes in RevBayes and TensorPhylo software, all scripts were overhauled, all analyses re-run, and all figures re-generated. New .zip files have been uploaded.
- 2024/10/17: The term 'altitude' was changed to 'elevation' throughout the plotting scripts and figures.
This dataset contains a simulation study performed using R and RevBayes. It consists of simulated regional feature values, simulated phylogenetic trees and present-day species ranges, and output from analyses with the MultiFIG model. The dataset also contains files relevant to a MultiFIG analysis of Liolaemus. Finally, the dataset contains PDF versions of figures and supplemental figures.
The data files can be opened with any text editor. For visualization, trees and logfiles can be opened in Tracer. Plots can be opened by any PDF reader.
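Logfiles can also be summarized with a short script instead of Tracer. The Python sketch below computes a posterior mean and a highest posterior density (HPD) interval for one column. It assumes the logfile is tab-delimited with a header row. The column name `param` and the toy values are hypothetical, and the narrowest-window HPD estimator shown here is a common generic approach, not necessarily the one used for the summaries in this dataset.

```python
import csv
import io
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def summarize_column(log_text, column, mass=0.95):
    """Posterior mean and HPD interval for one column of a tab-delimited log."""
    reader = csv.DictReader(io.StringIO(log_text), delimiter="\t")
    values = [float(row[column]) for row in reader]
    lower, upper = hpd_interval(values, mass)
    return {"mean": sum(values) / len(values),
            "hpd_lower": lower, "hpd_upper": upper}


# Toy log with a hypothetical column name; NOT real output.
toy_log = "Iteration\tparam\n0\t1.0\n1\t2.0\n2\t3.0\n3\t4.0\n"
print(summarize_column(toy_log, "param"))
```

For a real logfile, read its contents into a string first (for example with `pathlib.Path(path).read_text()`) and pass the name of a column that appears in the file's header.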
- Swiston, Sarah; Landis, Michael (2023), Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal, Article, https://doi.org/10.5281/zenodo.8060362
- Swiston, Sarah K.; Landis, Michael J. (2023), Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal, Posted-content, https://doi.org/10.1101/2023.06.19.545613
