Data from: Phylogenetic biogeography inference using dynamic paleogeography models and explicit geographic ranges
Data files
Aug 29, 2024 version files 13.06 GB
- biblio.bib
- README.md
- sap-exp-full-infer-l39.zip
- sap-exp-full-infer.zip
- sap-exp-maps-full.zip
- sap-exp-maps-no-motion.zip
- sap-exp-no-land-infer.zip
- sap-exp-no-motion-infer.zip
- sap-exp-projects.zip
- sap-exp-trim-infer.zip
- sap-infer-l33.zip
- sap-infer-ml.zip
- sap-maps-l33.zip
- sap-maps-ml.zip
- sap-maps-terminals.zip
- sap-projects.zip
- sap-raw-geodata.zip
- sim-muller-infer-lambda.zip
- sim-muller-infer-particles.zip
- sim-muller-no-land-infer-lambda.zip
- sim-muller-no-land-infer-particles.zip
- sim-muller-no-land-nodes.zip
- sim-muller-no-land-pix.zip
- sim-muller-no-land-project.zip
- sim-muller-no-motion-infer-lambda.zip
- sim-muller-no-motion-infer-particles.zip
- sim-muller-no-motion-nodes.zip
- sim-muller-no-motion-pix.zip
- sim-muller-no-motion-project.zip
- sim-muller-no-motion-want-unrot.zip
- sim-muller-nodes.zip
- sim-muller-pix.zip
- sim-muller-project.zip
- sim-muller-sim-lambda-trees.zip
- sim-muller-sim-particles.zip
- sim-muller-want.zip
- sim-no-land-infer-lambda.zip
- sim-no-land-infer-particles.zip
- sim-no-land-nodes.zip
- sim-no-land-pix.zip
- sim-no-land-project.zip
- sim-no-land-sim-lambda-trees.zip
- sim-no-land-sim-particles.zip
- sim-no-land-strat-infer-lambda.zip
- sim-no-land-strat-infer-particles.zip
- sim-no-land-strat-nodes.zip
- sim-no-land-strat-pix.zip
- sim-no-land-strat-project.zip
- sim-no-land-want.zip
Abstract
Supplementary data for the manuscript entitled "Phylogenetic Biogeography Inference Using Dynamic Paleogeography Models and Explicit Geographic Ranges". This dataset contains the supplementary figures as well as the basic setup for the simulation used to test the method described in the manuscript, and an empirical example using the plant family Sapindaceae, analyzed with the program PhyGeo.
README: Phylogenetic Biogeography Inference Using Dynamic Paleogeography Models and Explicit Geographic Ranges
Supplementary data for the manuscript entitled "Phylogenetic Biogeography Inference Using Dynamic Paleogeography Models and Explicit Geographic Ranges". This dataset contains the supplementary figures as well as the basic setup for the simulation used to test the method described in the manuscript and an empirical example using the plant family Sapindaceae, analyzed with the program PhyGeo.
Files are stored mostly as zip files to reduce their size
and to mimic a directory structure.
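Because the archives stand in for directories, one way to restore that layout is to unpack each archive into a directory named after it. This is a minimal sketch, not part of the original workflow: the helper `archive_dir` is hypothetical, and the assumption that the archives do not already carry a top-level directory has not been checked against the files.

```shell
# Hypothetical helper: directory name for an archive (the name without ".zip").
archive_dir() {
    printf '%s\n' "${1%.zip}"
}

# Unpack every archive in the current directory into its own directory.
for archive in ./*.zip; do
    # If no archive is present the pattern stays literal, so skip it.
    [ -e "$archive" ] || continue
    unzip -q -o "$archive" -d "$(archive_dir "$archive")"
done
```

The full dataset is about 13 GB compressed, so it may be preferable to unpack only the archives needed.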
Simulations with PhyGeo
Files prefixed with sim- store files from the simulations.
Projects
A project is defined by a plate motion model and a landscape model, as well as the project files. The files contain the suffix -project.zip.
Inside each zip file the project is called project.tab. The file terms50-100.sh contains the bash script to run all the simulations and the inference.
Geographic data
The geographic data model is an equal area pixelation of the Earth, with 120 pixels in the equatorial ring.
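As a rough guide to what the pixelation means in practice, the sketch below computes the approximate width of a pixel in the equatorial ring, for the 120-pixel ring used here and for the 360-pixel ring used in the Sapindaceae analysis. The function `pixel_size` is hypothetical and the circumference is the WGS84 equatorial value, which is an assumption of this sketch rather than something stated in the dataset.

```shell
# Equatorial circumference in km (WGS84 value; an assumption of this sketch).
circumference_km=40075.017

# Hypothetical helper: approximate size of a pixel in the equatorial ring.
# $1: number of pixels in the equatorial ring
pixel_size() {
    awk -v n="$1" -v c="$circumference_km" \
        'BEGIN { printf "%d pixels: %.1f degrees, %.1f km\n", n, 360 / n, c / n }'
}

pixel_size 120   # 120 pixels: 3.0 degrees, 334.0 km
pixel_size 360   # 360 pixels: 1.0 degrees, 111.3 km
```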
Paleogeographic models
Three different models were used for the simulations and inferences. The first is a homogeneous sphere without any geographic features, which was used without time stages and with time stages every 5 million years. The second is a model with the current geographic landscape and time stages every 5 million years, but without any change in the landscape. The third is a full paleogeographic model, using the plate motion model from Müller et al. (2022), the paleolandscape model of Cao et al. (2017) for the time stages between 0 and 400 Ma, and the PaleoMap model for the time stages between 405 and 540 Ma. The model has time stages every 5 million years.
- nomotion-motion-120.tab: Motion model for the homogeneous sphere (no motion).
- nomotion-landscape-120.tab: Landscape model for the homogeneous sphere (two pixel types, both used with the same weight).
- nomotion-motion-120-5.tab: Motion model for the homogeneous sphere with time slices each 5 million years (no motion).
- nomotion-landscape-120.tab: Landscape model for the homogeneous sphere with time slices each 5 million years (two pixel types, both used with the same weight).
- muller-motion-120-5.tab: This file contains the pixelated version of the plate motion model, with e120 pixelation, and time slices for each 5 million years, from 600 Ma to present.
- muller-landscape-cao-paleomap-120-5.tab: This file contains the pixelated version of the paleolandscape model, with e120 pixelation, and time slices for each 5 million years, from 540 Ma to present.
- cao-landscape-nomotion-120-5.tab: This file contains the pixelated version of the paleolandscape model, with e120 pixelation, and time slices for each 5 million years, from 540 Ma to present, but with the present time stage in all time stages.
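The number of time stages implied by these descriptions can be worked out directly. The sketch below assumes that both ends of each interval are included (that is, the present counts as the 0 Ma stage); the function `stage_count` is hypothetical and the counts have not been checked against the files themselves.

```shell
# Hypothetical helper: number of time stages between an oldest age and the
# present, assuming both ends are included.
# $1: oldest stage in Ma, $2: step in million years
stage_count() {
    echo $(( $1 / $2 + 1 ))
}

stage_count 600 5   # plate motion model: 121 stages
stage_count 540 5   # paleolandscape model: 109 stages
```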
Landscape pixel weights
For the homogeneous sphere model, the pixel weights are all identical.
Key | Weight | Environment |
---|---|---|
1 | 1.000 | ocean |
3 | 1.000 | land |
The landscape models for the full model and for the model with the current landscape use the following pixel weights:
Key | Weight | Environment |
---|---|---|
1 | 0.001 | oceanic plateaus |
2 | 0.005 | continental shelf |
3 | 1.000 | lowlands |
4 | 1.000 | highlands |
5 | 0.001 | ice sheets |
- no-landscape/model-pix-weights.tab: This file contains the definition of the pixel weights used in the analysis without landscape.
- no-landscape/strat/model-pix-weights.tab: This file contains the definition of the pixel weights used in the analysis without landscape.
- muller-model/model-pix-weights.tab: This file contains the definition of the pixel weights used in the analysis with a full landscape.
- muller-model/no-motion/model-pix-weights.tab: This file contains the definition of the pixel weights used in the analysis with the current landscape.
- muller-model/no-landscape/model-pix-weights.tab: This file contains the definition of the pixel weights used in the analysis without landscape.
Simulation setup
All simulations are run using the same setup. The simulations use the program pgs, which is available as part of PhyGeo. The simulations are divided into five time periods that correspond to geological periods: the Neogene (23-0 Ma), the Paleogene (66-23 Ma), the Cretaceous (145-66 Ma), the Jurassic (201-145 Ma), and the Triassic (251-201 Ma). At each time period, the diffusion is divided into three groups: slow ($\lambda$ between 100 and 1000), average (10-100), and fast (1-10). For each combination of time period and diffusion group, 50 trees were simulated.
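The size of the simulation design follows from the numbers above. The sketch below only enumerates the combinations of time period and $\lambda$ range and counts the trees; it does not call pgs, and the variable names are illustrative.

```shell
# Time periods (name:start-end in Ma) and lambda ranges, as listed in the text.
periods="neogene:23-0 paleogene:66-23 cretaceous:145-66 jurassic:201-145 triassic:251-201"
lambda_ranges="100-1000 10-100 1-10"
trees_per_cell=50

cells=0
total=0
for period in $periods; do
    for range in $lambda_ranges; do
        cells=$((cells + 1))
        total=$((total + trees_per_cell))
    done
done

echo "$cells combinations, $total simulated trees"   # 15 combinations, 750 simulated trees
```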
For each simulated tree, a random starting root is selected for the specified time period, and then a particle evolves in the tree using a $\lambda$ value selected at random from the diffusion group. To produce the ancestral particles, 100 pixels are selected at random using the spherical normal, with a mean equal to the simulated particle and a $\lambda$ of 100 (this value is used because it produces geographic points that are similar to the observed ones). The pixel locations are conditioned by the pixel weight at the time period of the particle. The simulation produces three kinds of output files. Files with suffix -lambda.tab store the simulated $\lambda$ values, and files with suffix -trees.tab store the simulated trees; both are compressed in the files with the suffix -sim-lambda-trees.zip. The third kind of file has suffix -particles.tab and stores the simulated particles; these are compressed in the files with the suffix -sim-particles.zip.
The files used to test the inference results are derived from these particle files. They are prefixed want- and contain the pixel frequencies of the simulated pixels at each node. The compressed files have the suffix -want.zip. In the particular case of the inference with the current landscape, this frequency file was rotated to the present coordinates and stored with the prefix unrot-; in this case the compressed file has the suffix -want-unrot.zip.
The inference phase produces a file suffixed -infer-lambda.tab for the inferred $\lambda$ values and a file suffixed -infer-particles.tab for the inferred particles. The same suffixes are used for the compressed files (but using the extension .zip). To process the particle files, the particles are transformed into a continuous distribution using a spherical normal KDE with a $\lambda$ of 1000 that produces files stored with the prefix got-. These files, the particle files and the KDE files, are too large for the repository. The output results are files prefixed pix-, each a table with the total number of correctly inferred pixels per node (compressed with the suffix -pix.zip), as well as graphs with the proportion of retrieved nodes (prefixed nodes-, compressed with the suffix -nodes.zip).
Experiments
Homogeneous sphere
In the experiments using the homogeneous sphere models, the objective is to test the inference of the $\lambda$ parameter. The first experiment is stored with the prefix sim-no-land-.
In the zip files prefixed as sim-no-land-strat, the simulations made with the homogeneous sphere are reused, but this time the inference was made with a time-stratified model (still without movement). The objective is to test if the results are modified by the use of an inference with time stratification (that is, by including internodes along the branches that cross each time stage).
Full paleogeographic model
In the experiments using the full paleogeographic models, the objective is to test the inference of the $\lambda$ parameter as well as the inference of the ancestral pixels. The first experiment is in the files prefixed sim-muller-.
For the second experiment, stored with the prefix sim-muller-no-motion-, the same data generated in the simulation with the full model is used, but the inference model is a time-stratified model without any change in the current landscape. As biogeographers using such a model would argue that the inferred pixels should be considered pixels in a particular location, independent of the time stage, the pixels simulated with the full model are rotated to the current coordinates.
For the third experiment, the data produced with the full model was used, but the inference was made without any landscape (and therefore, without any motion). This experiment is stored in the files prefixed as sim-muller-no-land.
Phylogenetic biogeography of Sapindaceae
This repository contains the data and results of a phylogenetic biogeography analysis of the plant family Sapindaceae using the computer program PhyGeo. The compressed files are prefixed as sap-.
Source data
The data files used for the analysis are stored in the file sap-projects.zip.
Geographic data model
The geographic data model is an equal area pixelation of the Earth, with 360 pixels in the equatorial ring.
- pixels-360.tab: This file contains the pixel IDs and their associated geographic locations.
Paleogeographic model
The plate motion model is Müller et al. (2022). The paleolandscape model is based on an unrotated version of Cao et al. (2017) for the 0-400 Ma period, and an unrotated version of the PaleoMap model (Scotese and Wright 2018) for the period 405-540 Ma. Then the pixels were rotated using the Müller et al. (2022) plate motion model.
- muller-motion-360-5.tab: This file contains the pixelated version of the plate motion model, with e360 pixelation, and time slices for each 5 million years, from 600 Ma to present.
- muller-landscape-cao-paleomap-360-5.tab: This file contains the pixelated version of the paleolandscape model, with e360 pixelation, and time slices for each 5 million years, from 540 Ma to present.
Phylogeny
The phylogenetic tree was built using the Sapindaceae branch from the phylogenomic analysis of the Sapindales by Joyce et al. (2023), which is quite similar in content (at genus level) to previous biogeographic analyses of the group (Buerki et al. 2011, 2013). As the original publication does not provide a machine-readable file, the relationships and ages were extracted manually from the figures. The phylogeny was augmented with a few terminals from Buerki et al. (2013), mostly to enlarge the sampling of a few genera, and fossil taxa used as stem calibration points in Joyce et al. (2023) were added as sisters of the indicated clade. The species Matayba tenax was excluded for three reasons: it does not match any Matayba species or synonym in the Plants of the World database, this particular terminal floated in a previous analysis (Buerki et al., 2021), and the genus Matayba did not appear as monophyletic in previous studies (Buerki et al., 2011, 2013).
The tree was then updated with the taxonomy from Plants of the World in the file term-taxonomy.tab, removing synonyms from the tree.
- data-tree.tab: This file contains the phylogeny as a tab-delimited table.
- tree-joyce2023.svg: This file contains a drawing of the phylogenetic tree.
The tree was also edited to remove the four faster branches (Lecanodiscus, Podonephelium, Tina, and Toechima).
- tree-trim.tab: This file contains the phylogeny with the four faster branches removed.
Distribution records
The file sap-raw-geodata.zip stores the raw geographic files.
Specimen data were obtained from a search of geo-referenced preserved specimens of Sapindaceae in GBIF. The initial number of records was 387,463 occurrences.
To process the raw occurrence records in GBIF, first a taxonomy using the terminal names was built using the GBIFer tool:
gbifer tax add --rank species --file term-taxonomy.tab < terminals.txt
Then the taxonomy is filled with all potential taxon names from the occurrence file from GBIF that are synonyms or sub-species of the names already in the taxonomy file:
gbifer tax match --file term-taxonomy.tab < occurrence.txt
The taxonomy file was edited to correct spelling errors and match the GBIF taxonomy with the taxonomy from the Plants of the World. This updated taxonomy, in the file term-taxonomy.tab, is used to update the phylogenetic tree, removing synonyms from the tree.
The taxonomy file was used to extract country information from the specimen records:
gbifer country --tax term-taxonomy.tab < occurrence.txt > countries.tab
The resulting file countries.tab was edited by removing the countries not explicitly defined in Plants of the World as native.
Then the occurrence table from GBIF was filtered using both the taxonomy file and the country file.
gbifer filter -tax term-taxonomy.tab -country countries.tab < occurrence.txt > occu-in-tree-geo.txt
Then the filtered points are converted into a file of points to be used with the taxRange tool:
gbifer export -tax term-taxonomy.tab < occu-in-tree-geo.txt > raw-gbif-records.tab
The filtered file contains 68,307 occurrences.
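To put the filtering in perspective, the share of records that survive it can be computed from the two counts reported above, read here as 387,463 initial and 68,307 filtered occurrences. This is only arithmetic on the reported figures; the variable names are illustrative.

```shell
initial=387463    # occurrences in the GBIF download
filtered=68307    # occurrences left after the taxonomy and country filters

# Percentage of the initial records kept after filtering.
retained=$(awk -v a="$initial" -v b="$filtered" 'BEGIN { printf "%.1f", 100 * b / a }')
echo "$retained% of the records were retained"   # 17.6% of the records were retained
```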
As there are no geo-referenced specimen records for Euchorium cubense, a record file is created based on a material citation for the taxon.
Using the taxRange tool, the filtered GBIF records are transformed into a file with presence pixels.
taxrange imp.points -e 360 -f text -o raw-points.tab raw-gbif-records.tab
taxrange imp.points -e 360 -f text -o raw-euchorium-points.tab raw-euchorium-records.tab
The resulting file is stored in raw-points.tab. The directory terminals stores the maps of the used distribution ranges.
Using the references given by Joyce et al. (2023), I added some fossil records to the file raw-fossil-records.tab. These records were added to the points file after the project was created, and then rotated to their past locations:
phygeo range add -type points -format text project.tab raw-fossils-records.tab
phygeo range rotate project.tab
Analysis
Landscape
Key | Weights | Environment |
---|---|---|
1 | 0.001 | oceanic plateaus |
2 | 0.005 | continental shelf |
3 | 1.000 | lowlands |
4 | 1.000 | highlands |
5 | 0.001 | ice sheets |
- landscape-key.tab: This file contains the keys for the landscape features of the paleolandscape model.
- model-pix-weights.tab: This file contains the definition of the pixel weights used in the analysis.
Project
To set up a project, all input data is added to the project. Here the project is stored in the project.tab file:
phygeo geo add -type geomotion project.tab muller-motion-360-5.tab
phygeo geo add -type landscape project.tab muller-landscape-cao-paleomap-360-5.tab
phygeo geo prior -add model-pix-weights.tab project.tab
phygeo tree add -f data-tree.tab project.tab data-tree.tab
phygeo range add -f data-points.tab -type points project.tab raw-points.tab
phygeo range add -type points project.tab raw-euchorium-points.tab
A project using the tree without the four faster branches was created in the same way and stored as project-trim.tab.
Results
Inference files are defined with the prefix sap-infer-.
Estimation
The maximum likelihood was estimated using the command diff ml of PhyGeo. The output log is stored in the file log-ml.txt:
phygeo diff ml project.tab > log-ml.txt
The maximum likelihood estimation of $\lambda$ was 19.5.
The same procedure was used to estimate the maximum likelihood with the trimmed tree, which was stored as log-trim-ml.txt. The maximum likelihood estimation of $\lambda$ was 32.8.
To estimate the shape of the likelihood function, the command diff integrate was used, estimating likelihood values for $\lambda$ between 0 and 50. The output is stored in log-like.txt.
phygeo diff integrate -parts 100 -max 50 project.tab > log-like.txt
The same procedure was used for the project with the trimmed tree, but for $\lambda$ values between 0 and 100. The results were stored as log-trim-like.txt.
To estimate the conditional likelihoods on each node, the command diff like was used. For the final results, the $\lambda$ value estimated without the four fast branches (32.8) was used with the full tree. For the sake of completeness, the same procedure was also used for the maximum likelihood estimate of $\lambda$ with the full tree.
phygeo diff like -lambda 32.8 -o l33 project.tab
The stochastic map was performed using 10,000 particles, with the command diff particles, and using the conditionals for a $\lambda$ value of 32.8. Again, for the sake of completeness, the same procedure was followed with the results of the maximum likelihood estimate.
phygeo diff particles -p 10000 -i l33-project.tab-joyce2023-32.800000-down.tab -o p-l33 project.tab
Outputs
The raw frequencies are calculated with the command diff freq and posted as a compressed file freq-l33-project.tab.zip.
phygeo diff freq -i p-l33-joyce2023-32.800000x10000.tab -o freq-l33 project.tab
The same procedure was used for the maximum likelihood estimate of $\lambda$, stored as freq-ml-project.tab.zip.
For the output maps, a KDE using a spherical normal with lambda 1000 was built from the particle file. The same procedure was used for the maximum likelihood estimate.
phygeo diff freq -kde 1000 -i p-l33-joyce2023-32.800000x10000.tab -o kde-l33 project.tab
Maps are stored in the files prefixed sap-maps-. The maps use tree-joyce2023.svg for the node numbers.
phygeo diff map -c 1440 -key landscape-key.tab -gray -i kde-l33-project.tab-p-l33-joyce2023-32.800000x10000.tab.tab -o "maps-l33-k95/l33-k95" project.tab
Other maps are also generated:
phygeo diff map -c 360 -key landscape-key.tab -gray -bound 0.5 -i kde-l33-project.tab-p-l33-joyce2023-32.800000x10000.tab.tab -o "maps-l33-k50/k50" project.tab
Maps for lineage richness are stored in the directories maps-l33-rich and maps-l33-rich-u for the maps using paleogeographic reconstructions and maps rotated to present time, respectively.
phygeo diff map -c 1440 -key landscape-key.tab -gray -richness -i kde-l33-project.tab-p-l33-joyce2023-32.800000x10000.tab.tab -o "maps-l33-rich/l33-r" project.tab
phygeo diff map -c 1440 -key landscape-key.tab -gray -richness -unrot -i kde-l33-project.tab-p-l33-joyce2023-32.800000x10000.tab.tab -o "maps-l33-rich/l33-ru" project.tab
Each map has the convention <type>-<tree>-n<node id>-<age>.png, in which <type> indicates the type of the reconstruction (for example l33-k95 for maps from $\lambda$ 32.8, and KDE of 95%), <tree> indicates the tree (in this case joyce2023), <node id> is the identifier of the node (that can be consulted in the file tree-joyce2023.svg), and <age> is the age in million years.
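The naming convention can be split back into its parts with plain shell parameter expansion. The sketch below is illustrative only: the function `parse_map_name` and the example file name are hypothetical, and it assumes that the tree name contains no hyphen and that the age is the last hyphen-separated field. Since the type itself may contain hyphens (as in l33-k95), the name is parsed from the right.

```shell
# Hypothetical helper: split a map file name of the form
# <type>-<tree>-n<node id>-<age>.png into its fields.
parse_map_name() {
    base="${1##*/}"          # drop any directory part
    base="${base%.png}"      # drop the extension
    age="${base##*-}"        # last field: age in million years
    rest="${base%-*}"
    node="${rest##*-n}"      # text after the last "-n": node identifier
    rest="${rest%-n*}"
    tree="${rest##*-}"       # last remaining field: tree name
    map_type="${rest%-*}"    # whatever is left: reconstruction type
    printf 'type=%s tree=%s node=%s age=%s\n' "$map_type" "$tree" "$node" "$age"
}

# Example with a made-up file name:
parse_map_name "maps-l33-k95/l33-k95-joyce2023-n12-45.png"
# type=l33-k95 tree=joyce2023 node=12 age=45
```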
The speed is calculated with the command diff speed. A tree with the speed of branches is stored as speed-l33-joyce2023.svg, and a log file with the distances, the confidence interval, and the average velocity is stored in speed-l33.tab.
phygeo diff speed -tree speed-l33 -step 5 -box 10 -i p-l33-joyce2023-32.800000x10000.tab project.tab > speed-l33.tab
The same procedure was performed with the maximum likelihood estimate, and the results stored as speed-ml-joyce.svg and speed-ml.tab.
Sapindaceae data exploration
Exploratory data analyses are stored in the files prefixed sap-exp-. The files use the same nomenclature as the main analysis.
Source data
Geographic data model
The geographic data model is an equal area pixelation of the Earth, with 120 pixels in the equatorial ring.
- pixels-ids-120.tab: This file contains the pixel IDs and their associated geographic locations.
Paleogeographic model
Three paleogeographic models are used for the data exploration.
The full plate motion model is Müller et al. (2022). The paleolandscape model is based on an unrotated version of Cao et al. (2017) for the 0-400 Ma period, and an unrotated version of the PaleoMap model (Scotese and Wright 2018) for the period 405-540 Ma. Then the pixels were rotated using the Müller et al. (2022) plate motion model.
- muller-motion-120-5.tab: This file contains the pixelated version of the plate motion model, with e120 pixelation, and time slices for each 5 million years, from 600 Ma to present.
- muller-landscape-cao-paleomap-120-5.tab: This file contains the pixelated version of the paleolandscape model, with e120 pixelation, and time slices for each 5 million years, from 540 Ma to present.
A time-staged model using the current landscape was based on the landscape from the Cao model at the present time stage, but without any movement between time stages and no changes in the geography.
- no-motion-motion-120-5.tab: This file contains a single plate that is immobile in all time stages. It uses e120 pixelation and time slices for each 5 million years, from 600 Ma to the present.
- cao-landscape-nomotion-120-5.tab: This file contains the pixelated version of the paleolandscape model, with e120 pixelation, and time slices for each 5 million years, from 540 Ma to present, but using the present landscape for all time stages.
A model without any movement and any landscape (i.e., a homogeneous sphere) is also used.
- no-motion-motion-120.tab: This file contains a single plate and a single time stage. It uses e120 pixelation.
- no-motion-landscape-120.tab: This file contains two landscape features defined for all pixels (used only for drawing) without any time stage. It uses e120 pixelation.
Phylogeny
The same phylogeny as in the main study was used.
- data-tree.tab: This file contains the phylogeny as a tab-delimited table.
The tree was edited to remove the four faster branches (Lecanodiscus, Podonephelium, Tina, and Toechima).
- tree-trim.tab: This file contains the phylogeny with the four faster branches removed.
Distribution records
The same distribution records used for the main study were used, but pixelated into an e120 pixelation.
- data-points.tab: The file with the pixelated records, using an e120 pixelation.
Analysis
Landscape
The landscape models for the full model and for the model with the current landscape use the same weights as the main study.
Key | Weights | Environment |
---|---|---|
1 | 0.001 | oceanic plateaus |
2 | 0.005 | continental shelf |
3 | 1.000 | lowlands |
4 | 1.000 | highlands |
5 | 0.001 | ice sheets |
- landscape-key.tab: This file contains the keys for the landscape features of the paleolandscape model.
- model-pix-weights.tab: This file contains the definition of the pixel weights used in the analyses with landscape.
For the homogeneous sphere, only two pixel types are defined; both were set with a weight of 1.0.
Key | Weight | Environment |
---|---|---|
1 | 1.000 | ocean |
3 | 1.000 | land |
- noland-pix-weights.tab: This file contains the definition of the pixel weights used in the analysis without landscape.
Project
Projects are created in the same way as in the main study.
- project-120.tab: Project for the full model.
- project-landscape-nomotion.tab: Project for the current landscape.
- project-120-noland.tab: Project for the homogeneous sphere.
- project-trim.tab: The same as project-120.tab, but using the tree without the four faster branches.
Results
Estimation
The procedure for estimation is the same as used for the main study. Here only the likelihood estimates are reported, as the resulting files are too large.
Project | MLE Lambda | LogLike |
---|---|---|
Full model | 20.0 | -1135.61 |
Current landscape | 17.1 | -1162.10 |
Homogeneous sphere | <0.005 | -1247.75 |
Trimmed tree | 39.0 | -1015.63 |
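The first three rows use the same tree and records, so their log-likelihoods can be compared directly. The sketch below only computes the differences from the values in the table; it makes no statistical claim, the function `loglike_diff` is hypothetical, and the trimmed tree is left out because it is a different data set.

```shell
# Log-likelihoods copied from the table above.
full=-1135.61
current=-1162.10
sphere=-1247.75

# Hypothetical helper: difference between two log-likelihoods.
loglike_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a - b }'
}

loglike_diff "$full" "$current"   # full model vs. current landscape: 26.49
loglike_diff "$full" "$sphere"    # full model vs. homogeneous sphere: 112.14
```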
Output
The output is too large to be posted here, but as the general results from the full model are cursorily compared with those from the model with only the current landscape, the results for all nodes are posted here in the directories maps-full (for the full model) and maps-curr (for the current landscape).
References
References are also available as BibTeX in the file biblio.bib.
Buerki, S. et al. (2011) An evaluation of new parsimony-based versus parametric inference methods in biogeography: a case study using the globally distributed plant family Sapindaceae. Journal of Biogeography, 38, 531-550. DOI: 10.1111/j.1365-2699.2010.02432.x.
Buerki, S. et al. (2013) The abrupt climate change at the Eocene–Oligocene boundary and the emergence of South-East Asia triggered the spread of sapindaceous lineages. Annals of Botany, 112, 151-160. DOI: 10.1093/aob/mct106.
Buerki, S. et al. (2021) An updated infra-familial classification of Sapindaceae based on targeted enrichment data. American Journal of Botany, 108, 1234-1251. DOI: 10.1002/ajb2.1693.
Cao, W. et al. (2017) Improving global paleogeography since the late Paleozoic using paleobiology. Biogeosciences, 14, 5425-5439. DOI: 10.5194/bg-14-5425-2017.
GBIF.org (2023) GBIF occurrence download. DOI: 10.15468/dl.tjpzv2.
Joyce, E. M. et al. (2023) Phylogenomic analyses of Sapindales support new family relationships, rapid Mid-Cretaceous Hothouse diversification, and heterogeneous histories of gene duplication. Frontiers in Plant Science, 14, 1063174. DOI: 10.3389/fpls.2023.1063174.
Müller, R. D. et al. (2022) A tectonic-rules-based mantle reference frame since 1 billion years ago – implications for supercontinent cycles and plate–mantle system evolution. Solid Earth, 13, 1127-1159. DOI: 10.5194/se-13-1127-2022.
PoWO (2023) Plants of the World Online. URL: http://www.plantsoftheworldonline.org/.
Scotese, C. R., Wright, N. (2018) PALEOMAP Paleodigital elevation models (PaleoDEMs) for Phanerozoic. URL: https://www.earthbyte.org/paleodem-resource-scotese-and-wright-2018/.
Methods
Paleogeographic model
The plate motion model is Müller et al. (2022). The paleolandscape model is based on an unrotated version of Cao et al. (2017) for the 0-400 Ma period, and an unrotated version of the PaleoMap model (Scotese and Wright 2018) for the period 405-540 Ma. Then the pixels were rotated using the Müller et al. (2022) plate motion model.
Phylogeny
The phylogenetic tree for the empirical dataset was built using the Sapindaceae branch from the phylogenomic analysis of the Sapindales by Joyce et al. (2023), which is quite similar in content (at genus level) to previous biogeographic analyses of the group (Buerki et al. 2011, 2013). As the original publication does not provide a machine-readable file, the relationships and ages were extracted manually from the figures. The phylogeny was augmented with a few terminals from Buerki et al. (2013), mostly to enlarge the sampling of a few genera. The species Matayba tenax was excluded for three reasons: it does not match any Matayba species or synonym in the Plants of the World database, this particular terminal floated in a previous analysis (Buerki et al., 2021), and the genus Matayba did not appear as monophyletic in previous studies (Buerki et al. 2011, 2013).
Distribution records
Specimen data were obtained from a search of geo-referenced preserved specimens of Sapindaceae in GBIF. The initial number of records was 387,463 occurrences.
Data analysis
See the readme for the data analysis.