Dryad

Convergent evolution of giant size in eurypterids

Cite this dataset

Ruebenstahl, Alexander; Mongiardino Koch, Nicolás; Lamsdell, James; Briggs, Derek (2024). Convergent evolution of giant size in eurypterids [Dataset]. Dryad. https://doi.org/10.5061/dryad.cvdncjtbf

Abstract

Eurypterids, Paleozoic marine and freshwater arthropods commonly known as sea scorpions, repeatedly evolved to remarkable sizes (over 0.5 m in length) and repeatedly colonized continental aquatic habitats. We compiled data on the majority of eurypterid species and explored several previously proposed explanations for the evolution of giant size in the group including the potential role of habitat, sea surface temperature and dissolved sea surface oxygen levels using a phylogenetic comparative approach with a new tip-dated tree. Overall, there is no compelling evidence that the evolution of giant size was driven by temperature or oxygen levels, nor that it was coupled with the invasion of continental aquatic environments, latitude, or local faunal diversity. Eurypterid body size evolution is best characterized by rapid bursts of change that occurred independently of habitat or environmental conditions. Intrinsic factors likely played a larger role than previously recognized in determining the convergent origin of gigantism in eurypterids.

README: Convergent evolution of giant size in eurypterids

Included in this repository are all the data files associated with the manuscript "Convergent evolution of giant size in eurypterids" by Ruebenstahl et al.

The repository is structured as follows:

scripts

Four (4) R script files in '.R' format. These include:

i. timescale_MrBayes_posterior.R: a script that turns MrBayes posterior topology files ('.t') from tip-dated analyses into time-scaled chronograms by dividing the branch lengths of each tree by its corresponding clock rate.
ii. find_diversification_epochs.R: takes a stratigraphic spreadsheet (in '.csv' format) including first and last appearance data (FAD, LAD), estimates diversity at regular intervals from it, and finds local minima and maxima as temporal points that might be associated with shifts in diversification rates.
iii. macroevolutionary_modelling.R: includes the bulk of functions and code to replicate all aspects of the analysis (from data loading to macroevolutionary inference and plotting).
iv. accesory_funtions.R: includes additional functions used but not loaded through packages.
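
The idea behind find_diversification_epochs.R (script ii above) can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' code: the function names, the 1 million year step, the strict definition of a turning point, and the toy FAD/LAD ranges are all assumptions made for illustration.

```python
def diversity_curve(ranges, step=1.0):
    """Count taxa whose FAD-LAD range spans each sampled time point.

    ranges: list of (fad, lad) tuples in Ma, with fad >= lad.
    Returns a list of (age, count) pairs ordered from oldest to youngest.
    """
    oldest = max(fad for fad, _ in ranges)
    youngest = min(lad for _, lad in ranges)
    curve = []
    age = oldest
    while age >= youngest:
        count = sum(1 for fad, lad in ranges if lad <= age <= fad)
        curve.append((age, count))
        age -= step
    return curve


def local_extrema(curve):
    """Return (minima, maxima): ages of strict interior turning points.

    Plateaus are ignored in this sketch; a point only counts if it is
    strictly lower (or higher) than both of its neighbours.
    """
    minima, maxima = [], []
    for i in range(1, len(curve) - 1):
        prev, cur, nxt = curve[i - 1][1], curve[i][1], curve[i + 1][1]
        if cur < prev and cur < nxt:
            minima.append(curve[i][0])
        elif cur > prev and cur > nxt:
            maxima.append(curve[i][0])
    return minima, maxima


# Hypothetical (FAD, LAD) ranges in Ma, not taken from the dataset
toy_ranges = [(430, 426), (429, 427), (428, 428),
              (426, 420), (424, 422), (423, 423)]

curve = diversity_curve(toy_ranges, step=1.0)
minima, maxima = local_extrema(curve)
```

The ages in `minima` and `maxima` are candidate boundaries between diversification epochs. How the actual script smooths the curve or treats plateaus is not described here, so those details are left open.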

comma-separated spreadsheets

Two (2) comma-separated spreadsheets in '.csv' format housing all data on which the analyses are based. These include:

i. environmental_predictors.csv: includes information on reconstructed temperature and dissolved oxygen taken from Song et al. (2019). Following is a description of the columns in this file:
a. Age: 1 million year time intervals used as temporal resolution
b. Mean_temperature: Mean paleotemperature in degrees Celsius, averaged from fossil oxygen isotope values, reported in Song et al. (2019) - Electronic Supplementary Material, Dataset 1. This includes several gaps for time windows not estimated by the authors.
c. Mean_dissolved_O2: Mean dissolved oxygen in shallow waters, in micromoles. Raw values were calculated by Song et al. (2019) and displayed in Figure 5 but not made available. These values were digitized from the curve shown in Figure 5 using WebPlotDigitizer v. 4.6 (https://automeris.io/WebPlotDigitizer.html) at 1 million year time intervals.

ii. simplified_Eurypterids_revised_5_22_2023.csv: includes information compiled for 146 eurypterid taxa, from both the primary literature and the Paleobiology Database (PBDB; https://paleobiodb.org/#/). Following is a description of the columns in this file:
a. taxa: Name assigned to each eurypterid taxon and matching the terminal names in the phylogenetic matrices and trees.
b. size: Maximum reported body size in centimeters (cm). Empty cells indicate lack of a reliable maximum estimate of body length.
c. FAD: First (oldest) appearance datum obtained from the PBDB. Dates given in millions of years.
d. LAD: Last (youngest) appearance datum obtained from the PBDB. Dates given in millions of years.
e. paleolatitude: Reconstructed paleolatitude of the fossiliferous locality harboring each taxon obtained from the PBDB. Blank cells indicate unavailable data.
f. paleolongitude: Reconstructed paleolongitude of the fossiliferous locality harboring each taxon obtained from the PBDB. Blank cells indicate unavailable data.
g. depositional_environment: a categorical variable with three levels: 'marine', 'marginal marine', and 'terrestrial'. Blank cells indicate unavailable data.
h. eurypterid_diversity: total diversity (number of species) of other eurypterids present in the same deposit. Blank cells indicate unavailable data.
i. faunal_diversity_(non_eury): total diversity (number of species) of non-eurypterid taxa present in the same deposit. Blank cells indicate unavailable data.
j. prey_diversity: total diversity (number of species) of potential prey also present in the same deposit. Requirements to count a taxon as potential prey are listed in the Supplementary Material of the manuscript. Blank cells indicate unavailable data.
k. predator_diversity: total diversity (number of species) of potential predators also present in the same deposit. Requirements to count a taxon as a potential predator are listed in the Supplementary Material of the manuscript. Blank cells indicate unavailable data.
l. notes: further information supporting the data.
m. depositional_environment_references: literature consulted to characterize depositional environments harboring eurypterids as listed in column g.
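
As a rough guide to working with the two spreadsheets together, the sketch below reads both layouts, treats blank cells as missing values, flags taxa above the 0.5 m size mentioned in the abstract, and averages temperature over each taxon's FAD-LAD range while skipping gaps. It is written in Python rather than R and is not the authors' pipeline: the helper names, the 50 cm cutoff as a hard threshold, the averaging rule, and every value in the inline excerpts are assumptions for illustration. Only the column names come from the descriptions above.

```python
import csv
import io

# Hypothetical excerpts in the layouts described above; all values are invented
ENV_CSV = """Age,Mean_temperature,Mean_dissolved_O2
420,25.0,180
421,,182
422,27.0,185
423,28.0,
"""

TAXA_CSV = """taxa,size,FAD,LAD,depositional_environment
Taxon_A,250,422,420,marginal marine
Taxon_B,,423,423,marine
Taxon_C,12.5,421,421,terrestrial
"""

GIANT_CM = 50.0  # assumed cutoff: the abstract refers to sizes over 0.5 m


def to_float(cell):
    """Convert a cell to float, mapping blank cells to None."""
    cell = cell.strip()
    return float(cell) if cell else None


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_over_range(env_rows, column, fad, lad):
    """Average `column` over time bins between lad and fad, skipping gaps."""
    if fad is None or lad is None:
        return None
    values = []
    for row in env_rows:
        age = to_float(row["Age"])
        value = to_float(row[column])
        if age is not None and value is not None and lad <= age <= fad:
            values.append(value)
    return sum(values) / len(values) if values else None


def summarise(taxa_rows, env_rows):
    """Per taxon: giant flag (None if size is missing) and mean temperature."""
    summary = {}
    for row in taxa_rows:
        size = to_float(row["size"])
        fad, lad = to_float(row["FAD"]), to_float(row["LAD"])
        summary[row["taxa"]] = {
            "giant": None if size is None else size > GIANT_CM,
            "mean_temperature": mean_over_range(
                env_rows, "Mean_temperature", fad, lad
            ),
        }
    return summary


summary = summarise(load_rows(TAXA_CSV), load_rows(ENV_CSV))
```

Keeping missing values as `None` rather than zero matters here, because both files use blank cells for unavailable data and the temperature series has gaps.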

Nexus

Two (2) morphological matrices in Nexus format ('.nex') used as input for MrBayes to produce tip-dated phylogenetic inferences under the fossilized birth-death (FBD) model. These include:
i. Eurypterids_constantFBD.nex: Inference under a constant-rate FBD model.
ii. Eurypterids_skyFBD.nex: Inference under a skyline FBD model with three distinct diversification epochs.

phylogenetic trees

Six (6) files containing phylogenetic trees in either Nexus or Newick format. All of these can be visualized using FigTree. These include:
i. Eurypterids_constantFBD_consensus.tre: majority-rule consensus tree summarizing inference under a constant-rate FBD model, as output by MrBayes. A dummy terminal surviving until the present day was first removed.
ii. Eurypterids_skyFBD_consensus.tre: majority-rule consensus tree summarizing inference under a skyline FBD model, as output by MrBayes. A dummy terminal surviving until the present day was first removed.
iii. Eurypterids_skyFBD_mcc.nex: maximum-clade credibility tree summarizing inference under a skyline FBD model, obtained using TreeAnnotator, and used for both analysis and plotting. A dummy terminal surviving until the present day was first removed.
iv. Eurypterids_skyFBD_mcc_withdummy.nex: maximum-clade credibility tree summarizing inference under a skyline FBD model, obtained using TreeAnnotator, and used for both analysis and plotting. This file retains the dummy terminal.
v. Eurypterids_skyFBD_posteriorsample.tre: 500 randomly-sampled posterior topologies of the inference under a skyline FBD model, and used for all macroevolutionary inferences. A dummy terminal surviving until the present day was pruned from each.
vi. Eurypterids_skyFBD_posteriorsample_withdummy.tre: 500 randomly-sampled posterior topologies of the inference under a skyline FBD model, and used for all macroevolutionary inferences. This file retains the dummy terminal.
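
The posterior trees listed above are chronograms. According to the description of timescale_MrBayes_posterior.R in the scripts section, they are obtained by dividing the branch lengths of each tree by its clock rate. The arithmetic can be sketched as follows, in Python rather than R. The dictionary representation of a tree, the function name, and the numbers are hypothetical; reading the actual '.t' files and their clock rates is not shown.

```python
def timescale(branch_lengths, clock_rate):
    """Divide every branch length of one tree by that tree's clock rate.

    branch_lengths: dict mapping a branch label to its length in expected
    changes per character (assumed units).
    clock_rate: expected changes per character per million years (assumed).
    Returns a dict of branch durations in millions of years.
    """
    if clock_rate <= 0:
        raise ValueError("clock rate must be positive")
    return {branch: length / clock_rate
            for branch, length in branch_lengths.items()}


# Hypothetical posterior sample: (branch lengths, clock rate) for each tree
posterior = [
    ({"A": 0.5, "B": 1.25, "AB": 0.75}, 0.25),
    ({"A": 1.0, "B": 2.0, "AB": 0.5}, 0.5),
]

# Each tree is rescaled with its own clock rate, not a shared average
chronograms = [timescale(lengths, rate) for lengths, rate in posterior]
```

Rescaling tree by tree keeps the uncertainty in the clock rate that a single averaged rate would discard.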

output files

Four (4) output files obtained with BayesTraits depicting optimal models of evolution of the ecological trait ('depositional_environment' column of file simplified_Eurypterids_revised_5_22_2023.csv) on the posterior sample of topologies from the skyline FBD analyses (stored in file Eurypterids_skyFBD_posteriorsample.tre). These include:
i. habitat.txt.Log_1.txt: Log file for run number 1 of the three-state ecological trait.
ii. habitat.txt.Log_2.txt: Log file for run number 2 of the three-state ecological trait.
iii. habitat_2states.txt.Log_1.txt: Log file for run number 1 of the recoded two-state ecological trait.
iv. habitat_2states.txt.Log_2.txt: Log file for run number 2 of the recoded two-state ecological trait.

Rdata

An Rdata file (final_results.Rda) storing the results of all analyses. Because some analyses take days to run, this file can be loaded into R to reproduce the figures and evaluate the code without rerunning them.

Funding

National Science Foundation, Award: DEB-2036186