# Data generated and analysed for Xie, Valente & Etienne 2022

This (markdown) text document contains the metadata for the data repository accompanying the publication of Xie, S., Valente, L., & Etienne, R. S. (2022), currently in review. A previous version is available as a pre-print on bioRxiv with the title: A simple island biodiversity model is robust to trait dependence in diversification and colonization rates, _bioRxiv_, https://www.biorxiv.org/content/10.1101/2022.01.01.474685v1.

All files in this repository, apart from the metadata, were obtained via computation on the University of Groningen Peregrine High Performance Computing Cluster (HPCC). Data were generated using the pipeline implemented in the R package `DAISIErobustness` through the function `run_robustness()`, which itself depends heavily on the R package `DAISIE`. The code for these packages is version controlled on GitHub and is freely available in open-source repositories. See the References section at the end of this document for the appropriate links.

# Folder structure

# 1. data_2022_shu.rar

This data repository is organized in four sub-folders:

* `Trait dependent` containing 1664 files. Data from this parameter space were generated with the trait-dependent model with a single carrying capacity for each clade.
* `Trait independent` containing 30 files, generated with the trait-independent (DAISIE) model.
* `Two Ks` containing 188 files, generated with different carrying capacities (species limits) for species in different states.
* `Without transition` containing 32 files, generated with species not allowed to transition between states.

## Results files

Each data file was obtained by running one instance of `DAISIErobustness::run_robustness()` on the Peregrine HPCC. The bash files required for starting such jobs can be found inside the `bash/` folder in the `DAISIErobustness` package and GitHub repository.

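
A results file can be opened and inspected along the following lines. This is a minimal sketch, not part of the published pipeline: it assumes the file was written with `save()`, so that `load()` restores a list called `output`, and it falls back to `readRDS()` otherwise. The helper `read_results()` and the toy file are hypothetical and only serve to make the example runnable.

```r
# Hypothetical helper: return the `output` list stored in one results file.
# Assumption: the file was written with save(); readRDS() is tried as a fallback.
read_results <- function(path) {
  env <- new.env()
  loaded <- tryCatch(
    suppressWarnings(load(path, envir = env)),
    error = function(e) NULL
  )
  if (!is.null(loaded) && "output" %in% loaded) {
    return(get("output", envir = env))
  }
  readRDS(path)
}

# Toy file standing in for a real results file (values are made up).
output <- list(spec_nltt_error = c(0.1, 0.2, 0.3))
toy_file <- tempfile(fileext = ".rds")
save(output, file = toy_file)
rm(output)

res <- read_results(toy_file)
names(res)                  # element names of the restored list
mean(res$spec_nltt_error)   # summary of one error vector
```

With a real file, replace `toy_file` with the path to a file from one of the four sub-folders.
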
Each `.rds` file is named after the corresponding parameter space and parameter set and contains one named list, so it can be read with the base R `load()` function. Within each file there is only one list, named `output`, which holds the output of `DAISIErobustness::run_robustness()`. It is the result of running the pipeline for 1000 replicates and has 20 elements:

* `$spec_nltt_error` a numeric vector with 1000 elements (one per replicate). Each value is the nLTT error for species through time (deltaSTT) between a geodynamic simulation and an oceanic simulation.
* `$num_spec_error` a numeric vector with 1000 elements (one per replicate) with the difference in the number of species at the end of the simulation between a geodynamic and an oceanic simulation.
* `$num_col_error` a numeric vector with 1000 elements (one per replicate) with the difference in the number of colonists at the end of the simulation between a geodynamic and an oceanic simulation.
* `$endemic_nltt_error` a numeric vector with 1000 elements (one per replicate). Each value is the nLTT error for endemics through time (deltaESTT) between a geodynamic simulation and an oceanic simulation.
* `$nonendemic_nltt_error` a numeric vector with 1000 elements (one per replicate). Each value is the nLTT error for non-endemics through time (deltaNESTT) between a geodynamic simulation and an oceanic simulation.
* `$spec_baseline_nltt_error` a numeric vector with 1000 elements (one per replicate). Each value is the nLTT error for species through time (deltaSTT) between the first oceanic simulation and the second oceanic simulation.
* `$num_spec_baseline_error` a numeric vector with 1000 elements (one per replicate) with the difference in the number of species at the end of the simulation between the first oceanic simulation and the second oceanic simulation.
* `$num_col_baseline_error` a numeric vector with 1000 elements (one per replicate) with the difference in the number of colonists at the end of the simulation between the first oceanic simulation and the second oceanic simulation.
* `$endemic_baseline_nltt_error` a numeric vector with 1000 elements (one per replicate). Each value is the nLTT error for endemics through time (deltaESTT) between the first oceanic simulation and the second oceanic simulation.
* `$nonendemic_baseline_nltt_error` a numeric vector with 1000 elements (one per replicate). Each value is the nLTT error for non-endemics through time (deltaNESTT) between the first oceanic simulation and the second oceanic simulation.
* `$error_metrics` a named list with 10 elements:
  * `$num_spec_mean_diff` a numeric atomic vector with the absolute difference between the mean of the `$num_spec_error` vector and the mean of the `$num_spec_baseline_error` vector.
  * `$num_spec_sd_diff` a numeric atomic vector with the absolute difference between the standard deviation of the `$num_spec_error` vector and the standard deviation of the `$num_spec_baseline_error` vector.
  * `$num_col_mean_diff` a numeric atomic vector with the absolute difference between the mean of the `$num_col_error` vector and the mean of the `$num_col_baseline_error` vector.
  * `$num_col_sd_diff` a numeric atomic vector with the absolute difference between the standard deviation of the `$num_col_error` vector and the standard deviation of the `$num_col_baseline_error` vector.
  * `$spec_nltt_mean_diff` a numeric atomic vector with the absolute difference between the mean of the `$spec_nltt_error` vector and the mean of the `$spec_baseline_nltt_error` vector.
  * `$endemic_nltt_mean_diff` a numeric atomic vector with the absolute difference between the mean of the `$endemic_nltt_error` vector and the mean of the `$endemic_baseline_nltt_error` vector.
  * `$nonendemic_nltt_mean_diff` a numeric atomic vector with the absolute difference between the mean of the `$nonendemic_nltt_error` vector and the mean of the `$nonendemic_baseline_nltt_error` vector.
  * `$spec_nltt_sd_diff` a numeric atomic vector with the absolute difference between the standard deviation of the `$spec_nltt_error` vector and the standard deviation of the `$spec_baseline_nltt_error` vector.
  * `$endemic_nltt_sd_diff` a numeric atomic vector with the absolute difference between the standard deviation of the `$endemic_nltt_error` vector and the standard deviation of the `$endemic_baseline_nltt_error` vector.
  * `$nonendemic_nltt_sd_diff` a numeric atomic vector with the absolute difference between the standard deviation of the `$nonendemic_nltt_error` vector and the standard deviation of the `$nonendemic_baseline_nltt_error` vector.
* `$passed_novel_mls` a list of up to 1000 elements, containing the output of successful MLE runs on geodynamic simulations. Only successful MLE runs are stored in this list, hence the size may be lower than 1000. Each list element is a data frame containing the estimated parameters, degrees of freedom and convergence flag.
* `$failed_novel_mls` a list of up to 1000 elements, containing the output of failed MLE runs on geodynamic simulations. Only failed MLE runs are stored in this list, hence the size may be (and usually is much) lower than 1000. Each list element is a data frame containing the estimated parameters, degrees of freedom and convergence flag.
* `$passed_oceanic_mls` a list of up to 1000 elements, containing the output of successful MLE runs on the first oceanic simulations. Only successful MLE runs are stored in this list, hence the size may be lower than 1000. Each list element is a data frame containing the estimated parameters, degrees of freedom and convergence flag.
* `$failed_oceanic_mls` a list of up to 1000 elements, containing the output of failed MLE runs on the first oceanic simulations.
  Only failed MLE runs are stored in this list, hence the size may be (and usually is much) lower than 1000. Each list element is a data frame containing the estimated parameters, degrees of freedom and convergence flag.
* `$failed_novel_sims` a list of up to 1000 elements, each element containing the geodynamic simulation output that caused MLE runs to fail. Only simulations which result in downstream MLE failure are stored in this list, hence the size may be (and usually is much) lower than 1000.
* `$passed_oceanic_sims_1` a list of up to 1000 elements, each element containing the first set of oceanic simulation output that is passed to MLE and runs without issues.
* `$passed_oceanic_sims_2` a list of up to 1000 elements, each element containing the second set of oceanic simulation output that is passed to MLE and runs without issues.
* `$failed_oceanic_sims` a list of up to 1000 elements, each element containing the first set of oceanic simulation output that caused MLE runs to fail. Only simulations which result in downstream MLE failure are stored in this list, hence the size may be (and usually is much) lower than 1000.

# 2. DAISIE.zip

This folder contains the code for the state-dependent and state-independent DAISIE simulations, as well as the state-independent inference model, used to generate the data in the data_2022_shu folder.

# 3. DAISIErobustness.zip

This folder contains the code for running the pipeline, as well as the analysis code used to generate the figures in the paper.

# 4. Figure_estimation.zip

This folder contains the figures comparing the generating rates, the estimates from state-dependent simulations (SII 1) and the estimates from state-independent simulations (SII 2) for each parameter set. Figure names correspond one-to-one to the parameter sets in `trait_CES.rda`.

# 5. Figure_estimation_error.zip

This folder contains the figures comparing the distribution of the error (the difference between the mean of the two generating rates and the SII 1 estimate) with the distribution of the baseline error (the difference between SII 1 and SII 2) for each parameter set (1664 in total). Figure names correspond one-to-one to the parameter sets in `trait_CES.rda`.

# 6. trait_CES.rda

The parameter combinations used in the paper, stored as a data frame with 1664 parameter sets. There are 14 variables:

- `time` Numeric defining the length of the simulation in time units. In this paper, one unit is 1 Myr (million years).
- `M` Numeric defining the size of the mainland pool with state 1, i.e. the number of species with state 1 that can potentially colonize the island.
- `M2` Numeric defining the size of the mainland pool with state 2, i.e. the number of species with state 2 that can potentially colonize the island.
- `lac` A numeric with the per capita cladogenesis rate for state 1.
- `mu` A numeric with the per capita extinction rate for state 1.
- `gam` A numeric with the per capita colonization rate for state 1.
- `laa` A numeric with the per capita anagenesis rate for state 1.
- `trans` A numeric with the per capita transition rate from state 1 to state 2.
- `trans2` A numeric with the per capita transition rate from state 2 to state 1.
- `K` Carrying capacity.
- `lac2` A numeric with the per capita cladogenesis rate for state 2.
- `mu2` A numeric with the per capita extinction rate for state 2.
- `gam2` A numeric with the per capita colonization rate for state 2.
- `laa2` A numeric with the per capita anagenesis rate for state 2.

# References

## Data files referenced in this document

Xie, S., Valente, L., & Etienne, R. S. (2022). Data generated for the publication of Xie, S., Valente, L., & Etienne, R. S. Zenodo. https://doi.org/10.5281/zenodo.7389581

## Publication generating and using the data documented

Xie, S., Valente, L., & Etienne, R. S. (2022). A simple island biodiversity model is robust to trait dependence in diversification and colonization rates. _bioRxiv_. https://www.biorxiv.org/content/10.1101/2022.01.01.474685v1

## Associated software

Lambert, J. W., Neves, P., & Xie, S. DAISIErobustness: Test the Robustness of DAISIE to Geodynamics and Traits. https://github.com/Neves-P/DAISIErobustness

Etienne, R. S., Valente, L., Phillimore, A. B., Haegeman, B., Lambert, J. W., Neves, P., Xie, S., Bilderbeek, R. J. C., Hildenbrandt, H. (2022, April 21). DAISIE: Dynamical Assembly of Islands by Speciation, Immigration and Extinction. https://github.com/rsetienne/DAISIE