Data and code from: Effects of phylogeny on coexistence in model communities
Data files
Sep 12, 2024 version files 606.98 KB
-
Data-Code.zip
597.31 KB
-
README.md
9.67 KB
Sep 16, 2024 version files 606.99 KB
-
Data-Code.zip
597.31 KB
-
README.md
9.68 KB
Abstract
This repository contains code and data needed for the analyses presented in the paper 'Effects of phylogeny on coexistence in model communities'. In that contribution, by coupling a simple model of trait evolution on a phylogenetic tree with Lotka-Volterra community dynamics, allows us to derive properties of a community of coexisting species as a function of the number of traits, tree topology and the size of the species pool. Our analysis highlights how phylogenies, through traits, affect the coexistence of a set of species. Together, these results provide much-needed baseline expectations for the ways in which evolutionary history, summarized by phylogeny, is reflected in the size and structure of ecological communities. Here we provide de code needed to compute analytic expectations, to simulate community dynamics, as well as the dataset (Biodiversity II experiments, Tilman et al., Science 2001) used to test model predictions.
README: Data and code from: Effects of phylogeny on coexistence in model communities
https://doi.org/10.5061/dryad.cvdncjtck
Description of the data and file structure
This repository contains code and data needed for the analyses presented in the paper 'Effects of phylogeny on coexistence in model communities'.
Files and variables
File: Data-Code.zip
Description: The compressed file contains several files and folders described below.
The folder Data contains the datasets: a sub-folder BiodivII contains Biodiversity II data, and the file Senna.MCC.nex is the Senna tree in nexus format.
The folder Data/BiodivII contains:
- Raw data in the sub-folder Data/BiodivII/RawData, listing (for each year the experiment was conducted) the associated plots, together with the species present in them as well as their biomass at the end of the experiment. We refer the reader to the original reference (Tilman et al., Science 2001) for a comprehensive list of variables used.
- The 9 datasets (for some months of the years 2001, 2006, 2007, 2008, 2011, 2012, 2014, 2015 and 2017) filtered by Lemos-Costa et al. in the folder Data/BiodivII/Lemos-Costa (files are named as bioII_year-month.csv in this folder). Each file lists biomasses for all the species considered after filtering, for all the plots that met the requirements (see Lemos-Costa et al., 2023), in units of g/m^2.
- After removing (from the bioII_year-month.csv files) the plots that were not planted with all the 16 grassland species considered in the experiment, we obtained the files bioII_year-month-filtered.csv that can be found in the folder Data/BiodivII. This last filtering procedure was done using the R script filter-biodiversity-datasets.R, which can be found in the repository and is described below.
- The phylogenies estimated by molecular methods, using the R package V.PhyloMaker, and obtained by Lemos-Costa et al., are taken directly from that reference (files named bioII_year-month_tree.RData). Each RData file contains a variable named tree, which can be transformed into a covariance matrix using the command vcv of the R package ape.
Code/software
File: Data-Code.zip
Description: The compressed file contains several R scripts described below.
The code used to analyze data, simulate the model and compute analytic expectations was programmed using R. Any reference to paper sections, equations or figures can be traced back in the paper's text (see preprint https://arxiv.org/abs/2310.14392).
The R working directory has to be set up as the main directory resulting after unzipping Data-Code.zip. After unzipping, all the R scripts will be located at the working directory. With these R scripts, the user can analyze biodiversity data, compute model simulations and analytic expectations described in the accompanying paper.
The code is based on several standard R libraries (for each one of our R scripts, dependencies with external R libraries are detailed; see below). All these libraries can be installed using install.packages(). The latest version of each library will make the code run properly.
The code is structured as follows:
- Script files containing basic code:
- build_functions.R. Functions to set up the matrix format used by the Lemke-Howson algorithm used to compute equilibrium abundances, for a general correlation tree structure and for particular structures (hierarchical and perfectly balanced trees).
- lemke_howson.R. Functions to implement Lemke-Howson algorithm to compute equilibrium abundances.
- Script files containing specific code for model simulation and analytic expectations:
- abundance_distribution_trees.R. Functions to evaluate specific tree topologies used as examples in the paper (perfectly hierarchical / perfectly balanced trees). Depends on R libraries tidyverse, ggtree and ape.
- approximation_relative_abundances.R. Functions to compute analytic approximations to relative abundances using saddle-point calculations (Supplementary Information section S5).
- approximation_total_biomass.R. Functions to compute analytic approximations for total biomass conditional to a fixed number of survivors in the community (Supplementary Information section S4). Depends on the R script integrate_invasibility.R, described below.
- attractors_size.R. Functions to compute the mean community size and standard deviation at the endpoint. Community sizes are obtained by dampling Wishart covariance matrices. Depends on R library rWishart, as well as on R scripts build_functions.R and lemke_howson.R, described above.
- compute_distribution_subtrees_3.R. Functions to compute the probabilities of sub-communities of perfectly hierarchical tree of n=3 species (used to generate main text Figure 6). Depends on the R script integrate_feasibility.R, described below, as well as on the R function integrate to compute numerical integrals.
- compute_saddle_point_exact.R. Computes the expected fraction of survivors in the community by solving numerically Eq. S107 (Supplementary Information). Provide code used in Supplementary Information section S7 (Figure S6) and section S8 (to estimate the average correlation rho that best fits a given average community size or average diversity, Figures S8 and S10). Uses the stats package function uniroot.
- deterministic_varying_rho.R. Functions to estimate the average fraction of survivors for varying growth rates in the deterministic limit by solving numerically Eq. S139 (see Supplementary Information section S7, Figure S7). Uses the stats package function uniroot.
- integrate_feasibility.R. Functions to compute, by numerical integration, the probability of feasibility for star tree structures (Supplementary Information section S3). Depends on R libraries tidyverse and pracma, and uses the R function integrate to compute numerical integrals.
- integrate_invasibility.R. Functions to compute, by numerical integration, the probability of non-invasibility for star tree structures (Supplementary Information section S3). Depends on R library tidyverse, and uses the R function integrate to compute numerical integrals. Allows also to compute numerically the probability density function of total biomass per number of traits (Supplementary Information section S4, Eq. S112).
- integrate_number_survivors.R. Computes the distribution of the number of survivors for star tree topologies by integrating numerically feasibility and non-invasibility probabilities (Supplementary Information section S3, Eq. S75). Depends on integrate_feasibility.R and integrate_invasibility.R described above.
- survival_probability.R. Function that calculates, by averaging over Wishart matrix realizations, the probability that a given species in the community survives at equilibrium. Used in Figure S7 of the main text and Supplementary Information figure S13. Depends on the rWishart R library.
- Script file to pre-process data:
- filter_biodiversity_datasets.R. R script designed to remove, from each bioII_year-month.csv biomass file, the plots that were not planted with all the 16 grassland species considered in the experiment.
- Script files to generate all the figures in the paper. The R code needed to generate main-text figures 3-7, and supplementary-information figures S2-S13 is provided in scripts named figure_X_main_text.R for main text figures and figure_SX_supplement.R for supplementary information ones, X indicating the figure number.
In particular, the code developed to analyze biodiversity data is given in files figure_7_main_text.R, figure_S8_supplement.R, figure_S9_supplement.R and figure_S10_supplement.R.
Some specifications used in ggplot figures are included in mytheme_ggplot.R. For plotting figures, besides ggplot, we also used the R libraries grid, gridExtra, latex2exp and cowplot.
In order to showcase how the code works, we provide a sample code making use of some of the implemented functions. We describe the sample code in detail here (it is contained in the file named sample_code.R).
- First a star tree topology with a pool size of n=20 species and l=30 traits is simulated. Matrix Sigma stands for the tree correlation matrix, and vector r represents the vector of growth rates.
- The function simulate_community draws a sample covariance matrix from the Wishart ensemble, and the Lemke-Howson algorithm is applied to obtain equilibrium abundances, as well as the number of species that survive for each sampled covariance matrix.
- Then 500 model replicates are generated, and the mean fraction of survivors and mean total biomass are computed from realizations averages (using tidyverse).
- Finally, these results are compared with analytical approximations. By solving the non-linear equation defined by function saddle_point_function (from script compute_saddle_point_exact.R), we obtain an expectation for the average fraction of survivors. Moreover, expected total biomass is computed analytically with function compute_saddle_cond_biomass (from script approximation_total_biomass.R).
Access information
Data was derived from the following sources:
- https://github.com/StefanoAllesina/lemos-costa-2023 (code and data accompanying Lemos-Costa et al, 2023, see https://www.biorxiv.org/content/10.1101/2023.09.04.556236v2.abstract).
Methods
Raw data was taken from Biodiversity II experiments (Tilman et al., Science 2001). That biodversity dataset was filtered by Lemos-Costa et al. (https://www.biorxiv.org/content/10.1101/2023.09.04.556236v2.abstract, 2023). Only datasets satisfying three conditions were taken into account by Lemos-Costa et al.: a) only species that appear in multiple plots are considered to be part of the pool; b) unidentified biomass or non-target species do not dominate the plots; c) there is sufficient replication or diversity of assemblages to confidently fit the model. In the raw dataset, aside from the original target pool of eighteen species, many other species are found in the plots, along with a substantial amount of litter or unidentified biomass. Therefore, a larger number of species was included in the “species pool” , and also plots in which the total biomass for species not in the pool surpassed 10% of the total biomass were excluded. In this way, Lemos-Costa et al. extracted nine data sets (for distinct years) containing between 13 and 20 species and a substantial number of assemblages. These datasets were downloaded from https://github.com/StefanoAllesina/lemos-costa-2023, together with the phylogenetic trees computed using the V.PhyloMaker R package. They contain assemblages in which different combinations of the 16 species were planted in plots.
We took those datasets and, by comparing them with raw data, we extracted only the plots in which all the 16 species in the experiment were planted. This way we obtained diversity data to fit the model to diversity data.