Code and data from: Measuring the overall functional diversity by aggregating its multiple facets: functional richness, biomass evenness, trait evenness, and dispersion
Data files (version of Oct 04, 2024; 2.69 MB in total):
- generated_and_caseStudy_data.zip (2.68 MB)
- README.md (17.42 KB)
Abstract
Human activities induce environmental changes, which can affect individuals' traits, then lead to changes in functional diversity, and finally alter ecosystem functioning. Measuring functional diversity is thus of utmost importance to understand the consequences of such activities on ecosystem functioning. Functional diversity is composed of several facets, but these facets are almost always measured individually, and we lack a metric capturing the overall, multifaceted functional diversity. We therefore developed an index K of the overall functional diversity, defined as the geometric mean of four independent facets: functional richness (the classic measure of the coverage of the trait axis), biomass evenness and trait evenness (quantifying how evenly the biomass and trait distributions, respectively, are filled), and dispersion (quantifying the spread around the biomass-weighted mean trait, which is maximised for uniform and bimodal distributions). K and each of its underlying facets take values between 0 and 1 and assume that the uniform distribution yields maximal diversity. We compared K to other, more classic metrics that measure a single facet of functional diversity by calculating all these indices for randomly and non-randomly generated communities. We showed that K overcomes several limitations of other indices (e.g., lack of accuracy, not computable for simple communities, unclear ecological interpretation) and was well correlated with ecosystem functions in simulated predator-prey communities. In addition, decomposing K into its underlying facets revealed that ecosystem functions can be driven by different facets of K on different trophic levels. The strength of our index K lies in being the only index that measures the overall functional diversity by combining several facets while providing the option to decompose K into them. This notably yields mechanistic insights into which facets are most important in driving changes in functional diversity and ecosystem functioning.
https://doi.org/10.5061/dryad.tmpg4f55r
This document explains the structure of the provided files and the methods presented in the associated article by Wojcik et al. (2024), "Measuring the overall functional diversity by aggregating its multiple facets: functional richness, biomass evenness, trait evenness, and dispersion", Methods in Ecology and Evolution. First, we describe the folder generated_and_caseStudy_data, which contains datasets in .csv and .xlsx formats; second, we describe the files, software, methods, and data in the folder scripts_Kindex used for the published study. In both folders, empty cells or NA values stand for "not available": these computations could not be performed for methodological reasons detailed in the publication. All datasets are the property of at least one co-author of this study.
Folder generated_and_caseStudy_data
This folder only serves to display the data for readers who do not use Python and/or R. It contains two subfolders: Generated_data and Lake_Constance.
Subfolder Generated_data
The files in the subfolder Generated_data were not used in this format in the publication; they were converted from .npy files so that every user can access them. The subfolder contains two datasets generated in Python:
random_communities_indices_functional_diversity.csv presents the data of 9000 randomly generated communities for which several diversity indices were calculated (see methodological details below). It contains the number of species present in the community ("n", between 2 and 10), functional evenness ("FEve"), functional divergence ("FDiv"), functional dispersion ("FDis"), Rao's quadratic entropy ("Rao"), functional extension and evenness ("FEE"), and the overall functional diversity, i.e., the K index ("K"), followed by its four underlying indices: functional richness ("FRic"), biomass evenness ("BE"), trait evenness ("TE"), and dispersion ("Dis"). All metrics are unitless.
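For readers who work from the .csv export, the sketch below averages K for each value of n using only the Python standard library. It assumes that the header row names the columns literally "n" and "K", as listed above, and treats empty cells or "NA" as not available; the function name mean_k_by_richness is hypothetical.

```python
import csv
from collections import defaultdict
from statistics import mean


def mean_k_by_richness(lines):
    """Average the K index for each value of n.

    `lines` is any iterable of CSV lines (e.g. an open file). Assumes the
    header names the columns "n" and "K"; empty or "NA" cells are skipped.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        if row["K"].strip() in ("", "NA"):
            continue  # value not available
        groups[int(float(row["n"]))].append(float(row["K"]))
    return {n: mean(values) for n, values in sorted(groups.items())}
```

For example: with open("random_communities_indices_functional_diversity.csv", newline="") as fh: print(mean_k_by_richness(fh)).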
nonrandom_communities_over_1species_perTL_EFs_and_functional_diversity_nduplicates_200.csv contains the indices of functional diversity and the ecosystem functions of predator-prey food web models. Data related to prey are marked by a "P" and data related to predators by an "H". The first five columns are the number of prey species ("nP"), the number of predator species ("nH"), adaptation within prey species ("wP"; boolean, 0: disabled or 1: enabled), adaptation within predator species ("wH"; boolean, 0: disabled or 1: enabled), and the identification key of the simulation ("IDsim"). They are followed by the grand mean and the coefficient of variation (CV) of ecosystem functions and properties over 200 simulations for each model parametrisation. To avoid repetition, we only list the description and name of each quantity, but note that for each quantity a first column provides the "mean" and the next column the "CV". By column order, these quantities are: the number of persisting prey ("realised_nP"), total prey biomass ("total_biomass_P", in mg C m-3), prey synchrony ("synchrony_P", unitless), prey production ("production_P", in mg C m-3), the biomass-weighted mean trait of prey ("trait_P", unitless), the overall functional diversity K of prey ("K_P") followed by its four underlying indices functional richness ("FRic_P"), biomass evenness ("BE_P"), trait evenness ("TE_P"), and dispersion ("Dis_P"), functional evenness of prey ("FEve_P"), functional divergence of prey ("FDiv_P"), functional dispersion of prey ("FDis_P"), Rao's quadratic entropy of prey ("Rao_P"), functional extension and evenness of prey ("FEE_P") and its modified expression ("FEEc_P"), the number of persisting predators ("realised_nH"), total predator biomass ("total_biomass_H", in mg C m-3), predator synchrony ("synchrony_H"), predator production ("production_H", in mg C m-3), the biomass-weighted mean trait of predators ("trait_H", unitless), the overall functional diversity K of predators ("K_H") followed by its four underlying indices functional richness ("FRic_H"), biomass evenness ("BE_H"), trait evenness ("TE_H"), and dispersion ("Dis_H"), functional evenness of predators ("FEve_H"), functional divergence of predators ("FDiv_H"), functional dispersion of predators ("FDis_H"), Rao's quadratic entropy of predators ("Rao_H"), functional extension and evenness of predators ("FEE_H") and its modified expression ("FEEc_H"), the relative top-down control of predators on prey ("TD_BU_ratio"), and the preference of the least selective predator for the least edible prey ("p11").
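Because each quantity occupies a mean column followed by a CV column, the expected layout can be generated rather than typed by hand. The sketch below rebuilds the column order described above: 5 identifier columns plus 34 quantities × 2 columns, i.e., 73 columns. The "_mean" and "_CV" suffixes are hypothetical; check the actual header of the .csv before relying on them.

```python
# Quantities listed in the README, in column order, for one trophic level.
PER_LEVEL = ["realised_n", "total_biomass", "synchrony", "production", "trait",
             "K", "FRic", "BE", "TE", "Dis", "FEve", "FDiv", "FDis", "Rao",
             "FEE", "FEEc"]


def column_names(mean_suffix="_mean", cv_suffix="_CV"):
    """Expected column order: identifiers, then (mean, CV) pairs per quantity."""
    quantities = []
    for level in ("P", "H"):  # prey first, then predators
        for q in PER_LEVEL:
            # realised_nP / realised_nH carry no underscore before the level letter
            quantities.append(f"{q}{level}" if q == "realised_n" else f"{q}_{level}")
    quantities += ["TD_BU_ratio", "p11"]
    names = ["nP", "nH", "wP", "wH", "IDsim"]
    for q in quantities:
        names += [q + mean_suffix, q + cv_suffix]
    return names
```

With this layout, the mean of the k-th quantity (counting from 0) sits at column 5 + 2k and its CV at column 6 + 2k.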
Subfolder Lake_Constance
The two datasets used in the case study of Lake Constance presented in the Appendix of the publication are part of the LakeBase database (https://fred.igb-berlin.de/Lakebase).
Traits_data.csv contains the morphotype number, the species name, and five phytoplankton traits: cell volume in µm^3, longest linear dimension (LLD) in µm, maximum growth rate in day-1, phosphate affinity in L.µmol.day-1, and defence (1 - susceptibility to predation, unitless).
Biomass_timeseries.xlsx contains biomass time series of the different phytoplankton morphotypes for the period 1979-1998, with the date, the morphotype number, and the biovolume in cm^3.m-2.
The K index is applied to these data with R version 4.1.2 in RStudio (Kindex_Lake_Constance.R) and the libraries readxl, tidyverse, dplyr, PerformanceAnalytics, and gridExtra.
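The R script is not reproduced here. As an illustration only, and in Python for consistency with the other examples, the sketch below shows one plausible preprocessing step: converting the long-format biovolume records into relative biomass per sampling date, since K and its facets work on biomass distributions. The (date, morphotype, biovolume) tuple layout and the function name are assumptions, not the structure used in Kindex_Lake_Constance.R.

```python
from collections import defaultdict


def relative_biomass_by_date(records):
    """Convert (date, morphotype, biovolume) records into relative biomass.

    Returns {date: {morphotype: share of that date's total biovolume}}.
    Repeated (date, morphotype) pairs are summed; dates whose total
    biovolume is zero are dropped.
    """
    totals = defaultdict(float)
    per_date = defaultdict(dict)
    for date, morphotype, biovolume in records:
        per_date[date][morphotype] = per_date[date].get(morphotype, 0.0) + biovolume
        totals[date] += biovolume
    return {
        date: {m: b / totals[date] for m, b in shares.items()}
        for date, shares in per_date.items()
        if totals[date] > 0
    }
```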
Folder scripts_Kindex
For the main text of our study, we generated three datasets; the methods are briefly summarised below and explained in detail in the associated published article.
The first dataset corresponded to five test series. Each test series contained three communities of at most seven species, each species being characterised by a relative biomass and a trait value (values between 0 and 1, unitless). Because these data are few, we did not create an additional data file but incorporated them directly into dictionaries defined in the Python script test_series_and_randomly_generated_communities.py.
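The exact dictionary layout used in test_series_and_randomly_generated_communities.py is not documented here. The sketch below assumes a hypothetical nesting (test series, then community, then biomass and trait lists) filled with placeholder numbers, not the published test series, and checks the constraints stated above. The requirement that relative biomasses sum to 1 is an assumed convention.

```python
# Hypothetical layout with placeholder values (NOT the published test series).
test_series = {
    "series_1": {
        "community_A": {"biomass": [0.5, 0.5], "trait": [0.0, 1.0]},
        "community_B": {"biomass": [0.2, 0.3, 0.5], "trait": [0.1, 0.4, 0.9]},
    },
}


def is_valid_community(community, max_species=7, tol=1e-9):
    """Check one community against the constraints described in the README.

    Biomass and trait lists must have equal length (at most `max_species`),
    all values must lie in [0, 1], and relative biomasses must sum to 1
    (an assumed convention).
    """
    biomass, trait = community["biomass"], community["trait"]
    return (
        len(biomass) == len(trait)
        and 1 <= len(biomass) <= max_species
        and all(0.0 <= b <= 1.0 for b in biomass)
        and all(0.0 <= t <= 1.0 for t in trait)
        and abs(sum(biomass) - 1.0) < tol
    )
```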
The second dataset corresponded to 9000 randomly generated communities (1000 communities for each species richness value from two to ten), where each species was attributed a biomass drawn from a lognormal distribution with a mean of 0 and a standard deviation of 1 and a trait value drawn from a uniform distribution on [0,1]. We then computed the different indices of functional diversity for each community. This dataset was generated with the Python script test_series_and_randomly_generated_communities.py, and the computed indices of functional diversity are available in the .csv file random_communities_indices_functional_diversity.csv.
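A lognormal variable is strictly positive, so "a mean of 0 and a standard deviation of 1" presumably refers to the parameters of the underlying normal distribution. Under that assumption, the sketch below mirrors the sampling scheme with Python's standard random module (random.lognormvariate and random.uniform) and labels communities with the ID scheme described under File structure (number of species × 10000 + simulation number). It is not the authors' script; the function name and seed handling are hypothetical.

```python
import random


def generate_random_communities(n_per_richness=1000, richness=range(2, 11), seed=None):
    """Return {community_ID: {"biomass": [...], "trait": [...]}}.

    Biomass: lognormal with underlying normal mu=0, sigma=1 (assumed reading).
    Trait: uniform on [0, 1]. ID = number of species * 10000 + simulation number.
    """
    rng = random.Random(seed)
    communities = {}
    for n_species in richness:
        for sim in range(1, n_per_richness + 1):
            community_id = n_species * 10_000 + sim  # e.g. 20005, 101000
            communities[community_id] = {
                "biomass": [rng.lognormvariate(0.0, 1.0) for _ in range(n_species)],
                "trait": [rng.uniform(0.0, 1.0) for _ in range(n_species)],
            }
    return communities
```

For example, generate_random_communities(seed=42)[101000]["trait"] holds the ten trait values of the 1000th ten-species community.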
The third dataset was created from the biomass and trait time series of predator-prey systems generated in a previous study with an extended Rosenzweig-MacArthur predator-prey model. Although these data were already available in another Dryad repository, we provide them again here, along with an additional file containing our calculations of the mean and coefficient of variation (CV) over time of the indices of functional diversity and the ecosystem functions for each time series. This dataset was generated with the Python script nonrandomly_generated_communities.py, and the data are available in the .csv file nonrandom_communities_over_1species_perTL_EFs_and_functional_diversity_nduplicates_200.csv.
Tasks performed by the Python and C scripts
Generating time series and computing ecosystem functions and properties (third dataset)
The information below is taken from the README file of the Dryad repository where the scripts and data were originally uploaded; it is reproduced here to ease reading and understanding.
To generate the biomass and trait time series of a particular food web, we executed the file get_timeseries_ecosystem_functions.py. This Python script performed two tasks: (a) setting the parametrisation of the model, and (b) calling and executing the file get_timeseries_ecosystem_functions (produced by compiling get_timeseries_ecosystem_functions.c; see details below), which solved the model's equations and generated the biomass and trait time series using the parametrisation defined in Python.
For each parametrisation, a certain number of randomly initialised simulations (n_duplicates) were run, and tasks (a) and (b) were performed for each simulation. To distinguish between simulations of different parametrisations, we used an ID number composed of 7 digits: the first digit refers to the number of prey (between 1 and 5), the second digit to the number of predators (between 1 and 5), the third digit indicates whether adaptation within prey functional groups is enabled (1) or disabled (0), the fourth digit indicates whether adaptation within predator functional groups is enabled (1) or disabled (0), and the last three digits correspond to the number of the simulation for a given parametrisation (if n_duplicates=200, this number varies between 1 and 200). This ID number was used to name the time series files "timeseries_{IDsim}.txt", where IDsim is the ID number of the simulation. Thus, "timeseries_5310054.txt" contained the biomass and trait time series of simulation 54 of a food web with 5 prey functional groups and 3 predator functional groups, where adaptation within prey groups was enabled and adaptation within predator groups was disabled.
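The worked example (5310054, seven digits) implies that the simulation number occupies the last three digits. The hypothetical helpers below encode and decode IDsim under that reading and build the corresponding time series file name.

```python
def encode_idsim(n_prey, n_pred, w_prey, w_pred, sim):
    """Build the 7-digit ID: nP, nH, wP, wH, then a 3-digit simulation number."""
    if not (1 <= n_prey <= 5 and 1 <= n_pred <= 5):
        raise ValueError("nP and nH must lie between 1 and 5")
    if w_prey not in (0, 1) or w_pred not in (0, 1):
        raise ValueError("wP and wH are booleans (0 or 1)")
    if not 1 <= sim <= 999:
        raise ValueError("simulation number must fit in three digits")
    return n_prey * 1_000_000 + n_pred * 100_000 + w_prey * 10_000 + w_pred * 1_000 + sim


def decode_idsim(idsim):
    """Split a 7-digit ID into (nP, nH, wP, wH, simulation number)."""
    digits = f"{idsim:07d}"
    return int(digits[0]), int(digits[1]), int(digits[2]), int(digits[3]), int(digits[4:])


def timeseries_filename(idsim):
    return f"timeseries_{idsim}.txt"
```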
Note that we used code similar to the original Python script to compute the temporal means and the coefficients of variation of the ecosystem functions and properties for each time series; thus, more data are available than were used in the analyses (cf. nonrandomly_generated_communities.py).
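The README does not spell out the CV formula. A common definition, assumed here, is the standard deviation divided by the mean; the sketch uses the population standard deviation (statistics.pstdev), which may differ from the authors' choice. The function name is hypothetical.

```python
from statistics import mean, pstdev


def mean_and_cv(values):
    """Return (mean, CV) of a series, with CV = population std / mean."""
    m = mean(values)
    if m == 0:
        return m, float("nan")  # CV undefined for a zero mean
    return m, pstdev(values) / m
```

For a time series of total prey biomass, mean_and_cv(series) gives the temporal mean and CV of that ecosystem function.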
Analysing results and producing figures
We analysed the results and produced figures using two Python scripts: test_series_and_randomly_generated_communities.py (for the first and second datasets) and nonrandomly_generated_communities.py (for the third dataset).
Folder and file structure
The main folder contains all Python and C scripts and two subfolders, data and figures. As their names indicate, data contains the datasets used in the analyses and figures contains the figures produced during the analyses. The folder and file structure is largely the same as in the Dryad repository where the scripts and data used for the third dataset were originally uploaded; this information is reproduced here to ease reading and understanding.
Folder structure
The subfolder data is divided into subsubfolders named "vP{x}_vH{y}", where {x} and {y} are the values of vP and vH multiplied by 100. vP and vH are the parameters determining the speed of trait adaptation of prey and predators, respectively (see the associated article for details). We simulated four scenarios: vP=0.06 and vH=0.02 ("vP6_vH2"), vP=0.6 and vH=0.06 ("vP60_vH6"), vP=0.6 and vH=0.2 ("vP60_vH20"), and vP=0.6 and vH=0.6 ("vP60_vH60"). Each subsubfolder contains the time series file of each simulation and parametrisation ("timeseries_{IDsim}.txt") and the .npy files gathering the temporal means and coefficients of variation of the ecosystem functions and properties of each simulation for a given parametrisation ("outputs_nduplicates{x}_{IDparam}.npy").
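The naming rule above can be expressed as a small helper. The sketch rounds 100 × vP and 100 × vH to integers to guard against floating-point products that land slightly off an integer, and assumes the time series files sit directly inside the scenario subsubfolder of data; both function names are hypothetical.

```python
from pathlib import Path


def scenario_folder(v_prey, v_pred):
    """Name 'vP{x}_vH{y}' with x = 100*vP and y = 100*vH rounded to integers."""
    return f"vP{round(v_prey * 100)}_vH{round(v_pred * 100)}"


def timeseries_path(v_prey, v_pred, idsim, root="data"):
    """Hypothetical location of one time series file inside the data subfolder."""
    return Path(root) / scenario_folder(v_prey, v_pred) / f"timeseries_{idsim}.txt"
```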
Note that the present project used the data of only one scenario (vP=0.06 and vH=0.02); the time series used for the third dataset are therefore available in the subsubfolder "vP6_vH2". This subsubfolder also contains the file EFs_and_functional_diversity_nduplicates_200.npy (see structure below), which holds the indices of functional diversity and the ecosystem functions computed for each time series with the Python script nonrandomly_generated_communities.py.
File structure
The second dataset, composed of 9000 communities, was saved in a .npy file named random_communities_indices_functional_diversity.npy. By column order, it contains: the ID of the randomly generated community (at least 5 digits, where the leading digits indicate the number of species in the community and the last four digits the number of the simulation; e.g., "20005" is the 5th simulated community of two species and "101000" the 1000th simulated community of ten species); twelve columns with the values of the diversity indices (the number of functional species n (e.g., two species with very similar trait values are considered as one functional species), functional evenness FEve, functional divergence FDiv, functional dispersion FDis, Rao's quadratic entropy Rao, functional extension and evenness FEE and its modified version FEEc, and our index K of overall functional diversity with its four underlying indices: functional richness FRic, biomass evenness BE, trait evenness TE, and dispersion Dis); a list with the biomass values of the species; and a list with the trait values of the species.
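Because the last two columns hold lists, the .npy file is likely an object array, which numpy.load only reads with allow_pickle=True. Once loaded, each row can be unpacked as sketched below; the keys "richness", "simulation", "biomass", and "trait", and the function name, are hypothetical. The richness decoded from the ID (number of species) is distinct from the column n (number of functional species).

```python
INDEX_NAMES = ["n", "FEve", "FDiv", "FDis", "Rao", "FEE", "FEEc",
               "K", "FRic", "BE", "TE", "Dis"]


def unpack_random_community(row):
    """Unpack [ID, 12 index values, biomass list, trait list] into a dict."""
    if len(row) != 1 + len(INDEX_NAMES) + 2:
        raise ValueError("expected 15 columns")
    community_id = int(row[0])
    record = {
        "richness": community_id // 10_000,   # leading digit(s): number of species
        "simulation": community_id % 10_000,  # last four digits
    }
    record.update(zip(INDEX_NAMES, row[1:13]))
    record["biomass"], record["trait"] = list(row[13]), list(row[14])
    return record
```

For example: rows = numpy.load("random_communities_indices_functional_diversity.npy", allow_pickle=True), then records = [unpack_random_community(r) for r in rows].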
For the time series used in the third dataset, the file structure was explained as follows in the README file of the original Dryad repository:
The number of columns in the data files generated with the Python and C scripts depends on the number of prey functional groups (nP) and the number of predator functional groups (nH).
The time series files ("timeseries_{IDsim}.txt") present, by column order: the timestep, solver crash (0 if no crash, -1 otherwise), nP, nH, adaptation within prey functional groups (wP; boolean, 0: disabled and 1: enabled), adaptation within predator functional groups (wH; boolean, 0: disabled and 1: enabled), IDsim, the biomasses of the nP prey and nH predator groups (across nP+nH columns), and the trait values of the nP prey and nH predator groups (across nP+nH columns).
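Since the column count depends on nP and nH (7 + 2 × (nP + nH) columns), a parser can check each line against that count. The sketch assumes whitespace-separated values and, within each block, prey groups before predator groups; both are assumptions to verify against the actual files, and the function name is hypothetical.

```python
def parse_timeseries_line(line):
    """Split one line of timeseries_{IDsim}.txt into named fields."""
    values = [float(v) for v in line.split()]
    n_prey, n_pred = int(values[2]), int(values[3])
    n_groups = n_prey + n_pred
    expected = 7 + 2 * n_groups
    if len(values) != expected:
        raise ValueError(f"expected {expected} columns, got {len(values)}")
    biomass = values[7:7 + n_groups]   # nP prey then nH predators (assumed order)
    trait = values[7 + n_groups:]
    return {
        "time": values[0],
        "crashed": values[1] == -1,
        "nP": n_prey, "nH": n_pred,
        "wP": int(values[4]), "wH": int(values[5]),
        "IDsim": int(values[6]),
        "biomass_prey": biomass[:n_prey], "biomass_pred": biomass[n_prey:],
        "trait_prey": trait[:n_prey], "trait_pred": trait[n_prey:],
    }
```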
In addition, the file EFs_and_functional_diversity_nduplicates_200.npy contains the indices of functional diversity and the ecosystem functions of each time series file. Its first five columns are nP, nH, wP, wH, and IDsim (see above for their exact meaning). They are followed by the mean and the CV of the indices of functional diversity and of the ecosystem functions and properties. To avoid repetition, we only list the description and name of each quantity, but note that for each quantity a first column provides the "mean" and the next column the "CV". By column order, these quantities are: the number of persisting prey ("realised_nP"), the total prey biomass ("total_biomass_P"), prey synchrony ("synchrony_P"), prey production ("production_P"), the biomass-weighted mean trait of prey ("trait_P"), the overall functional diversity K of prey ("K_P") followed by its four underlying indices functional richness ("FRic_P"), biomass evenness ("BE_P"), trait evenness ("TE_P"), and dispersion ("Dis_P"), functional evenness of prey ("FEve_P"), functional divergence of prey ("FDiv_P"), functional dispersion of prey ("FDis_P"), Rao's quadratic entropy of prey ("Rao_P"), functional extension and evenness of prey ("FEE_P") and its modified expression ("FEEc_P"), the number of persisting predators ("realised_nH"), the total predator biomass ("total_biomass_H"), predator synchrony ("synchrony_H"), predator production ("production_H"), the biomass-weighted mean trait of predators ("trait_H"), the overall functional diversity K of predators ("K_H") followed by its four underlying indices functional richness ("FRic_H"), biomass evenness ("BE_H"), trait evenness ("TE_H"), and dispersion ("Dis_H"), functional evenness of predators ("FEve_H"), functional divergence of predators ("FDiv_H"), functional dispersion of predators ("FDis_H"), Rao's quadratic entropy of predators ("Rao_H"), functional extension and evenness of predators ("FEE_H") and its modified expression ("FEEc_H"), the relative top-down control of predators on prey ("TD_BU_ratio"), and the preference of the least selective predator for the least edible prey ("p11").
Note that we used a script similar to the one uploaded in the original Dryad repository to calculate multiple ecosystem functions and properties of prey and predators, but many values in certain columns were not used in the present analyses.
Software versions and packages
Users who want to regenerate the time series used to build the third dataset can follow the details given in the README file of the original Dryad repository:
Before executing any C or Python script, we first compiled the C script get_timeseries_ecosystem_functions.c with the following command in the terminal:
gcc -o get_timeseries_ecosystem_functions get_timeseries_ecosystem_functions.c -lm -lgsl -lgslcblas -lsundials_cvode -lsundials_nvecserial
We notably used the SUNDIALS CVODE solver 5.7.0 to numerically solve the system of ordinary differential equations in C. We also used several packages in Python 3.10, including NumPy, Pandas, and Matplotlib.
In the other Python scripts, we used two additional packages, Random and SciPy, in Python 3.10.
In the manuscript related to these scripts and data, we present a new index to measure the overall functional diversity, which we call K. Briefly, K is the geometric mean of four independent facets: functional richness (the classic measure of the coverage of the trait axis; denoted FRic), biomass evenness and trait evenness (quantifying how evenly the biomass and trait distributions, respectively, are filled; denoted BE and TE), and dispersion (quantifying the spread around the biomass-weighted mean trait, which is maximised for uniform and bimodal distributions; denoted Dis). K and each of its underlying facets take values between 0 and 1 and assume that the uniform distribution yields maximal diversity. We compared our new index K with other indices published in the literature: functional evenness (FEve), functional dispersion (FDis), functional divergence (FDiv), Rao's quadratic entropy (Rao), and functional extension and evenness (FEE) and its modified version (FEEc). For a complete description and the mathematical equations, see the manuscript.
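Since K is defined as the geometric mean of its four facets, K = (FRic × BE × TE × Dis)^(1/4), and a zero in any facet gives K = 0. The sketch below computes only this aggregation step from facet values already in [0, 1]; the facets themselves are defined in the manuscript and are not reproduced here. The function name is hypothetical.

```python
import math


def k_index(fric, be, te, dis):
    """Overall functional diversity K as the geometric mean of its four facets."""
    facets = (fric, be, te, dis)
    if any(not 0.0 <= f <= 1.0 for f in facets):
        raise ValueError("each facet must lie between 0 and 1")
    # (FRic * BE * TE * Dis) ** (1/4); a single zero facet makes K zero
    return math.prod(facets) ** 0.25
```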
We used three types of datasets, which all provided information about communities composed of a given number of species, each characterised by biomass and trait values:
- defined communities corresponding to a few test series established in the literature, which enabled us to check that our newly developed index K behaved as expected.
- randomly generated communities, i.e., the biomass and trait values of each species in a community were drawn independently from a lognormal distribution with a mean of 0 and a standard deviation of 1 and from a uniform distribution on [0,1], respectively. This dataset was used to check the intrinsic dependencies of the indices of functional diversity on the number of species in a community, and also showed that the four underlying facets of K were independent.
- non-randomly generated communities, i.e., we used the equations of a modified Rosenzweig-MacArthur predator-prey model to generate communities with two trophic levels (prey and predators), where the number of species can be varied and adaptation within species can be enabled or disabled independently for each trophic level. In addition to the biomass and trait values of each species in the system, four ecosystem functions were computed (two per trophic level): prey and predator total biomass, and prey and predator production. This dataset was used to explore the dependencies of the indices of functional diversity on each other and on species richness under non-random processes, and to estimate the predictive power of these indices for ecosystem functioning.
The data of the non-random communities were already published in another data repository, which details the calculations and data structure used in both the present and the original study (cf. scripts and data in related work). In the present study, we additionally calculated several indices of functional diversity (see above and the manuscript for details) and computed Spearman's rank correlation coefficients (rs) among these indices within each trophic level, and between the indices of both trophic levels and the four ecosystem functions.
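Spearman's rs is Pearson's correlation computed on ranks, with tied values given their average rank. scipy.stats.spearmanr returns this coefficient; the standard-library sketch below only makes the computation explicit and is not the authors' code. Both function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r between the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)
```

For instance, spearman_rs(k_values, total_biomass_values) would correlate K with total biomass across simulations of one trophic level.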
In addition, we presented a case study in the Appendix to illustrate how our index can be applied to the phytoplankton communities of Lake Constance. For this purpose, we compiled biovolume time series and five traits (cell volume, longest linear dimension, maximum growth rate, phosphate affinity, and defence) of phytoplankton morphotypes.