Data from: Mean landscape-scale incidence of species in discrete habitats is patch size dependent
Data files
Feb 08, 2024 version files (21.85 MB):
- Datha.RData (232.10 KB)
- facVars_R01.RData (2.03 KB)
- mod.std_R01.RData (194.29 KB)
- moddat_R01.RData (187.21 KB)
- README.md (4.35 KB)
- topmod (21.23 MB)
Abstract
Contains data and code for the manuscript 'Mean landscape-scale incidence of species in discrete habitats is patch size dependent'.
Raw data consist of 202 published datasets collated from primary and secondary (e.g., government technical reports) sources. These sources summarise metacommunity structure for different taxonomic groups (birds, invertebrates, non-avian vertebrates or plants) in different types of discrete metacommunities including 'true' islands (i.e., inland, continental or oceanic archipelagos), habitat islands (e.g., ponds, wetlands, sky islands) and fragments (e.g., forest/woodland or grass/shrubland habitat remnants).
The aim of the study was to test whether the size of a habitat patch influences the mean incidence of the species it contains, relative to the incidence of all species across the landscape. In other words, the study asked whether high-incidence (widespread) or low-incidence (narrow-range) species are found more often than expected in smaller or larger patches. To achieve this, a new standardized effect size metric was developed that quantifies the mean observed incidence of the species present in each patch (the geometric mean, across those species, of the number of patches in which each species was observed) and compares this with an expectation based on re-sampling the incidences of all species in all patches. Meta-regression of the 202 datasets was used to test the relationship between this metric, the 'mean species landscape-scale incidences per patch' (MSLIP), and habitat patch size, and to test for differences in response among metacommunity types and taxonomic groups.
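A minimal sketch of the metric for a single presence/absence matrix follows. It assumes a simple null model in which, for a patch holding s species, s species are drawn at random (without replacement) from the whole landscape pool, and it uses the usual standardized effect size, (observed - mean of null) / SD of null. The null used in the manuscript may differ, so `mslip_ses()` and `geo_mean()` are hypothetical helpers, not the authors' code; the output column names simply mirror `omips` and `zmips` in moddat_R01.RData.

```r
# Hypothetical helpers illustrating MSLIP; not the authors' implementation.
geo_mean <- function(x) exp(mean(log(x)))

# pa: patches x species presence/absence matrix (1 = present, 0 = absent).
# Assumed null: for a patch with s species, draw s species at random
# (without replacement) from all species in the landscape.
mslip_ses <- function(pa, n_iter = 999) {
  inc <- colSums(pa)  # landscape-scale incidence: patches occupied per species
  t(apply(pa, 1, function(row) {
    present <- inc[row == 1]
    s <- length(present)
    if (s == 0) return(c(omips = NA_real_, zmips = NA_real_))
    obs <- geo_mean(present)  # observed MSLIP for this patch
    null_dist <- replicate(n_iter, geo_mean(inc[sample.int(length(inc), s)]))
    sd_null <- sd(null_dist)
    # A numerically constant null (e.g. the patch holds every species) gives no SES
    ses <- if (isTRUE(sd_null > 1e-8)) (obs - mean(null_dist)) / sd_null else NA_real_
    c(omips = obs, zmips = ses)
  }))
}

# Toy landscape: 4 patches x 4 species (species incidences 4, 2, 2, 1)
pa <- matrix(c(1, 1, 1, 1,
               1, 1, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("patch", 1:4), paste0("sp", 1:4)))

set.seed(1)
out <- mslip_ses(pa)
print(out)
```

In this toy matrix, patch1 holds every species, so its null distribution has no spread and its `zmips` is `NA`; patch4 holds only the most widespread species, so its `zmips` is positive.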
README: Data from 'Species representation in discrete habitats is patch size dependent'
Contains the raw data and code used to reproduce the analysis and results in the manuscript.
The simplest way to do this is to save all provided files to a single folder. The code needed to run the analyses in the paper is in scr_R_code_Dryad_R01.txt. After changing the file extension from .txt to .R, the script can be opened directly in R/RStudio; it includes code to load all the objects described below and to run all analyses.
Description of the data and file structure
Description of data files:
Data:
Datha.RData - an R object of class 'list', each element of which is one of the 202 presence/absence (p/a) datasets obtained from published sources. Datasets are saved as sites x species data frames, with patch area (in hectares) in the first column and species data in columns 2 to N+1, where N is the total number of species in that dataset (the +1 reflects the area data in the first column). Note that the number of rows (patches) and columns (area and species) differs for each element of the list (i.e., metacommunity dataset).
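The layout described above can be checked on a single list element. The sketch builds a hypothetical three-patch data frame in the same layout (area in column 1, species in columns 2 to N+1) instead of reading Datha.RData, and `split_dataset()` is an illustrative helper, not part of the archived script.

```r
# Hypothetical stand-in for one element of the Datha list
dat <- data.frame(area = c(120, 15, 3.2),   # patch area (ha)
                  sp_a = c(1, 1, 0),        # species columns: 1 = present
                  sp_b = c(1, 0, 1),
                  sp_c = c(1, 0, 0))
datha_toy <- list(dat)

# Separate the area column from the N species columns
split_dataset <- function(d) {
  list(area = d[[1]],
       pa   = as.matrix(d[, -1, drop = FALSE]))
}

parts <- lapply(datha_toy, split_dataset)
n_species <- vapply(parts, function(p) ncol(p$pa), integer(1))  # N per dataset
n_patches <- vapply(parts, function(p) nrow(p$pa), integer(1))  # rows per dataset
print(n_species)
print(n_patches)
```

With the real file, `load("Datha.RData")` returns the name of the list it restores, and that list can be passed to `lapply()` in the same way.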
facVars_R01.RData - an R object of class 'data.frame' with three columns giving the factor variables: 'mcom', or metacommunity type (island, habitat island or fragment); 'taxa', the broad taxonomic group (birds, non-avian vertebrates, plants, invertebrates); and 'datQual', or data quality (4 ordinal levels coded 4 < 3 < 2 < 1, indicating a qualitative level of confidence that the data represent a full census of all species present in all patches of that metacommunity).
moddat_R01.RData - an R data.frame object containing the untransformed model response and predictors calculated using the R script (scr_R_code_Dryad_R01). Column names are as follows:
- 'area' (numeric) = patch area in hectares;
- 'SR' (integer) = species richness of the patch (count of species);
- 'zmips' (numeric) = MSLIP standardized effect size;
- 'omips' (numeric) = observed MSLIP;
- 'res' (numeric) = residual deviation of patch richness from the island species-area relationship (ISAR) prediction;
- 'source' (factor) = code for distinct dataset;
- 'zsar' (numeric) = slope (z parameter) of the power-law ISAR for that metacommunity;
- 'ln_area' (numeric) = natural log of patch area in hectares;
- 'ln_rich' (numeric) = natural log of SR;
- 'mcom' (factor) = metacommunity type (see Table 1, manuscript);
- 'taxa' (factor) = broad taxonomic group;
- 'datqual' (ordered factor) = ordinal factor indicating relative confidence (-4 < -3 < -2 < -1) that data represent a full census for all patches within that metacommunity (see Table S2, Appendix S1 in the manuscript Supporting Information for details on assignment criteria).
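The log-scale and ISAR columns can be illustrated for one metacommunity. This sketch assumes the power-law ISAR, S = cA^z, was fitted by ordinary least squares on the log-log scale, so that `zsar` is the slope and `res` the log-scale residual; the manuscript may have used a different fitting method or residual scale, and the toy areas and richness values are hypothetical.

```r
# Hypothetical toy metacommunity (one 'source'): patch areas and richness
area <- c(0.5, 2, 10, 40, 150)   # hectares
SR   <- c(3L, 5L, 8L, 12L, 20L)   # species richness per patch

ln_area <- log(area)             # natural log of patch area
ln_rich <- log(SR)               # natural log of species richness

# Assumed approach: power-law ISAR S = c * A^z fitted by OLS on the log-log scale
isar <- lm(ln_rich ~ ln_area)
zsar <- unname(coef(isar)["ln_area"])  # z parameter (slope)
res  <- unname(residuals(isar))        # patch deviations from the ISAR (log scale)

print(zsar)
print(res)
```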
mod.std_R01.RData - an R data.frame with standardized data (mean = 0, SD = 1) for the continuous predictors in moddat_R01.RData (see above). These are the data used to fit the brmsfit object 'topmod'.
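Standardization of this kind can be sketched with `scale()`, applied column by column to numeric variables while leaving factors untouched. Which columns the authors actually standardized (for example, whether the response `zmips` was also scaled) is not stated here, so selecting every numeric column is an assumption, and `moddat_toy` is hypothetical.

```r
# Hypothetical moddat-like data: two continuous predictors and one factor
moddat_toy <- data.frame(
  ln_area = log(c(0.5, 2, 10, 40, 150)),
  zsar    = c(0.21, 0.21, 0.34, 0.34, 0.34),
  mcom    = factor(c("island", "island", "fragment", "fragment", "fragment"))
)

# Centre and scale numeric columns only (mean 0, SD 1); factors are untouched
is_cont <- vapply(moddat_toy, is.numeric, logical(1))
mod_std_toy <- moddat_toy
mod_std_toy[is_cont] <- lapply(moddat_toy[is_cont],
                               function(x) as.numeric(scale(x)))
print(mod_std_toy)
```

`scale()` returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes; `as.numeric()` drops them, so the original means and SDs should be kept separately if coefficients are to be back-transformed to the raw scale.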
topmod - an R object of class 'brmsfit' containing the top-ranking model used to generate results in the manuscript.
Sharing/Access information
Data were derived from multiple published primary and grey literature sources (i.e., all data are in the public domain). Full citations for the original source publications are given in the Supporting Information for the manuscript (see Appendix S1). Keywords and other search techniques used to collate the database are detailed in the following sources:
- Deane, D. C. & He, F. (2018) Loss of only the smallest patches will reduce species diversity in most discrete habitat networks. Global Change Biology, 24, 5802-5814.
- Deane, D. C. (2022) Species accumulation in small-large vs large-small order: more species but not all species? Oecologia, 200, 273-284.
Code/Software
scr_R_code_Dryad_R01.txt - the R code to run the analyses, organised in three sections as follows (click on 'Show document outline' in RStudio for ease of navigation):
- Pre-processing - imports the data and runs a for-loop to calculate the MSLIP standardized effect size; can re-create the data frame used in modelling.
- Modelling and diagnostics - code used to fit the final model and to check convergence and other diagnostics.
- Analyse model - code to extract draws from the model posterior and create the figures in the text.
Uses R packages: brms, bayesplot, posterior, rstan, tidyverse, tidybayes.
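The pre-processing loop can be pictured roughly as follows: iterate over the list of datasets, compute patch-level quantities, and stack the results with a `source` identifier. This is a hypothetical outline, not an excerpt from scr_R_code_Dryad_R01.txt; the toy `datasets` list stands in for Datha, and only the observed MSLIP (`omips`) is computed, with the resampling step for the SES omitted for brevity.

```r
# Hypothetical stand-in for the Datha list (area in column 1, species after)
datasets <- list(
  data.frame(area = c(50, 8, 1),
             sp1 = c(1, 1, 1), sp2 = c(1, 0, 1), sp3 = c(1, 0, 0)),
  data.frame(area = c(300, 20),
             spA = c(1, 1), spB = c(1, 0))
)

geo_mean <- function(x) exp(mean(log(x)))

rows <- vector("list", length(datasets))
for (i in seq_along(datasets)) {
  d   <- datasets[[i]]
  pa  <- as.matrix(d[, -1, drop = FALSE])   # patches x species
  inc <- colSums(pa)                        # landscape-scale incidence per species
  rows[[i]] <- data.frame(
    source = i,                             # dataset identifier
    area   = d[[1]],                        # patch area (ha)
    SR     = as.integer(rowSums(pa)),       # patch species richness
    omips  = apply(pa, 1, function(r) geo_mean(inc[r == 1]))  # observed MSLIP
  )
}
moddat_toy <- do.call(rbind, rows)
moddat_toy$source <- factor(moddat_toy$source)
print(moddat_toy)
```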
Methods
Details regarding keyword and other search strategies used to collate the raw database from published sources were presented in Deane, D. C. & He, F. (2018) Loss of only the smallest patches will reduce species diversity in most discrete habitat networks. Glob Chang Biol, 24, 5802-5814 and in Deane, D.C. (2022) Species accumulation in small-large vs large-small order: more species but not all species? Oecologia, 200, 273-284.
Minimum data requirements were presence-absence records for all species in all patches and the area of each habitat patch. The database consists of 202 published datasets. The first column in each dataset is the area of the patch in question (in hectares); the other columns record the presence or absence of each species in each patch. For every patch, the study calculated a metric that compares the landscape-scale incidence of the species present with an expectation derived from the occupancy of all species in all patches (the mean species landscape-scale incidences per patch, or MSLIP). This value was regressed on patch size and other covariates to determine whether the representation of widespread (or narrowly distributed) species changes with patch size.
In summary, the workflow proceeded in three steps.
1. Pre-processing. This stage consisted of calculating a standardized effect size (SES) for the MSLIP metric for every patch and extracting important covariates (taxon, patch type, total number of patches, total number of species, patch-level deviations from fitted island species-area relationships, data quality) to be used in model building.
2. Model building. The MSLIP SES was then modelled against patch area and other covariates using a multilevel Bayesian (meta-)regression model fitted with Stan and brms in the statistical programming language R (version 4.3.0).
3. Model analysis. The final model was analysed by running different scenarios, interpreting the resulting patterns in light of the hypotheses under test, and creating figures to illustrate them.
Usage notes
All provided files are intended for use within the R programming environment. The raw database records required to run the analysis from scratch, along with the processed data used to run the regression models, are saved as R data objects (i.e., with extension '.RData'). The fitted model obtained in the analysis and used to generate the results is also an R object, but of class 'brmsfit' (which requires the R package brms to be loaded into the R workspace). Both object types can be opened in R (RStudio, etc.).
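Because the object stored inside an .RData file need not share the file's name, it can help to load each file into its own environment and inspect the names that `load()` returns. The sketch writes a hypothetical toy object to a temporary file so that it runs without the archive; substitute the path of, e.g., facVars_R01.RData to use it on the real data.

```r
# Hypothetical toy object saved to a temporary .RData file so the example
# is self-contained; replace `path` with e.g. "facVars_R01.RData".
path <- file.path(tempdir(), "toy_R01.RData")
facVars_toy <- data.frame(mcom = factor(c("island", "fragment")))
save(facVars_toy, file = path)
rm(facVars_toy)

# load() restores the saved objects into `e` and returns their names
e <- new.env()
loaded <- load(path, envir = e)
print(loaded)
obj <- get(loaded[1], envir = e)
str(obj)
```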