An ecological definition of small fragments
Data files
Dec 02, 2025 version files 20.30 MB
- dat_table_1_regression.csv (18.10 KB)
- Data_on_source_publications.rtf (1.53 MB)
- frag_mcom_lst.RData (167.87 KB)
- mdat138.RData (3.28 KB)
- moddat.csv (270.62 KB)
- ofd.lst.RData (72.24 KB)
- out_lst.RData (25.57 KB)
- plotting_df.RData (3.14 MB)
- README.md (9.91 KB)
- scr_R_code_Archive.R (14.56 KB)
- topmodall (15.05 MB)
Abstract
In an increasingly fragmented natural world, understanding how different ecological phenomena vary with patch size has many motivations. Examples include the assembly of biodiversity, ecosystem service provision and the suitability of fragments for habitat specialist species. A common approach to such questions divides fragments into small and large size classes for separate analysis. However, lack of an objective definition and means to differentiate ‘small’ from ‘large’ patches limits our ability to compare findings across studies, arguably impeding progress toward any unified views. Because larger and smaller fragments tend, on average, to respectively over-represent narrow- and wide-range species, an ‘area for unbiased species representation’ (AUSR) can be defined at some intermediate fragment size predicted to contain species at incidence frequencies approximating that of the overall landscape. A central tendency for AUSR has previously been estimated for patchy habitat types (islands, habitat islands and fragments), providing a benchmark to compare this threshold of small fragment size between studies. However, if AUSR can be readily determined within individual study systems, it would also provide an objective threshold to separate small and large fragments under the AUSR definition. Here we assess this potential for 138 published datasets from various fragmented landscapes using an index comparing species incidence frequencies in each fragment with that of the overall landscape. Regressing this index on fragment area yielded an estimate for AUSR in over 90% of cases, suggesting broad applicability as an objective way to separate fragments into two size classes. Regression slopes provide further information on the relative representation of narrow- vs wide-range species, with 80% being numerically consistent with the overall negative trend. 
Requiring only the same data as the island species-area relationship, AUSR can provide useful insights on the relative importance of narrow- vs wide-ranging species for studies of patch-size dependence in ecological phenomena.
https://doi.org/10.5061/dryad.0rxwdbs89
The raw database comprises 138 discrete habitat-patch study systems. Each system consists of habitat fragments of varying size, sampled with either one or more pooled samples (e.g., quadrats) or species lists. Each system is represented as a fragment-by-species presence-absence matrix, in which columns record the incidence of species within each fragment.
Objects in this repo were derived from the raw data and consist of:
- two .csv files;
- five R objects in .RData format;
- an R script (.R format);
- an R model output object of class 'brmsfit';
- information on source data and origin (rich text format file).
The easiest way to access the results is to save all files into a single folder and open the R script in the R environment (for example, by double-clicking it). The script will import the data.
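Before running the script, it can help to confirm that the download is complete. The following is a minimal sketch in base R, not part of the archived script: the helper `missing_files()` and the folder name `data_dir` are hypothetical, while the file names are those listed for this repository.

```r
# File names as listed for this repository (README.md omitted).
expected_files <- c(
  "dat_table_1_regression.csv", "moddat.csv",
  "frag_mcom_lst.RData", "mdat138.RData", "ofd.lst.RData",
  "out_lst.RData", "plotting_df.RData",
  "scr_R_code_Archive.R", "topmodall",
  "Data_on_source_publications.rtf"
)

# Hypothetical helper: return the expected files that are absent from `data_dir`.
missing_files <- function(data_dir, files = expected_files) {
  files[!file.exists(file.path(data_dir, files))]
}

# Example usage (uncomment and point `data_dir` at the download folder):
# data_dir <- "path/to/downloaded/files"
# missing_files(data_dir)   # character(0) when every file is present
# moddat <- read.csv(file.path(data_dir, "moddat.csv"))
# load(file.path(data_dir, "mdat138.RData"))  # restores the saved object(s)
```

An empty result means all listed files were found in the folder.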
Description of the data and file structure
.csv files
dat_table_1_regression.csv: a dataframe used in modelling. The file contains the values of the following predictors and covariates, together with intermediate values used in their calculation, as described in the R script.
- source = data source identifier.
- slife = simplified life form with 4 levels: Brd = birds, Ver = non-avian vertebrates, Inv = invertebrates, Plt = plants.
- area_cof = mean of the modelled coefficient for area in the individual regression models.
- rexp_area = rate parameter of the exponential distribution fitted to the fragment size distribution (an indicator of size disparity in the distribution).
- sdlogn_area = standard deviation of the fragment size distribution.
- avelogn_area = mean of the fragment size distribution.
- zsar = slope (exponent) of the power law island species-area relationship (ISAR) for that dataset.
- rsoc_cof = value of the power law parameter from fitting ranked species occupancy curves (RSOC; used to infer bimodality in the species occupancy distribution).
- rsoc_exp = value of the exponential cutoff parameter from fitting ranked species occupancy curves (RSOC; used to infer bimodality in the species occupancy distribution).
- cor_res_area = captures variation in the correlation between richness and area: the z-value of the ISAR for effort-controlled studies, and the Pearson correlation between res and fragment area for balanced data (see 'bal').
- conf = data quality, ordinal factor with four levels: L1, L2, L3, L4, where the numerical value indicates the confidence that the data comprise a complete census of the species present (L1 = highest confidence, L4 = lowest confidence).
- bimodal = outcome of the RSOC test of bimodality of the occupancy frequency distribution (1 = bimodal, 0 = not bimodal).
- mcom = metacommunity type with 3 levels: Ff = forest/woodland fragments, Fv = other vegetatively defined fragment (grass/shrubland), Af = archipelago fragment created by reservoir flooding.
- db = source database with 2 levels: dd = Deane et al. (2024); fs = fragSAD.
- bal = binary indicator of a balanced survey design. Values of 1 indicate that the study used a balanced design (same effort in each fragment); values of 0 indicate effort-controlled studies, in which effort varied with the size of the fragment.
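To illustrate what the zsar column represents, the sketch below fits a power law species-area relationship, S = c * A^z, by ordinary least squares on log-log axes. This is an assumption made for illustration: the source does not state how zsar was estimated, and the function `fit_isar()` and the toy values are hypothetical.

```r
# Toy data: fragment areas (ha) and species richness; values are made up.
toy_isar <- data.frame(area = c(1, 2, 5, 10, 50, 100),
                       SR   = c(4, 5, 7, 9, 14, 18))

# The power law S = c * A^z is linear on log-log axes:
#   log(S) = log(c) + z * log(A)
# ASSUMPTION: log-log least squares; the archived analysis may differ.
fit_isar <- function(area, SR) {
  fit <- lm(log(SR) ~ log(area))
  c(log_c = unname(coef(fit)[1]), z = unname(coef(fit)[2]))
}

fit_isar(toy_isar$area, toy_isar$SR)
```

The element named `z` is the slope that corresponds conceptually to zsar.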
moddat.csv: Data used in modelling saved as a .csv file. The file contains the following columns.
- area = patch area in hectares
- SR = species richness (integer number of species recorded in that patch)
- zmips = standardized effect size of the mean species incidences per patch (MSLIP) metric derived in Deane et al. (2024)
- omips = observed MSLIP metric (not used)
- res = residual deviation from expected species richness for each patch, calculated as (obs - exp)/exp
- source = data source
- zsar = slope (exponent) of the power law island species-area relationship for each dataset
- db = source database with 2 levels: dd = Deane et al (2024); fs = fragSAD
- ln_area = natural log of area
- slife = simplified life form with 4 levels: Brd = birds, Ver = non-avian vertebrates, Inv = invertebrate, Plt = plants
- altlife = alternative simplified life form (Not used) with 5 levels: the four in slife + Hrp = herpetofauna
- mcom = metacommunity type with 3 levels: Ff = forest/woodland fragments, Fv = other vegetatively defined fragment (grass/shrubland), Af = archipelago fragment created by reservoir flooding.
- conf = data quality, ordinal factor with four levels: L1, L2, L3, L4, where the numerical value indicates the confidence that the data comprise a complete census of the species present (L1 = highest confidence, L4 = lowest confidence).
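The columns above are enough to sketch how an AUSR estimate could be read off a per-dataset regression of the index on log area. The example below is illustrative only and rests on stated assumptions: it uses `lm()` as a simple stand-in for the Bayesian regressions stored in this repo, it takes AUSR to be the area at which the fitted line predicts zmips = 0, and the data frame `toy` is synthetic. The helper `estimate_ausr()` is hypothetical.

```r
# Synthetic stand-in for moddat.csv (column names as in the file; values made up).
set.seed(1)
toy <- data.frame(
  source  = rep(c("studyA", "studyB"), each = 20),
  ln_area = rep(seq(-2, 6, length.out = 20), times = 2)
)
toy$zmips <- ifelse(toy$source == "studyA",
                    1.0 - 0.5 * toy$ln_area,
                    0.6 - 0.2 * toy$ln_area) +
  rnorm(nrow(toy), sd = 0.1)

# ASSUMPTION: AUSR is the area where the fitted line crosses zmips = 0.
# lm() is a stand-in here; the archived models are of class 'brmsfit'.
estimate_ausr <- function(d) {
  b <- coef(lm(zmips ~ ln_area, data = d))
  ln_ausr <- -b[["(Intercept)"]] / b[["ln_area"]]
  data.frame(source = d$source[1],
             slope = b[["ln_area"]],
             ausr_ha = exp(ln_ausr))  # back-transform from natural log
}

# One regression per dataset, collected into a single table.
ausr_tab <- do.call(rbind, lapply(split(toy, toy$source), estimate_ausr))
ausr_tab
```

The sketch does not check whether the estimated crossing point lies within the observed range of fragment areas, which would be worth verifying before using it as a threshold.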
RData objects
ofd.lst.RData: an R list object containing the occupancy frequency distribution for each of the 138 datasets in the database. Each element in the list is a vector with the species name (or code) and the proportion of sites in which it was present, varying on the interval (0,1).
out_lst.RData: an R list object containing the model outputs for each of the 138 individual regression models fitted to the datasets. Each element in the list gives an 8-column table with the following headings:
- Estimate = estimated coefficient;
- Est.Error = estimated error in the coefficient;
- l-95% CI = lower 95% credible interval for the coefficient posterior;
- u-95% CI = upper 95% credible interval for the coefficient posterior;
- Rhat = Gelman-Rubin diagnostic;
- Bulk_ESS = effective number of samples in the bulk of the posterior;
- Tail_ESS = effective number of samples in the tails of the posterior.
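The diagnostic columns can be screened programmatically. The sketch below flags coefficients with poor convergence diagnostics in one such table. The table `toy_out` is a made-up stand-in for a single element of out_lst, the helper `flag_convergence()` is hypothetical, and the thresholds are common rules of thumb rather than values taken from the source.

```r
# Hypothetical stand-in for one element of out_lst (values made up).
toy_out <- data.frame(
  Estimate = c(0.82, -0.31), Est.Error = c(0.10, 0.04),
  Rhat = c(1.00, 1.08), Bulk_ESS = c(3200, 310), Tail_ESS = c(2900, 150),
  row.names = c("Intercept", "ln_area")
)

# ASSUMPTION: Rhat > 1.01 or ESS < 400 marks a coefficient for closer inspection.
flag_convergence <- function(tab, rhat_max = 1.01, ess_min = 400) {
  tab <- as.data.frame(tab)  # also accepts a matrix with named columns
  bad <- tab$Rhat > rhat_max |
    tab$Bulk_ESS < ess_min |
    tab$Tail_ESS < ess_min
  rownames(tab)[bad]
}

flag_convergence(toy_out)
```

Applied across the list, for example with `lapply(out_lst, flag_convergence)`, this would give the flagged coefficients for each dataset, assuming each element carries the column names listed above.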
plotting_df.RData: an R dataframe with predicted response from each of the models for plotting purposes. Four columns:
- ln_area = the simulated value for log fragment area used in the prediction;
- pred = median posterior predicted output;
- mod = modelled smooth curve for plotting (pred ~ ln_area);
- source = origin dataset.
mdat138.RData: an R dataframe containing metadata on the 138 datasets used in analysis. Essentially this is a subset of the information in Data_on_source_publications.rtf, but it is included as an R object for analysis. The dataframe has eight columns.
- filename = unique identifier for dataset;
- taxa = broad taxonomic group used in modelling (Brd = birds, Ver = non-avian vertebrates, Inv = invertebrates; Plt = plant);
- mcom = broad metacommunity type (Af = reservoir island fragment; Ff = forest or woodland fragment, Fv = other vegetative fragment such as grass or shrubland);
- country = location of study;
- source = primary or grey literature from which data were obtained (see Data source publications);
- conf = integer score designating data quality in confidence of representing a full census for each fragment (1 = atlas or field-confirmed atlas; 2 = multiple survey methods or collation of multiple field visits; 3 = single field survey, effort adjusted; 4 = single survey with no effort adjustment or validation, or multiple surveys without effort adjustment to patch size [or precise methods unknown]);
- altTax = an alternative taxonomic grouping to 'taxa', with an additional level for herpetofauna ("Inv" = invertebrates, "Hrp" = herpetofauna, "Brd" = birds, "Ver" = other vertebrates, "Plt" = plants);
- db = source database from which data were obtained (dd = Deane et al (2024), fs = fragSAD).
frag_mcom_lst.RData: an R list object, which is a composite record of each of the 138 fragmented metacommunities analysed, in fragments-by-descriptors format. Each element of the list is a dataframe in which each row is a fragment. The fragment area is in the first column, and the presence/absence of each species appears in columns 2 through to the last column.
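As an illustration of this layout, the sketch below computes each species' occupancy frequency (the proportion of fragments in which it is present) from a toy dataframe with area in the first column and species in the remaining columns. The dataframe `frag` and the helper `occupancy_freq()` are hypothetical, and it is an assumption that the vectors in ofd.lst.RData were derived in a comparable way.

```r
# Toy metacommunity in the layout described above: area first, then species.
frag <- data.frame(
  area = c(0.5, 2, 10, 40),
  sp_a = c(1, 1, 1, 1),
  sp_b = c(0, 1, 1, 1),
  sp_c = c(0, 0, 0, 1)
)

# Proportion of fragments occupied by each species (columns 2 onward).
occupancy_freq <- function(d) {
  pa <- as.matrix(d[, -1, drop = FALSE]) > 0  # logical presence/absence
  colMeans(pa)
}

occ <- occupancy_freq(frag)
occ  # sp_a = 1, sp_b = 0.75, sp_c = 0.25

# Species richness per fragment, comparable to the SR column in moddat.csv.
richness <- rowSums(as.matrix(frag[, -1, drop = FALSE]) > 0)
richness  # 1, 2, 2, 3
```

For the archived data, the same function could be applied to every element with `lapply(frag_mcom_lst, occupancy_freq)`, assuming the list has been loaded under that name.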
Rich text format
Data_on_source_publications.rtf: contains a summary table, giving the provenance and information on each of the datasets and lists the individual primary sources from which the data were obtained. Table contains the following fields:
- Filename = unique identifier for dataset;
- summary = brief description of study system;
- taxo_group = broad taxonomic group used in modelling (Brd = birds, Ver = non-avian vertebrates, Inv = invertebrates; Plt = plant);
- metacom = broad metacommunity type (Af = reservoir island fragment; Ff = forest or woodland fragment, Fv = other vegetative fragment such as grass or shrubland);
- nfrag = number of fragments;
- nspp = total number of species;
- Country = location of study;
- minArea = smallest habitat patch area;
- maxArea = largest habitat patch area;
- areaUnit = unit of minArea and maxArea;
- Source = primary or grey literature from which data were obtained (see Data source publications);
- Qual = integer score designating data quality in likelihood of representing a full census for each fragment (1 = atlas or field-confirmed atlas; 2 = multiple survey methods or collation of multiple field visits; 3 = single field survey, effort adjusted; 4 = single survey with no effort adjustment or validation, or multiple surveys without effort adjustment to patch size [or precise methods unknown]);
- database = source of data used (dd = Deane et al (2024), fs = fragSAD).
R Script
scr_R_code_Archive.R: an R script (text) file for importing the other objects in the repo and reproducing the results.
R model object
topmodall: an R object of class 'brmsfit', which requires the R package 'brms' to be loaded into an active R session. It can be imported for inspection using the code in the R script, or loaded into any active R session using the code 'load(file = "<file.path>/topmodall")', where <file.path> is replaced by the location of the file.
Sharing/Access information
Data were derived from the following sources:
- Supporting information for Deane et al (2024)
- FragSAD database
Code/Software
scr_R_code_Archive.R = R file including code to reproduce the workflow.
topmodall = an R object of class 'brmsfit' which must be imported into an R environment to access.
The data are collated from two different databases. The first of these was a subset (78/202) of the database collated from the literature on discrete metacommunities (islands, habitat islands and fragments), with the origins and primary sources for the data described in Deane (2022) and Deane et al. (2024).
To increase sample size, we added 60 datasets from the FragSAD database (Chase et al. 2019), available from the Dryad data repository (https://doi.org/10.5061/dryad.595718c, August 2019 version, accessed 8 December 2020).
All datasets included metadata on broad taxonomic group (birds, invertebrates, non-avian vertebrates and plants), fragment type (‘forest’, ‘grassland’, or ‘island’, respectively forest or woodland fragments within a terrestrial matrix, grass/shrub-dominated fragments within a terrestrial matrix, and forest habitat fragments isolated by water due to reservoir creation), and a four-level categorical indicator of survey effort (Appendix S1; Deane 2022). In total, 138 datasets were available for modelling individual landscapes and for meta-regression across habitats.
