This README.md file was generated on 2022-03-07 by Emilio Berti

GENERAL INFORMATION

1. Title of Dataset: Biotic filtering by species’ interactions constrains food-web variability across spatial and abiotic gradients

2. Author Information
   A. Principal Investigator Contact Information
      Name: Ulrich Brose
      Institution: German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig
      Address: Puschstrasse 4, 04103 Leipzig, Germany
      Email: ulrich.brose@idiv.de

   B. Associate or Co-investigator Contact Information
      Name: Emilio Berti
      Institution: German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig
      Address: Puschstrasse 4, 04103 Leipzig, Germany
      Email: emilio.berti@idiv.de

   C. Alternate Contact Information
      Name: Remo Ryser
      Institution: German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig
      Address: Puschstrasse 4, 04103 Leipzig, Germany
      Email: remo.ryser@idiv.de

3. Date of data collection: 2008-2011.

4. Geographic location of data collection (coordinates are in long-lat degrees):
   A. Adirondack lakes, NY, USA, bbox = {'xmin': -75.07, 'ymin': 43.53, 'xmax': -73.93, 'ymax': 44.51}
   B. Exploratory soil food webs, Germany, bbox = {'xmin': 9.23, 'ymin': 48.37, 'xmax': 13.93, 'ymax': 53.19}

5. Information about funding sources that supported the collection of the data:
   A. Deutsche Forschungsgemeinschaft, Award: FOR 2716
   B. Deutsche Forschungsgemeinschaft, Award: RTG 2010
   C. Deutsche Forschungsgemeinschaft, Award: FZT 118

SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: Open (CC BY 4.0).

2. Was data derived from another source? yes
   A. GlobAL daTabasE of traits and food Web Architecture (GATEWAy) version 1.0. https://idata.idiv.de/ddm/Data/ShowData/283?version=3.

3. Recommended citation for this dataset: Bauer, Barbara et al. (2022), Biotic filtering by species’ interactions constrains food-web variability across spatial and abiotic gradients, Dryad, Dataset, https://doi.org/10.5061/dryad.2280gb5tw.

DATA & FILE OVERVIEW

1. File List:
   A. distances.csv: table with calculated spatial, environmental, species, and food-web distances/dissimilarity metrics.
   B. bootstrap.coefs.csv: table with results from the bootstrap sensitivity analyses.
   C. GATEWAy.csv: raw subset of the original food-web dataset. See also https://idata.idiv.de/ddm/Data/ShowData/283?version=3.
   D. simulations.R: R script to perform the filtering simulations, i.e. random, resource filtering, and limiting similarity filtering.
   E. SEM_PCA.R: R script to fit the structural equation model (SEM) on the distances/dissimilarity metrics.
   F. SEM_bootstrap.R: R script to perform the SEM sensitivity analyses.
   G. sampling-error.R: R script to perform the Monte Carlo simulations that assess the influence of sampling error on the results of the analyses.
   H. simulations.csv: table containing the results from simulations.R, i.e. the food-web dissimilarities computed for empirical, random, resource filtering, and limiting similarity filtering.
   I. metaweb-soils.RData: RData object to load the Exploratory Soil metaweb into R.
   J. metaweb-lakes.RData: RData object to load the Adirondack Lakes metaweb into R.
   K. ab_norm_Biodiversity_Exploratory_soil_food_webs.RData: RData object to load the environmental variables for the Exploratory Soil food-webs into R.
   L. ab_norm_Adirondack lakes.RData: RData object to load the environmental variables for the Adirondack Lakes food-webs into R.
   M. Tes.csv: pairwise differences between the network metrics of food-webs, containing the difference value (Te) for each metric (foodwebprop) and the number of shared species between two patches (num_shared_spec).
   N. sampling-error.csv: raw results from the Monte Carlo simulations performed in sampling-error.R.
   O. FoodwebProp_soil.csv: raw values of network metrics for the Exploratory Soil food-webs.
   P. FoodwebProp_lake.csv: raw values of network metrics for the Adirondack Lakes food-webs.

2. Relationship between files, if important: SEM_PCA.R contains the script to run the main analysis. It is ready to go, i.e. all necessary data tables have already been created. The script simulations.R can take several hours to days to run, depending on the machine; to skip this step, load simulations.csv, which contains the results of the script. To run simulations.R, first create a directory called 'plots', where all intermediate R data objects (.rds) and figures will be saved.

3. Additional related data collected that was not included in the current data package: none

4. Are there multiple versions of the dataset? no

METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data: see https://idata.idiv.de/ddm/Data/ShowData/283?version=3 for sampling details and metadata.

2. Methods for processing the data:
   A. Data from the GATEWAy database were subsetted to keep only the Exploratory Soil and Adirondack Lakes food-webs.
   B. Spatial distance was calculated as the great-circle distance between long-lat coordinates.
   C. Environmental distance was calculated as the Euclidean distance between two patches in PCA space.
   D. Species dissimilarity was obtained as the Jaccard index.
   E. Food-web dissimilarity was calculated as the Euclidean distance between two patches in PCA space.

3. Instrument- or software-specific information needed to interpret the data: R, with packages lavaan, mice, igraph, and vegan.

4. People involved with sample collection, processing, analysis and/or submission:
   A. Ulrich Brose compiled the original food-web data source (GATEWAy database).
   B. Barbara Bauer and Emilio Berti processed the raw data to obtain distances/dissimilarity metrics.
   C. Emilio Berti, Remo Ryser, and Benjamin Rosenbaum performed the statistical analyses.

DATA-SPECIFIC INFORMATION FOR: distances.csv

1. Number of variables: 6

2. Number of rows: 2341

3. Variable List:
   site2: name of patch #2.
   site1: name of patch #1.
   spatial: spatial distance.
   environmental: environmental distance.
   foodweb: food-web dissimilarity.
   species: species dissimilarity.

4. Missing data codes: NA

DATA-SPECIFIC INFORMATION FOR: FoodwebProp_lake.csv

1. Number of variables: 21

2. Number of rows: 50

3. Variable List: food-web network metrics; see associated article main text for full details.

4. Missing data codes: NA

DATA-SPECIFIC INFORMATION FOR: FoodwebProp_soil.csv

1. Number of variables: 21

2. Number of rows: 49

3. Variable List: food-web network metrics; see associated article main text for full details.

4. Missing data codes: NA

DATA-SPECIFIC INFORMATION FOR: GATEWAy.csv

1. Number of variables: 46

2. Number of cases/rows: 101692

3. Variable List: see the original archive and associated metadata at: https://idata.idiv.de/ddm/Data/ShowData/283?version=3.

4. Missing data codes: -9999

DATA-SPECIFIC INFORMATION FOR: sampling-error.csv

1. Number of variables: 8

2. Number of rows: 49020

3. Variable List:
   N: number of simulated false negative sampling errors.
   N_frac: relative number of simulated sampling errors, calculated as N / (species richness).
   web1: name of food-web #1.
   web2: name of food-web #2.
   delta_fw: difference in averaged food-web metrics between the two food-webs.
   delta_sp: difference in species dissimilarity between the two food-webs.
   what: whether web1 == web2 (diagonal) or web1 != web2 (off-diagonal).
   plot: ecosystem type (lakes/soil).

4. Missing data codes: NA

5. Specialized formats or other abbreviations used: the terms diagonal and off-diagonal refer to a dissimilarity matrix where each food-web is compared to all others. The diagonal of this matrix is the difference between a food-web and itself. Because sampling errors were simulated, this difference can also be different from zero.

DATA-SPECIFIC INFORMATION FOR: simulations.csv

1. Number of variables: 8

2. Number of cases/rows: 27,249,559

3. Variable List:
   foodwebprop: name of the food-web network metric.
   value_plot1: value of the network metric in food-web #1.
   value_plot2: value of the network metric in food-web #2.
   Tm: absolute difference | value_plot2 - value_plot1 |.
   plot1: name of food-web #1.
   plot2: name of food-web #2.
   filtering: type of filtering process involved (no filtering, resource filtering, limiting similarity filtering).
   web_type: food-web generative model involved.

4. Missing data codes: NA

5. Specialized formats or other abbreviations used: in the data, resource filtering is called 'biotic filtering' and limiting similarity filtering is called 'limiting similarity'.

# Methods

This archive contains the data and scripts necessary to reproduce the statistical analyses of the study "Biotic filtering by species’ interactions constrains food-web variability across spatial and abiotic gradients". Food-web and environmental data were obtained from the GATEWAy (1.0) database: https://idata.idiv.de/ddm/Data/ShowData/283?version=3. Food-web topological metrics were calculated using the script *allMetBB.R*. Jaccard dissimilarity was calculated using the R package *vegan*. Structural Equation Models (SEMs) were fitted using the R package *lavaan*.

This archive also contains the script *sampling-error.R*, which reproduces potential sampling errors in species composition through a Monte Carlo approach. The data file *sampling-error.csv* contains the raw data generated by the Monte Carlo simulations and can be loaded to reproduce the sensitivity analyses. The script *SEM_bootstrap.R* reproduces the bootstrap analyses for the SEMs.

# Usage

Data files have *.csv* or *.RData* extensions. All R scripts have the *.R* extension. To save all plots, first create the subfolder *plots* in this directory.
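
The setup steps above can be sketched in R as follows. This is a minimal illustration, not part of the archived scripts: the helper `read_archive_csv` is a hypothetical name introduced here, and the sketch assumes the working directory is the root of this archive. The missing-data codes follow the DATA-SPECIFIC INFORMATION sections (`NA` for most tables, `-9999` for GATEWAy.csv).

```r
# Hypothetical helper (not in the archive): read a table and map the
# documented missing-data code to NA.
read_archive_csv <- function(path, missing_code = "NA") {
  read.csv(path, na.strings = missing_code, stringsAsFactors = FALSE)
}

# Create the 'plots' subfolder expected by the scripts, if it is missing.
if (!dir.exists("plots")) {
  dir.create("plots")
}

# Load the tables only if they are present in the working directory.
if (file.exists("distances.csv")) {
  distances <- read_archive_csv("distances.csv")
}
if (file.exists("GATEWAy.csv")) {
  # GATEWAy.csv codes missing values as -9999.
  gateway <- read_archive_csv("GATEWAy.csv", missing_code = "-9999")
}

# .RData files are restored with load(), which recreates the saved objects
# under their original names in the current environment.
if (file.exists("metaweb-soils.RData")) {
  load("metaweb-soils.RData")
}
```

Guarding each read with `file.exists()` keeps the sketch runnable even when only part of the archive has been downloaded.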
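
The distance and dissimilarity measures listed under "Methods for processing the data" can be sketched in base R as below. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the haversine formula and Earth radius of 6371 km are one common way to compute a great-circle distance, the number of PCA axes retained (`n_axes = 2`) is an arbitrary choice, and the toy values are invented. For presence/absence data, `vegan::vegdist(x, method = "jaccard", binary = TRUE)` computes the same Jaccard dissimilarity as the base R helper shown here.

```r
# Great-circle distance (km) between long-lat points, haversine formula.
# Assumption: spherical Earth with radius 6371 km.
great_circle_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Jaccard dissimilarity between two species lists (presence/absence).
jaccard_dissimilarity <- function(species_a, species_b) {
  shared <- length(intersect(species_a, species_b))
  total <- length(union(species_a, species_b))
  1 - shared / total
}

# Pairwise Euclidean distance between patches in PCA space.
# Assumption: variables are centred and scaled, first n_axes are kept.
pca_distance <- function(x, n_axes = 2) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  scores <- pca$x[, seq_len(n_axes), drop = FALSE]
  as.matrix(dist(scores, method = "euclidean"))
}

# Toy example with invented values for four patches.
env <- data.frame(
  pH = c(5.1, 6.3, 7.0, 5.8),
  temp = c(8.2, 9.1, 10.4, 7.9),
  moisture = c(31, 27, 22, 35),
  row.names = paste0("patch", 1:4)
)
env_dist <- pca_distance(env)
sp_diss <- jaccard_dissimilarity(c("a", "b", "c"), c("b", "c", "d"))
```

The same `pca_distance` pattern applies to both the environmental variables and the food-web network metrics, since both distances are described as Euclidean distances in PCA space.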