Data and code from: Urban-driven homogenization of aquatic subsidy size structure cascades to riparian predator communities
Data files
Mar 10, 2026 version, files: 261.04 KB
- File_3.zip (250.49 KB)
- README.md (10.56 KB)
Abstract
The export of emergent aquatic insects is a critical energy subsidy for terrestrial food webs. While urbanization is known to alter stream communities, its effects on the size structure of these subsidies and the consequences for riparian predators remain poorly understood. This dataset was generated to investigate how impervious land cover affects the body-size distribution of emergent aquatic insects and, in turn, the community structure and diet of riparian spiders along two urban streams in Québec, Canada. The data package contains comprehensive information linking environmental drivers to community and trophic responses, organized into several files. It includes detailed data on emergent aquatic insect communities, featuring family-level identification, abundance counts, and individual body length measurements (mm) for thousands of specimens collected from floating emergence traps. The dataset also provides abundance counts of riparian spider families (e.g., Tetragnathidae) and other terrestrial arthropods collected using the beating sheet method. These biological data are contextualized by site-specific watershed characteristics, such as the proportion of impervious cover and distance from upstream, both derived from GIS analysis, alongside a suite of in-situ physicochemical water quality measurements (temperature, pH, conductivity, turbidity, and continuous dissolved oxygen). Finally, trophic connections are detailed through raw stable isotope values (δ13C and δ15N) for the focal consumer (Tetragnatha spiders) and their potential aquatic and terrestrial food sources.
Dataset DOI: 10.5061/dryad.v6wwpzh8z
Description of the data and file structure
This dataset (File_3.zip) contains the data and R code required to replicate the analyses that test the hypothesis that urbanization, measured as impervious cover, alters the body-size structure of emergent aquatic insect subsidies, which in turn affects the diet and community structure of riparian spiders. Data were collected from 20 sites distributed along two rivers (Milette and Sables) in Québec, Canada, representing a gradient of watershed urbanization.
The geospatial, environmental, and community data include: watershed-level land-use metrics (e.g., proportion of impervious cover, distance from upstream); discrete and continuous site-level physicochemical water quality measurements (e.g., temperature, conductivity, dissolved oxygen); family-level abundance counts for riparian spider communities; and family-level identification, abundance, and individual body length measurements for sampled emergent aquatic insect specimens. Trophic data consist of raw stable isotope values (δ13C and δ15N) for Tetragnatha spiders and their potential aquatic and terrestrial food sources. The dataset also includes several processed summary files that provide site-level metrics (e.g., slopes, mean body size) and the outputs of Bayesian diet mixing models (MixSIAR).
Analysis of the emergent aquatic insect community data revealed that increasing impervious cover was associated with a strong shift toward smaller-bodied individuals and a contraction of the overall body-size range, while total exported biomass remained unchanged. Modeling of spider diet, based on the stable isotope data, showed that spiders in more urbanized sites exhibited a higher reliance on these altered aquatic prey subsidies. Furthermore, multivariate analysis of the spider community structure indicated shifts in composition correlated with both land use and the characteristics of the available insect prey.
Files and variables
Date of data collection
- Emergent Aquatic Insects: May 20 and June 7, 2024.
- Terrestrial Arthropods: May 20 to June 28, 2024.
- Water Quality: May 20 and June 7, 2024.
Folder content and structure
The main directory (name: Files) is subdivided into five folders. The first four correspond to data types:
- 01_water_quality_data contains the physicochemical datasets;
- 02_Land_use_data contains the dataset with spatial characteristics;
- 03_Emergent_insects_data contains the abundance, mass, and identification data for emergent insects;
- and 04_Isotopes contains the stable isotope data (δ13C and δ15N) and processed summary files (e.g., MixSIAR outputs).
All datasets are in .csv format. The fifth folder, 05_script, contains the single R script (.R file) used to perform all analyses described in the manuscript.
Variables description
Physicochemistry.csv
This file contains discrete physicochemical water quality measurements. Each row represents one sampling event at a given site.
VARIABLE DESCRIPTIONS:
- River (Character): River name
- site (Integer): Site number
- Period (Integer): Sampling period identifier
- date (Character): Sampling date (YYYY-MM-DD)
- Temp 1, Temp 2, Temp 3 (Numeric): Replicate measurements of water temperature. Unit: degrees Celsius
- pH1, pH2, pH3 (Numeric): Replicate measurements of pH
- conductivity (Numeric): Electrical conductivity. Unit: µS/cm
- turbidity 1, turbidity 2, turbidity 3 (Numeric): Replicate measurements of turbidity. Unit: NTU (assumed)
- chloro 1, chloro 2, chloro 3 (Numeric): Replicate measurements of chlorophyll-a. Unit: µg/L
- cloud (Integer): Cloud cover percentage
- Storm (Categorical): Indicates if a storm occurred ("yes" or "no").
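The replicate columns described above can be collapsed into one value per sampling event. The following is a minimal sketch in Python (standard library only), independent of the authors' R script. The inline rows are made-up values that only mimic the documented column names, and the helper `replicate_mean` and its handling of empty cells are assumptions for illustration, not part of the dataset.

```python
import csv
import io
from statistics import mean

# Made-up rows that only mimic the documented column names; not real data.
SAMPLE = """River,site,date,Temp 1,Temp 2,Temp 3,turbidity 1,turbidity 2,turbidity 3
Milette,1,2024-05-20,12.0,12.2,12.4,3.0,4.0,5.0
Sables,3,2024-05-20,14.1,,14.3,1.5,2.5,3.5
"""


def replicate_mean(row, columns):
    """Average the non-empty replicate readings found in `columns`.

    Hypothetical helper: empty cells are skipped (an assumption about how
    missing replicates might be stored), and None is returned if no
    replicate is available.
    """
    values = [float(row[c]) for c in columns if row[c].strip()]
    return mean(values) if values else None


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
temp_means = [replicate_mean(r, ["Temp 1", "Temp 2", "Temp 3"]) for r in rows]
turb_means = [
    replicate_mean(r, ["turbidity 1", "turbidity 2", "turbidity 3"]) for r in rows
]

for r, t, u in zip(rows, temp_means, turb_means):
    print(r["River"], r["site"], round(t, 2), round(u, 2))
```

To use the real file, replace `io.StringIO(SAMPLE)` with an open handle on Physicochemistry.csv. Note that the pH columns are named without a space (pH1, pH2, pH3), so column names should be passed exactly as they appear in the header.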
size_spectrum_data.csv
This file contains individual length measurements for thousands of emergent aquatic insect specimens, with biomass calculated via allometric equations.
VARIABLE DESCRIPTIONS:
- Taxon (Character): Taxonomic identification of the invertebrate (usually family)
- River (Character): River where the sample was collected
- Site (Integer): Site number
- Length (Numeric): Measured body length of the individual. The unit is specified in the "Units" column
- Units (Character): Unit of the Length measurement (e.g., "µm")
- Group (Character): Higher taxonomic group (e.g., "Nematocera")
- Length_mm (Numeric): Body length converted to millimeters
- a, b (Numeric): Parameters of the length-biomass allometric equation
- Biomass (Numeric): Calculated biomass for the individual. Unit: mg (dry mass).
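The relationship between Length, Units, Length_mm, a, b, and Biomass can be illustrated with a short Python sketch (standard library only). It assumes the common power-law form of the length-biomass equation, biomass = a × Length_mm^b; the README does not state the exact form, so check the R script before relying on this. The rows, the parameter values, and the unit table `TO_MM` are made up for illustration.

```python
import csv
import io

# Made-up rows mimicking the documented columns; a and b are NOT real
# allometric parameters. "\u00b5" is the micro sign used in "µm".
SAMPLE = """Taxon,River,Site,Length,Units,a,b
Chironomidae,Milette,1,2500,\u00b5m,0.005,2.5
Baetidae,Sables,2,6.0,mm,0.01,3.0
"""

# Hypothetical conversion table; both encodings of the micro/mu sign are listed.
TO_MM = {"\u00b5m": 0.001, "\u03bcm": 0.001, "um": 0.001, "mm": 1.0}


def length_mm(length, units):
    """Convert a measured length to millimeters using the Units column."""
    return float(length) * TO_MM[units.strip()]


def biomass_mg(length_in_mm, a, b):
    """Dry mass under the ASSUMED power law: biomass = a * length^b."""
    return a * length_in_mm ** b


results = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    l_mm = length_mm(row["Length"], row["Units"])
    mass = biomass_mg(l_mm, float(row["a"]), float(row["b"]))
    results.append((row["Taxon"], l_mm, mass))

for taxon, l_mm, mass in results:
    print(taxon, round(l_mm, 3), round(mass, 4))
```

Comparing values computed this way with the Length_mm and Biomass columns already in the file is a quick way to check whether the assumed equation form matches the one the authors used.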
land_use_mil_sab.csv
This file contains land-use characteristics for the watershed upstream of each sampling site, derived from GIS analysis.
VARIABLE DESCRIPTIONS:
- river (Character): River name
- site (Integer): Site number
- proportion_urb (Numeric): Proportion of urban/impervious area in the local catchment
- dist_from_upstream (Numeric): Distance of the site from the headwaters, measured along the river network. Unit: meters
- Note: Other columns represent intermediate calculations for cumulative vs local land use area (m2)
Battage_2024.csv
This file contains the raw abundance counts of riparian arthropods collected using the beating sheet method.
VARIABLE DESCRIPTIONS:
- ID (Character): Unique identifier for the sampling event
- River (Character): River name
- Site (Integer): Site number
- tetragnathidae, thomiscidae, etc. (Integer): Abundance count (number of individuals) for the corresponding arthropod family collected during a sampling event.
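Because each row is one sampling event, site-level totals require summing the family columns across events. The sketch below (Python, standard library only) shows one way to do that. The rows and ID values are made up, and the assumption that every column other than ID, River, and Site is a family count should be checked against the real header.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up rows mimicking the documented layout; not real counts.
SAMPLE = """ID,River,Site,tetragnathidae,thomiscidae
M1_a,Milette,1,4,1
M1_b,Milette,1,2,0
S2_a,Sables,2,0,3
"""

# Assumption: every column not listed here holds an integer family count.
ID_COLUMNS = {"ID", "River", "Site"}


def counts_per_site(rows):
    """Sum family-level counts over all sampling events at each (River, Site)."""
    totals = defaultdict(Counter)
    for row in rows:
        key = (row["River"], int(row["Site"]))
        for column, value in row.items():
            if column not in ID_COLUMNS:
                totals[key][column] += int(value)
    return totals


site_totals = counts_per_site(csv.DictReader(io.StringIO(SAMPLE)))

for (river, site), families in sorted(site_totals.items()):
    print(river, site, dict(families))
```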
Final_isotopes_data.csv
This file contains the raw stable isotope data (δ13C and δ15N) for all collected samples (consumers and sources). Delimiter: Semicolon (;) Decimal: Comma (,)
VARIABLE DESCRIPTIONS:
- Sample_ID (Character): Unique identifier for each analyzed sample
- dNormC13 (Numeric): Normalized delta-13C (δ13C) value. Unit: per mil (‰)
- dNNorm15 (Numeric): Normalized delta-15N (δ15N) value. Unit: per mil (‰)
- %C, %N (Numeric): Percentage of Carbon and Nitrogen in the sample
- Ratio CN (Numeric): The Carbon to Nitrogen mass ratio
- Taxa (Character): Taxonomic identification of the sample
- group (Character): General group for the sample
- river, site (Character, Integer): River and site of collection
- habitat (Categorical): Habitat of the sample source ("terrestrial" or "aquatic")
- type (Character): Functional or source type of the sample (e.g., "Aquatic prey", "Terrestrial herbivore")
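Because this file uses a semicolon delimiter and a decimal comma, a default CSV reader will misparse it. The following minimal Python sketch (standard library only) reads that layout. The sample rows and Sample_ID values are made up, and `parse_decimal_comma` is a hypothetical helper.

```python
import csv
import io

# Made-up rows mimicking the documented layout (semicolon-delimited,
# decimal comma); not real isotope values.
SAMPLE = """Sample_ID;dNormC13;dNNorm15;habitat
S001;-27,45;6,12;aquatic
S002;-25,10;4,80;terrestrial
"""


def parse_decimal_comma(text):
    """Convert a decimal-comma string such as '-27,45' to a float."""
    return float(text.strip().replace(",", "."))


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
records = [
    {
        "Sample_ID": row["Sample_ID"],
        "d13C": parse_decimal_comma(row["dNormC13"]),
        "d15N": parse_decimal_comma(row["dNNorm15"]),
        "habitat": row["habitat"],
    }
    for row in reader
]

for record in records:
    print(record)
```

In R, `read.csv2()` uses these same conventions by default (`sep = ";"`, `dec = ","`).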
Diet_AND_sizemetrics_data.csv
This is a summary dataset that combines the outputs from the MixSIAR diet models with site-level invertebrate size spectrum metrics and land-use proportions. It serves as the primary data file for the GLM analyses.
VARIABLE DESCRIPTIONS:
- River, Site (Character, Integer): River name and site number
- Source (Character): Source group from the diet model ("Aquatic" or "Terrestrial")
- modt (Character): Identifier for the MixSIAR model type
- Mean, SD, p2.5 to p97.5 (Numeric): The posterior mean, standard deviation, and posterior quantiles (p2.5 and p97.5 bound the 95% credible interval) of the estimated proportion of a given source in the spider diet
- insect_biomass (Numeric): Total emergent aquatic insect biomass at the site. Unit: mg
- insect_abundance (Integer): Total emergent aquatic insect abundance at the site
- propor_urb (Numeric): Proportion of urban/impervious land use in the local catchment
- Site_Label (Character): A unique label for each site
- SS_slope (Numeric): The slope of the biomass size spectrum at each site
- Mean_length (Numeric): The mean length of all emergent aquatic insects at a site. Unit: mm
- Size_range (Numeric): The size range of emergent aquatic insects at a site (max length - min length). Unit: mm
- dist_from_upstream (Numeric): Distance of the site from the headwaters. Unit: meters
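The Mean_length and Size_range definitions above can be sketched directly from individual body lengths. The Python example below (standard library only) follows the column descriptions, with Size_range taken as maximum minus minimum length. Whether the authors applied any filtering before computing these metrics is not stated here, so treat this as an illustration under that assumption. The input tuples are made up.

```python
from collections import defaultdict
from statistics import mean

# Made-up (River, Site, Length_mm) records; not real measurements.
individuals = [
    ("Milette", 1, 2.0),
    ("Milette", 1, 5.0),
    ("Milette", 1, 3.5),
    ("Sables", 2, 1.0),
    ("Sables", 2, 4.0),
]


def site_size_metrics(records):
    """Return Mean_length and Size_range (max - min) per (River, Site)."""
    lengths_by_site = defaultdict(list)
    for river, site, length in records:
        lengths_by_site[(river, site)].append(length)
    return {
        key: {
            "Mean_length": mean(lengths),
            "Size_range": max(lengths) - min(lengths),
        }
        for key, lengths in lengths_by_site.items()
    }


metrics = site_size_metrics(individuals)

for (river, site), values in sorted(metrics.items()):
    print(river, site, values)
```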
Code/software
Software Requirements
All data files are in .csv format and can be viewed using any standard spreadsheet software. All analyses were performed using the R statistical programming language (version 4.0 or higher) with RStudio.
Analysis Script
This repository includes the complete R script (script.R) used to perform all data processing, statistical modeling, and figure generation described in the manuscript. The script is heavily commented to support reproducibility and is organized into 11 sequential sections, each corresponding to a specific analysis or figure. To run the script, users will need to have the following key open-source R packages installed:
- Data Manipulation and Plotting: dplyr, tidyr, ggplot2
- Multivariate Community Ecology: vegan (for PCA and db-RDA)
- Isotope Mixing Models: MixSIAR (note: the script is adapted for an older version, see script comments)
- Size Spectra Analysis: sizeSpectra, poweRlaw
- Generalized Linear Models (GLMs): betareg (for beta regression), quantreg (for quantile regression)
- Model Diagnostics: car (for VIF)
Workflow Description: The script is designed to be run from top to bottom. It begins with a comprehensive header that serves as a table of contents. Each of the 11 sections is self-contained in terms of library loading and performs a distinct part of the analysis. The general workflow is as follows:
- Data Loading and Preparation: Each section begins by loading the necessary raw data file(s) from the /data/ subdirectory. Data are cleaned, transformed, and merged into an analysis-ready format using dplyr.
- Statistical Modeling: The prepared data are then used to fit the relevant statistical model for that section (e.g., PCA, db-RDA, linear models, GLMs, or MixSIAR).
- Model Validation: For key analyses, the script includes steps for model validation, such as checking for multicollinearity (VIF) or running permutation tests.
- Visualization: Publication-quality figures corresponding to the manuscript are generated using ggplot2.
- Output: Model summaries and statistical test results are printed to the console.
