Data and code from: Biological invasions disrupt the relationship between size spectrum and trophic interactions in freshwater fish communities
Data files
May 08, 2026 version files (1.78 MB)
- Data_Env.xlsx (16.26 KB)
- Data_GPS.xlsx (21.29 KB)
- Data_species.xlsx (14.22 KB)
- Fishbase_IsoF.csv (1.69 MB)
- JAE_script_Marinetal2026.R (34.10 KB)
- README.md (8.33 KB)
Abstract
The size spectrum, which describes the relationship between abundance (or biomass) and body size, is an ataxonomic approach that can provide insights into energy fluxes across trophic levels. However, anthropogenic perturbations can alter the relationship between body size and trophic position, and therefore the predator-prey mass ratio (PPMR). In this study, we used body size distribution and stable isotope analyses to investigate the relationship between size spectrum and the PPMR in lake fish communities across various eutrophication and invasion levels. Our results revealed that, although size spectrum and PPMR co-varied (i.e., resulting in a flatter size spectrum when PPMR was low), this effect was modulated by the level of biological invasion in the community. This was likely caused by differences in trophic niche between native and non-native species: small non-native species exhibited higher trophic positions than small native species, while large non-native species can have lower trophic positions than their native counterparts. These findings suggest that the relationship between size structure and trophic interactions in lake fish communities may be blurred by anthropogenic perturbations, challenging core assumptions of size-based ecology in estimating energy fluxes within freshwater food webs.
Dataset DOI: 10.5061/dryad.69p8cz9jd
Description of the data and file structure
This repository contains the script and associated files used to reproduce the results and figures of the manuscript:
“Biological invasions disrupt the relationship between size spectrum and trophic interactions in freshwater fish communities” from Marin et al. (2026)
For any questions, please contact me on ResearchGate (@Valentin Marin).
The script is divided into two main sections:
1. Determining and visualizing size structure of communities (spectra and PPMR)
2. Modeling these variables with environmental context (eutrophication and invasion)
Each section is detailed below.
(1) Determining and Visualizing Size Structure of Communities
This section uses the main fish data, including body sizes and stable isotope analysis (SIA) data (Fishbase_IsoF.csv).
1.1 Determining Trophic Position (TP) of Individuals (Lines 25–93)
- TP is calculated using the lake-specific baseline (δ15N in ‰, Fishbase_IsoF$Baseline) following Equation 5 of the manuscript (line 35).
- Subsequent lines detail the calculation process.
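As a rough illustration of step 1.1, the sketch below applies the widely used single-baseline form TP = λ + (δ15N_fish − δ15N_baseline) / Δ. It is a standalone Python sketch, not an excerpt of the R script. The baseline trophic position λ = 2 (primary consumers) and the enrichment factor Δ = 3.4‰ are assumptions, as is the function name; Equation 5 of the manuscript remains the authoritative definition and may use different constants.

```python
def trophic_position(d15n_fish, d15n_baseline, baseline_tp=2.0, enrichment=3.4):
    """Single-baseline trophic position (assumed form, see lead-in).

    d15n_fish     : delta15N of the individual (permil), e.g. the N15 column
    d15n_baseline : mean delta15N of primary consumers in the same lake (permil),
                    e.g. the Baseline column
    baseline_tp   : trophic position of the baseline organisms (assumed 2)
    enrichment    : per-trophic-level delta15N enrichment (assumed 3.4 permil)
    """
    if enrichment <= 0:
        raise ValueError("enrichment must be positive")
    return baseline_tp + (d15n_fish - d15n_baseline) / enrichment


# Hypothetical values, not taken from the dataset.
example_tp = trophic_position(d15n_fish=13.8, d15n_baseline=7.0)
print(round(example_tp, 2))
```

Using the lake's own baseline, as the script does, removes between-lake differences in δ15N at the base of the food web before individuals are compared.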
1.2 Determining Size Spectrum of Each Community (Lines 98–238)
- Size spectra are computed using two methods:
  - The binned method by Marin et al. (2023) (lines 98–160)
  - The maximum likelihood estimation (MLE) method by Edward (2019) (lines 164–202)
- Full references are available in the manuscript.
- Functions are applied to fish mass data (Fishbase_IsoF$Mass in grams) (lines 219–231).
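To make the idea of a binned size spectrum concrete, here is a minimal Python sketch of one generic variant: individuals are grouped into log2 mass classes, counts are divided by bin width, and the slope is the ordinary least-squares fit of log10 normalized abundance against log10 bin midpoint. The binning scheme, the normalization, and all function names are assumptions for illustration; the exact procedure of Marin et al. (2023) is the one implemented in the R script.

```python
import math
from collections import Counter


def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def binned_size_spectrum(masses_g):
    """Slope and intercept of a normalized, log2-binned abundance size spectrum."""
    # Bin k holds masses in [2**k, 2**(k + 1)) grams; non-positive masses are skipped.
    counts = Counter(math.floor(math.log2(m)) for m in masses_g if m > 0)
    log_mid, log_norm = [], []
    for k, n in sorted(counts.items()):
        lower, upper = 2.0 ** k, 2.0 ** (k + 1)
        log_mid.append(math.log10(math.sqrt(lower * upper)))  # geometric midpoint
        log_norm.append(math.log10(n / (upper - lower)))      # count per gram of bin width
    return ols_slope_intercept(log_mid, log_norm)


# Hypothetical community: many small fish, few large ones.
masses = [1.5] * 64 + [3.0] * 16 + [6.0] * 4 + [12.0]
slope, intercept = binned_size_spectrum(masses)
print(round(slope, 2))
```

Empty bins are simply absent from the regression in this sketch; how empty or sparsely filled classes are handled is one of the choices that differ between binned methods and the MLE approach.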
1.3 Determining Predator-Prey Mass Ratio (PPMR) of Each Community (Lines 220–281)
- PPMR is calculated based on body size (Fishbase_IsoF$Mass) and TP values from section 1.1.
- Size classes are defined first, then mean TP is determined (lines 250–303).
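The two steps above (size classes, then mean TP per class) can be sketched as follows in Python. The sketch assumes log2 mass classes and the common conversion PPMR = 2^(1/b), where b is the slope of mean TP against log2 body mass class; both the class definition and the conversion are assumptions here, and the R script together with the manuscript define the method actually used.

```python
import math
from collections import defaultdict


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two size classes")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("size classes are all identical")
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx


def ppmr_from_tp(masses_g, trophic_positions):
    """Return (slope, PPMR) from individual masses (g) and trophic positions."""
    tp_by_class = defaultdict(list)
    for mass, tp in zip(masses_g, trophic_positions):
        if mass > 0 and tp is not None:
            tp_by_class[math.floor(math.log2(mass))].append(tp)
    classes = sorted(tp_by_class)
    log2_mid = [k + 0.5 for k in classes]  # log2 of each bin's geometric midpoint
    mean_tp = [sum(tp_by_class[k]) / len(tp_by_class[k]) for k in classes]
    slope = ols_slope(log2_mid, mean_tp)
    if slope == 0:
        raise ValueError("TP does not change with size; PPMR is undefined")
    return slope, 2.0 ** (1.0 / slope)


# Hypothetical individuals: TP rises by 0.5 per doubling of mass.
slope, ppmr = ppmr_from_tp([1.5, 3.0, 6.0, 12.0], [2.0, 2.5, 3.0, 3.5])
print(round(slope, 2), round(ppmr, 1))
```

Under this conversion a steep TP-to-size slope gives a low PPMR, so a community where small non-native fish feed high in the food web would weaken the slope and inflate the estimate.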
1.4 Visualizing Size Spectrum and PPMR (Lines 325–443)
- Figure 1, showing variation in size spectrum and PPMR, is generated step-by-step using the ggplot2 package.
(2) Environmental Data and Modeling
2.1 Loading Lake Environmental Variables (Lines 447–522)
- Invasion level is defined using the species status data (Data_species.xlsx) indicating native or non-native status ($status).
- Relative abundances are calculated (lines 455–477) and merged with the main data (DataSize), compiling size spectrum and PPMR per lake (line 483).
- Eutrophication data are loaded from Data_Env.xlsx, which includes Secchi depth (cm), chlorophyll-a concentration (mg/L), and total phosphorus (µg/L). These are used to compute the Trophic State Index (Equations 1–4 in the manuscript; lines 512–516).
- These data are added to DataSize at line 521 to create the final dataset (DataFinal) for statistical analysis.
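Equations 1–4 of the manuscript are not reproduced in this README. The Python sketch below assumes they correspond to Carlson's (1977) Trophic State Index, with one sub-index per variable and their mean as the final value. It expects Secchi depth in cm (converted to m), and chlorophyll-a and total phosphorus in µg/L; since chlorophyll-a is listed above in mg/L, a unit conversion may be needed before calling it. Function and argument names are hypothetical.

```python
import math


def carlson_tsi(secchi_cm, chl_a_ug_per_l, total_p_ug_per_l):
    """Carlson-type Trophic State Index (assumed to match Equations 1-4).

    Returns the three sub-indices and their mean.
    """
    if min(secchi_cm, chl_a_ug_per_l, total_p_ug_per_l) <= 0:
        raise ValueError("all inputs must be positive")
    secchi_m = secchi_cm / 100.0                       # the index uses metres
    tsi_sd = 60.0 - 14.41 * math.log(secchi_m)         # Secchi depth sub-index
    tsi_chl = 9.81 * math.log(chl_a_ug_per_l) + 30.6   # chlorophyll-a sub-index
    tsi_tp = 14.42 * math.log(total_p_ug_per_l) + 4.15 # total phosphorus sub-index
    return {
        "TSI_SD": tsi_sd,
        "TSI_CHL": tsi_chl,
        "TSI_TP": tsi_tp,
        "TSI": (tsi_sd + tsi_chl + tsi_tp) / 3.0,
    }


# Hypothetical lake: 1.2 m Secchi depth, 15 ug/L chlorophyll-a, 40 ug/L total P.
print({k: round(v, 1) for k, v in carlson_tsi(120.0, 15.0, 40.0).items()})
```

Because Data_Env.xlsx can hold up to three sampling locations per lake, the measurements would need to be averaged per lake (or the index averaged across locations) before merging with DataSize; which of the two the script does is not stated here.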
2.2 Checking Spatial Correlation of Data (Lines 525–563)
- Spatial coordinates (longitude and latitude) for each lake are loaded from Data_GPS.xlsx.
- Mantel tests (lines 532–546) and Moran’s I tests (lines 549–563) are performed to assess spatial autocorrelation.
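For readers unfamiliar with Moran's I, the following Python sketch computes the statistic from lake coordinates using inverse great-circle distance weights. The weighting scheme, the Earth radius, and the function names are assumptions for illustration; the R script relies on its own packages and settings, and this sketch does not compute a p-value.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def inverse_distance_weights(coords):
    """Weight matrix w[i][j] = 1 / distance, with zeros on the diagonal."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = haversine_km(*coords[i], *coords[j])
                if d == 0:
                    raise ValueError("two sites share the same coordinates")
                weights[i][j] = 1.0 / d
    return weights


def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("weights or values have no variation")
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (num / denom)


# Hypothetical lakes as (latitude, longitude) with a hypothetical slope per lake.
coords = [(43.50, 1.30), (43.55, 1.35), (43.90, 1.80), (44.00, 1.90)]
slopes = [-1.1, -1.0, -1.6, -1.7]
print(round(morans_i(slopes, inverse_distance_weights(coords)), 3))
```

Values near the expectation of −1/(n − 1) indicate no spatial structure, which is the condition that justifies treating lakes as independent units in the models of section 2.3.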
2.3 Stepwise Selection of Size Spectrum Slope Models (Lines 543–639)
- A full model (line 601) is reduced by removing non-significant interactions of predictors on the binned size spectrum slope (up to line 622).
- The effects of the predictors in the resulting linear mixed-effects model (modfullSS3, line 622) are tested, then verified on the size spectrum slopes (SSslopes) extracted from the LME method (lines 626–630).
- Significant interactions are plotted using base R (lines 634–676) and ggplot2 (lines 684–693), reproducing Figure 3 from the manuscript.
2.4 Differences in Trophic Position Between Native and Non-native Species (From Line 698)
- Two datasets are created for analysis and plotting:
  - dataTR (lines 704–724)
  - subplot (lines 729–739)
- These allow assessment of individual TP by size and species status (native vs. non-native).
- A mixed-effects model corresponding to Table 2 in the manuscript is run (line 741).
- Figure 4a (line 754) shows TP for all individual body sizes following model tests, and Figure 4b (line 769) summarizes TP by species.
- Finally, Figure 5 (p_final) is created from a dataset that includes correlation values (stats_species), cleaned and sorted by species. The procedure for building the faceted plot with ggplot2, including lines and correlation coefficients, is detailed in lines 849–895.
Note: additional code to generate pie charts (as in Figures 2 and S1) is provided in lines 898–942.
Files and variables
File: JAE_script_Marinetal2026.R
Description: R script to generate the results and figures from the datasets. The script analyzes fish community structure across lakes by calculating trophic positions, size spectra, and predator–prey mass ratios (PPMR), visualizing their distributions, and linking them to environmental and invasion variables using mixed-effects models.
File: Data_species.xlsx
Description: Used to determine the relative percentage of non-native species
Variables
- "Common name": common name of fish species
- "Latin name": Latin name of fish species
- "Species Status": native to south-west France or not
- "% Occurence": relative occurrence in the dataset (% of individuals)
- "% Biomass": relative biomass in the dataset
File: Data_GPS.xlsx
Description: Used to determine the spatial correlation of variables
Variables
- "Lake": Lake abbreviation
- "GPS_North": latitude coordinate
- "GPS_Est": longitude coordinate
File: Fishbase_IsoF.csv
Description: Main dataset, with one row of information per individual fish. Any NA represents missing data.
Variables:
- "Lake": Lake abbreviation
- Gear type: sampling gear ("EPA" for electrofishing or "Gillnet")
- Gear_ID: EPA ID or Gillnet ID
- Habitat: Littoral or pelagic sampling
- Species: Abbreviation of the sampled species
- Size: fish size in mm
- Mass: estimated mass in g
- Lifestage: Life stage of the individual (young of the year (YOY), juvenile (JUV) or adult (ADULT))
- Genetic_ID: Name of the sample for internal genetic analysis
- Isotop_ID: Name of the sample for SIA analysis
- Observation: any information relating to sampling
- SIA_analyzed: 0 or 1 (SIA data available)
- SEX: sex of individual if known (male or female)
- CLIP_SIA: 0 or 1 (fin clip sampled yes/no)
- Statut: status of the sample
- Weight(mg): Weight of the sample analysed for SIA
- N2Amp: Nitrogen peak amplitude (15N/14N)
- %N: Relative nitrogen mass (%)
- N15: 15N vs Air (delta)
- CO2 Amp: CO2 peak amplitude (13C/12C)
- %C: relative carbon mass (%)
- C13: 13C vs air (delta)
- C/N: Carbon-Nitrogen ratio
- Functentities: Species_lifestage of the individual
- Baseline: mean N15 value of primary consumers
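Since missing values are coded as NA, any analysis outside R has to convert them explicitly. The Python sketch below shows one way to do this with the standard library. It assumes a comma-delimited file with a header row and uses the column names listed above; the delimiter, the helper name, and the sample rows (lake and species codes included) are hypothetical.

```python
import csv
import io


def read_fish_rows(handle, numeric=("Mass", "N15", "Baseline"), delimiter=","):
    """Read rows as dictionaries, mapping 'NA' or empty cells to None.

    Columns named in `numeric` are converted to float; others stay as text.
    """
    rows = []
    for raw in csv.DictReader(handle, delimiter=delimiter):
        row = {}
        for key, value in raw.items():
            if key is None:  # extra cells beyond the header are ignored
                continue
            text = (value or "").strip()
            if text in ("", "NA"):
                row[key] = None
            elif key in numeric:
                row[key] = float(text)
            else:
                row[key] = text
        rows.append(row)
    return rows


# Made-up rows standing in for Fishbase_IsoF.csv.
sample = io.StringIO(
    "Lake,Species,Mass,N15,Baseline\n"
    "L01,SP1,12.5,13.8,7.0\n"
    "L01,SP2,NA,NA,7.0\n"
)
rows = read_fish_rows(sample)
# Keep only individuals usable for trophic position (mass and N15 both present).
complete = [r for r in rows if r["Mass"] is not None and r["N15"] is not None]
print(len(rows), len(complete))
```

With the real file, the same function would be called on `open("Fishbase_IsoF.csv", newline="")`, adjusting `delimiter` if the export uses semicolons.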
File: Data_Env.xlsx
Description: Environmental data used to calculate the Trophic State Index (Secchi depth, pelagic chlorophyll, total phosphorus). Three different locations per lake are sampled; if only one location is sampled, 'n/a' is reported in the location ID. All other measured physicochemical parameters are also reported and may contain 'n/a'.
Variables:
- "Year": year of the sampling
- "Lake": lake abbreviation
- "Month": month of the sampling
- "Localisation": Internal code for sampling location
- "Secchi": Secchi depth in cm
- "Turbi_YSI_Ftu": turbidity in FTU
- "Pel_Tot_Algae_1": first Pelagic Chlorophyll A sampling (mg/L)
- "Pel_Tot_Algae_2": second Pelagic Chlorophyll A sampling (mg/L)
- "Pel_Tot_Algae_3": third Pelagic Chlorophyll A sampling (mg/L)
- "Pel_Tot_Algae_mean": mean of Pelagic Chlorophyll A sampling (mg/L)
- "Pel_Cyano_1": first Pelagic cyanobacteria sampling (mg/L)
- "Pel_Cyano_2": second Pelagic cyanobacteria sampling (mg/L)
- "Pel_Cyano_3": third Pelagic cyanobacteria sampling (mg/L)
- "Pel_Cyano_1_mean": mean of Pelagic cyanobacteria sampling (mg/L)
- "COT_(mg C/L)": Total organic carbon (in mg C/L)
- "Ptot_(microg C/L)": Total Phosphorus (in microg P/L)
- "Ntot_(microg C/L)": Total Nitrogen (in microg N/L)
Code/software
Analyses were performed in RStudio (version 2024.04.1+).
Access information
Other publicly accessible locations of the data:
- NA
Data was derived from the following sources:
- Field data
