Data from: Seascape ecology of juvenile gadoid nursery areas
Data files
Sep 04, 2025 version files (683.54 KB total):
- gadoid.metadata.2.csv (618.21 KB)
- Gadoids.rmd (58.39 KB)
- README.md (6.94 KB)
Abstract
The availability of suitable juvenile fish habitat can affect recruitment. This study identified the environmental variables that characterise the juvenile habitats of three commercially important gadoid species: Atlantic cod (Gadus morhua), haddock (Melanogrammus aeglefinus), and whiting (Merlangius merlangus). Stereo baited remote underwater video surveys were conducted in the South Arran Marine Protected Area between 2013 and 2019 to collect presence/absence data on juvenile gadoids (20-120 mm) and on demersal and epibenthic communities. Data were analysed using binomial generalised additive mixed models. The results revealed spatial segregation among species, each favouring distinct habitats. Predictive modelling indicates that presence probability rises from 0.25 to 0.75 as the Inverse Simpson's Diversity Index increases, suggesting that biodiversity is associated with species distribution. Boundary regions between seabed types were associated with variation in species distribution, underlining the importance of seascape heterogeneity. This study underscores the importance of conserving and restoring benthic and epibenthic biodiversity across spatially heterogeneous landscapes. Consequently, reducing benthic pressures could promote early survival for these species, thereby supporting broader ecosystem health and fisheries management goals.
This dataset contains presence/absence data and environmental variables for three commercially important juvenile gadoid species: Atlantic cod (Gadus morhua), haddock (Melanogrammus aeglefinus), and whiting (Merlangius merlangus), collected using stereo baited remote underwater video (SBRUV) surveys in the South Arran Marine Protected Area, Scotland, between 2013 and 2019. The data were used to model species distribution using binomial generalised additive mixed models and predict nursery habitat suitability. Results revealed that biodiversity was the only variable consistently associated with presence across all three species, with each species showing distinct habitat preferences related to depth, seabed composition, and proximity to habitat patch edges.
Description of the data and file structure
The dataset consists of one data file and one R Markdown analysis script:
gadoid.metadata.2.csv - Primary dataset containing species presence/absence data and environmental variables for 488 SBRUV deployment sites. Each row represents one SBRUV deployment with associated environmental measurements and species observations.
Gadoids.rmd - R Markdown file containing all statistical analyses, model fitting, and figure generation code used in the manuscript.
Variable descriptions for gadoid.metadata.2.csv:
Species presence data:
- G_morhua: Presence (1) or absence (0) of juvenile Atlantic cod (Gadus morhua) 20-120 mm
- M_aeglefinus: Presence (1) or absence (0) of juvenile haddock (Melanogrammus aeglefinus) 20-120 mm
- M_merlangus: Presence (1) or absence (0) of juvenile whiting (Merlangius merlangus) 20-120 mm
Spatial and temporal variables:
- Latitude: Decimal degrees (WGS84)
- Longitude: Decimal degrees (WGS84)
- Year: Survey year (2013, 2014, 2018, 2019)
- Month: Survey month (6-9, corresponding to June-September)
- MPA_Zone: Marine Protected Area management zone (1-5), used as random effect in models
- Depth_m: Water depth at deployment site (metres)
- Distance_to_shore_km: Euclidean distance from deployment site to nearest shoreline (kilometres)
Seabed composition (proportional coverage 0-1):
- Mud: Proportional coverage of mud substrate
- Sand: Proportional coverage of sand substrate
- Gravel: Proportional coverage of gravel substrate (includes both dead and live maerl)
- Pebble: Proportional coverage of pebble substrate
- Algae: Proportional coverage of attached macroalgae of all types
Physical environment:
- Mean_current_velocity_ms: Mean current velocity (metres per second)
- TRI: Terrain Ruggedness Index, elevation differences between grid points (metres)
Biodiversity:
- Inverse_Simpsons_Diversity: Inverse Simpson's diversity index calculated from epibenthic and demersal fish and macroinvertebrate species counts (1/D = N(N-1)/∑n(n-1), where n is the count of each species and N is the total count across all species)
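As an illustration of how such an index can be computed from per-species counts at a single deployment, here is a minimal sketch in base R. It assumes the finite-sample form 1/D = N(N-1)/∑n(n-1); the function name `inverse_simpson` and the counts are hypothetical and are not taken from Gadoids.rmd.

```r
# Inverse Simpson's diversity from a vector of per-species counts at one site.
# Assumption: finite-sample form 1/D = N(N-1) / sum(n(n-1)).
inverse_simpson <- function(counts) {
  counts <- counts[!is.na(counts) & counts > 0]  # ignore missing and absent species
  N <- sum(counts)                               # total individuals
  denom <- sum(counts * (counts - 1))
  if (denom == 0) return(NA_real_)               # undefined if no species has 2+ individuals
  N * (N - 1) / denom
}

site_counts <- c(10, 5, 5)    # hypothetical counts for three species
inverse_simpson(site_counts)  # 380 / 130, about 2.92
```

Higher values indicate a community that is both richer and more evenly distributed across species.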
Distance to habitat patch edges (metres):
- Distance_to_sand_edge_m: Euclidean distance to edge of sand patches (positive = within patch, negative = outside patch)
- Distance_to_mud_edge_m: Euclidean distance to edge of mud patches
- Distance_to_algae_edge_m: Euclidean distance to edge of algae patches
- Distance_to_gravel_edge_m: Euclidean distance to edge of gravel patches
Missing data codes: NA indicates missing or unavailable data for that variable at that site.
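The variable descriptions above imply a few simple range checks that can be run after loading the file. The sketch below writes a small invented stand-in CSV (only a subset of the documented columns, with made-up values) so that it runs on its own; to check the real data, point `read.csv()` at gadoid.metadata.2.csv instead. The helper `in_range` is hypothetical.

```r
# Toy stand-in for gadoid.metadata.2.csv: invented rows, documented column names.
toy_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "G_morhua,M_aeglefinus,M_merlangus,Year,Month,MPA_Zone,Depth_m,Mud,Sand",
  "1,0,0,2013,6,1,12.5,0.1,0.6",
  "0,1,NA,2018,8,3,31.0,0.7,NA",
  "0,0,1,2019,9,5,22.4,0.0,0.9"
), toy_csv)

dat <- read.csv(toy_csv, na.strings = "NA")  # NA marks missing values

presence_cols <- c("G_morhua", "M_aeglefinus", "M_merlangus")
cover_cols <- c("Mud", "Sand")

# TRUE when every non-missing value lies in [lo, hi]
in_range <- function(x, lo, hi) all(is.na(x) | (x >= lo & x <= hi))

checks <- c(
  presence_is_binary = all(sapply(dat[presence_cols],
                                  function(x) all(x %in% c(0, 1, NA)))),
  cover_in_0_1 = all(sapply(dat[cover_cols], in_range, lo = 0, hi = 1)),
  month_in_6_9 = in_range(dat$Month, 6, 9),
  zone_in_1_5  = in_range(dat$MPA_Zone, 1, 5)
)
print(checks)
colSums(is.na(dat))  # number of missing values per column
```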
Data processing notes:
- Seabed composition was assessed visually from SBRUV footage and classified using standardised categories
- Distance measurements use positive values for locations within habitat patches and negative values for locations outside patches
- Focal gadoid species analysis included only juveniles of 20-120 mm to confirm 0-group status
- Survey sites deeper than 40 m were excluded to maintain consistent spatial coverage
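The depth cut-off and the signed-distance convention can be applied in a few lines of base R. This is a sketch on invented values, using two of the documented column names; treating a distance of exactly 0 as "on the edge, not inside" is an assumption, as are the derived column names.

```r
# Invented example rows using documented column names.
sites <- data.frame(
  Depth_m = c(8, 25, 40, 46),
  Distance_to_sand_edge_m = c(35, -12, 0, 120)
)

# Keep sites no deeper than 40 m.
shallow <- sites[sites$Depth_m <= 40, ]

# Positive distance = inside the patch, negative = outside.
# Assumption: a distance of exactly 0 (on the edge) counts as not inside.
shallow$inside_sand_patch <- shallow$Distance_to_sand_edge_m > 0

# Unsigned distance to the nearest sand edge, regardless of side.
shallow$abs_distance_to_sand_edge_m <- abs(shallow$Distance_to_sand_edge_m)

shallow
```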
Sharing/Access information
Data were derived from the following sources:
- SBRUV survey data: University of Glasgow and Marine Scotland Science collaborative surveys
- Seabed mapping: NatureScot and Scottish Environmental Protection Agency drop-down video surveys
- Bathymetry: 1 arc-second resolution data
- Current velocity data: Modelled mean current data from Sabatino et al. (2016)
- Phylogenetic data: Mammalian supertree from VertLife database (where applicable)
Links to related datasets:
- Seascape mapping data and methods: https://github.com/NWMilne/Seascape_mapping
Code/Software
The analysis was conducted entirely in R version 4.4.1. The R Markdown file (Gadoids.rmd) contains all code for:
- Data loading and cleaning: Initial data processing and quality control
- Statistical analysis: Binomial generalised additive mixed models (GAMMs) using the gamm4 package
- Model validation: Area under curve (AUC), percent correctly classified (PCC), and confusion matrix calculations using the PresenceAbsence and caret packages
- Spatial prediction: Species distribution mapping using 100 m grid predictions
- Figure generation: All manuscript figures using ggplot2
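To show what the validation metrics measure, the following base R sketch computes AUC, a confusion matrix, and PCC for invented observations and predicted probabilities. It does not use PresenceAbsence or caret, so it is not the script's own implementation; the 0.5 classification threshold and the function name `auc` are assumptions.

```r
# Invented observed presences (1/0) and model-predicted probabilities.
observed  <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3)

# AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen presence is scored higher than a randomly chosen absence.
auc <- function(obs, pred) {
  r  <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

threshold <- 0.5  # assumption: the analysis may use a different threshold
predicted_class <- as.integer(predicted >= threshold)

confusion <- table(
  observed  = factor(observed, levels = c(0, 1)),
  predicted = factor(predicted_class, levels = c(0, 1))
)

# Percent correctly classified: diagonal of the confusion matrix.
pcc <- 100 * sum(diag(confusion)) / sum(confusion)

auc(observed, predicted)  # 0.9375
confusion
pcc                       # 75
```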
Required R packages:
- gamm4: Generalised additive mixed models
- PresenceAbsence: Model performance metrics
- caret: Classification and regression training
- ggplot2: Graphics and visualization
- dismo: Species distribution modelling (for MESS analysis)
- automap: Spatial interpolation (for error mapping)
Key methodological details:
- Survey methodology: SBRUV deployments with 60-minute recording time, 3-minute settlement period
- Camera setup: Paired Canon HF G25 cameras with 8° inward angle, 58 cm basal separation
- Species identification: Stereo measurements for size confirmation (20-120 mm for 0-group status)
- Spatial autocorrelation: Addressed using MPA zones as random effects in GAMMs
- Model selection: Backward stepwise selection using AIC
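A rough sketch of how a binomial GAMM with MPA zone as a random intercept might be fitted with gamm4, followed by one AIC comparison of the kind a backward selection makes. The data are simulated, and the choice of predictors, the basis dimension `k = 5`, and the object names are assumptions; the models in Gadoids.rmd may be specified differently.

```r
library(gamm4)

# Simulated stand-in data using documented column names.
set.seed(1)
n <- 300
sim <- data.frame(
  Depth_m = runif(n, 2, 40),
  Inverse_Simpsons_Diversity = runif(n, 1, 8),
  TRI = runif(n, 0, 2),
  MPA_Zone = factor(sample(1:5, n, replace = TRUE))
)
zone_effect <- rnorm(5, sd = 0.5)  # one random intercept per zone
eta <- -2 + 0.5 * sim$Inverse_Simpsons_Diversity -
  0.03 * sim$Depth_m + zone_effect[sim$MPA_Zone]
sim$G_morhua <- rbinom(n, 1, plogis(eta))

# Full model: smooths of three predictors, random intercept for MPA zone.
full <- gamm4(
  G_morhua ~ s(Depth_m, k = 5) + s(Inverse_Simpsons_Diversity, k = 5) +
    s(TRI, k = 5),
  random = ~ (1 | MPA_Zone), family = binomial(), data = sim
)

# One backward step: drop TRI (unrelated to presence in this simulation).
reduced <- gamm4(
  G_morhua ~ s(Depth_m, k = 5) + s(Inverse_Simpsons_Diversity, k = 5),
  random = ~ (1 | MPA_Zone), family = binomial(), data = sim
)

# Compare the mixed-model components; the lower AIC is preferred.
AIC(full$mer, reduced$mer)
summary(reduced$gam)
```

A full backward selection would repeat the drop-and-compare step for each remaining term until no removal lowers the AIC.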
External data dependencies:
Several raster and shapefile inputs used in this analysis were generated in our previous seascape mapping project. Instructions and metadata for downloading these files can be found in the Seascape_mapping README. These files should be placed in a data/ subdirectory to run the full analysis.
Reproducibility notes:
To reproduce the analysis:
- Download the CSV data file
- Install required R packages listed above
- Download external raster/shapefile dependencies from the Seascape_mapping repository
- Run the R Markdown file to reproduce all analyses and figures
The analysis workflow progresses through data preparation, collinearity checking, model fitting with backward stepwise selection, model validation, spatial prediction mapping, and figure generation. All code is annotated with explanatory comments throughout.
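Before knitting, it can help to confirm that the required packages and inputs are in place. The sketch below is one possible pre-flight check; it assumes the CSV and the data/ subdirectory sit in the current working directory and that the rmarkdown package (not listed above) is used to knit the file.

```r
# Packages listed under "Required R packages".
required_pkgs <- c("gamm4", "PresenceAbsence", "caret",
                   "ggplot2", "dismo", "automap")

# TRUE for each package that can be loaded.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before knitting: ", paste(missing_pkgs, collapse = ", "))
}

# Assumption: files are located relative to the current working directory.
inputs_ready <- file.exists("gadoid.metadata.2.csv") && dir.exists("data")
if (!inputs_ready) {
  message("Expected gadoid.metadata.2.csv and a data/ subdirectory here: ", getwd())
}

# Knit only when everything is available (assumes rmarkdown is installed).
if (length(missing_pkgs) == 0 && inputs_ready &&
    requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render("Gadoids.rmd")
}
```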
