Data from: On the brink: Mapping the last strongholds of the critically endangered flapper skate (Dipturus intermedius)
Data files
Apr 27, 2026 version files 3.61 MB
-
env_ex.R
2.51 KB
-
environmentaldataplots.R
9.50 KB
-
exploration.R
9.15 KB
-
extra_inla_fun.R
20.26 KB
-
extract_datras.R
6.79 KB
-
fishing_data.R
11.39 KB
-
full_dataset.csv
2.14 MB
-
mesh_ggplot.R
4.34 KB
-
model_script.R
46.28 KB
-
pred.csv
1.35 MB
-
README.md
10.69 KB
Abstract
Aim: The flapper skate (Dipturus intermedius) is a Critically Endangered skate distributed throughout the NE Atlantic and requiring urgent conservation measures. Existing models of the flapper skate’s distribution are not detailed enough to inform management. The aim of this study was to develop more highly resolved predictions of the skate’s distribution across its range, building on existing studies to provide a comprehensive baseline for flapper skate presence. Location The NE Atlantic shelf region
Methods: A Bayesian spatial binomial GAMM was used to model the distribution of flapper skate across the NE Atlantic shelf. Following an exhaustive search of fisheries-independent DATRAS catch records, skate presence was modelled as a function of environmental covariates and AIS fishing pressure data.
Results: Skate presence was highest in coastal areas approximately 40–50 km from shore, where fishing pressure and benthic productivity were low. A smoother for the bathymetry variable indicated presence was highest at depths of 100–200 m. Regions with the highest predicted probability of occurrence included the north and west coasts of Scotland, and the west coast of Ireland near Counties Clare and Galway. In contrast, very low support was given for presence in the southern and central North Sea, likely reflecting historical population collapse, as well as in deeper offshore waters beyond the shelf.
Main conclusions: This study presents the first large-scale model of flapper skate presence across the NE Atlantic shelf that integrates both environmental and fishing pressure data, providing new baseline insights into habitat use in the North Sea and around Ireland. Three core regions of presence were identified, likely reflecting natural refugia from fishing and critical habitats (EFHs).
Future research should prioritise these strongholds, focusing on identifying critical habitats to support focused management strategies.
Dataset DOI: 10.5061/dryad.w0vt4b954
Description of the data and file structure
File: full_dataset.csv
These data represent haul information from four DATRAS ground fish surveys, which are openly available through the Database of Trawl Surveys (DATRAS) download portal, managed by International Council for Exploration of the Seas (ICES). For each haul, the presence or absence of the flapper skate (D. intermedius) is recorded.
Environmental predictor variables used in the habitat suitability models included: depth (m), distance from coast (m), mean bottom temperature (°C) and mean bottom current velocity (ms-1; Table 2). Depth was obtained from the General Bathymetric Chart of the Oceans, at a resolution of 15 arc-seconds. Monthly averages of bottom temperature, northward current velocity, and eastward current velocity were obtained from the EU Copernicus Marine Service Information as raster files, at a resolution of 0.111° × 0.067°. Since a dataset for bottom current velocity was not available, the velocity at the seafloor was calculated using a bathymetry mask. Distance from coast was calculated for each occurrence point using the ‘Join attributes by nearest’ tool in QGIS. A coastline shapefile was obtained from the European Environment Agency. A coastline shapefile was obtained from the European Environment Agency (European Environment Agency, 2015).
Variables
- survey
- ship
- quarter
- year
- haul_dur: minutes
- present: 1/0 binary code presence of skate in haul
- current: current speed (ms-1)
- dcoast: distance to coast (m)
- bath: bathymetry (m)
- btemp: bottom temperature (degrees Celsius)
- xm: longitude in metres (utm 29)
- ym: latitude in metres (utm 29)
- xkm: longitude in kilometres (utm 29)
- ykm: latitude in kilometres (utm 29)
- lat: latitude in degrees (wgs84)
- lon: longitude in degrees (wgs84)
- fishing_hours: hours per month
- pp_mean: benthic primary productivity (mmol.m-3)
File: pred.csv
An empty prediction matrix for model predictions across the study area. Location points were generated using the ‘Create Fishnet’ tool in ArcGIS Pro, with the survey data extent as the boundary. Environmental data were extracted for these points, and null values where the grid overlapped with land, were removed.
Variables
- lat: latitude in degrees (wgs84)
- lon: longitude in degrees (wgs84)
- xm: longitude in metres (utm 29)
- ym: latitude in metres (utm 29)
- xkm: longitude in kilometres (utm 29)
- ykm: latitude in kilometres (utm 29)
- dcoast: distance to coast (m)
- btemp: bottom temperature (degrees celsius)
- bath: bathymetry (m)
- current: current speed (ms-1)
- fishing_hours: hours per month
- pp_mean: benthic primary productivity (mmol.m-3)
File: extract_datras.R
Description: R script to extract and combine DATRAS survey HH (haul) data, as well as catch data (HL) for flapper skate across all surveys in the database.
File: env_ex.R
Description: R script to extract environmental raster information by survey coordinates and join to species occurrence and prediction dataframes.
File: fishing_data.R
Description: This script processes and analyses multi-year AIS fishing effort data by combining, cropping, averaging, rasterising, and extracting spatial fishing pressure raster information for your occurrence and prediction data frames.
File: environmentaldataplots.R
Description: R script to produce data exploration plots for variables, including boxplots for differences between groups (with statistical tests), and normality plots.
File: exploration.R
Description: R script for data exploration according to methods outlined in Zuur et al., 2010.
Zuur, A.F., Ieno, E.N. and Elphick, C.S., 2010. A protocol for data exploration to avoid common statistical problems. Methods in ecology and evolution, 1(1), pp.3-14.
File: model_script.R
Description: Script to build and run Bernoulli GLM, GAM and GAM + SPDE using the Bayesian Integrated Nested Laplace Approximation (INLA) approach. All model validation, prior sensitivity test + prediction steps are also included. Framework and base modelling code from Zuur et al., 2017.
Zuur, A.F., Ieno, E.N. and Saveliev, A.A., 2017. Spatial, temporal and spatial–temporal ecological data analysis with R-INLA. Highland Statistics Ltd, 1.
File: extra_inla_fun.R
Description: Helper file containing mesh and spatial conversion functions used in the model_script.R
inla.mesh2sp(mesh)
Converts aninla.meshobject into spatial objects (SpatialPolygonsDataFramefor triangles andSpatialPointsfor vertices), suitable for plotting or spatial analysis withsp.PlotField2(field, mesh, ContourMap, xlim, ylim, Add=FALSE, MyMain, ...)
Projects and plots a spatial field defined on an INLA mesh over a specified contour map bounding box.make_prediction_map(A_matrix, P_vector, remove_obs=FALSE)
Creates a data frame combining spatial coordinates and prediction values from INLA outputs.append_prediction_map(Existing_df, P_vector, remove_obs=FALSE)
Appends additional prediction vectors to an existing prediction data frame as a new column, useful for compiling model predictions from several model constructions or scenarios into one table for comparison.summarize_random_effects(model, effect_name)
Summarizes posterior means, SDs, credible intervals, and ranges for random effects in an INLA model.compute_metrics(cv_results, observed_response, threshold = 0.3)
Calculates classification metrics (sensitivity, specificity, precision, recall, F1, kappa) based on model predictions and observed outcomes.compute_metrics_sss(cv_results, observed_response, threshold_range = seq(0.1, 0.9, 0.01))
Computes classification metrics across a range of thresholds to identify the optimal threshold maximizing the sum of sensitivity and specificity (SSS).simulation_study(modelno, formula, covariates, data, n_simulations = 1000, savep = TRUE, threshold = 0.5)
Simulates binary outcomes from an INLA model and compares observed vs. simulated zero inflation using histograms, Z-scores, and chi-squared tests.
HighstatLibV13.R (Supplemental Information - Zenodo)
Description: Helper file containing additional functions from the Highland Statistics Ltd (Zuur et al., 2009)
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM., 2009. Mixed effects models and extensions in ecology with R. Springer.
#Copyright Highland Statistics LTD, shared with permission.
mesh_ggplot.R
Description: This script will plot your INLA SPDE mesh that you have created in the model_script.R. Once you have created the 'mesh' object in your R workspace, run this to generate a publication-ready plot.
Code/software
Dependencies
INLA(for model fitting and mesh handling)sp(for spatial polygons and points)dplyr(for data manipulation)ggplot2(for visualization, required in simulation study plotting)fields(for image plotting, used inPlotField2)
The datasets are presented in a tabular csv format, which is readable by a number of software.
The analyses were carried out in open source R studio software, using R version 4.2.3 (and earlier versions). R scripts have been uploaded, and detail the necessary libraries and packages used. All scripts must be run step by step (supervised), to conduct the analysis. Annotations should indicate the purpose of each script and section.
The workflow is as follows:
NB: It is advised to create an R project file, place the data and scripts in a "/data" folder and adjust the file paths in the scripts.
- Data extraction
- Run extract_datras.r* to get survey datasets with HH (haul) and species (HL) information from DATRAS database
- Run env.ex.r to extract environmental information for survey haul locations and join with occurrence/prediction dataframes
- Run *fishing_data.R to process and extract fishing hours data and join with occurrence/prediction dataframes
NB: the resulting full data sets are provided already, see above - Data Exploration
Run exploration.R to carry out thorough data exploration.
Run environmentaldataplots.R to produce plots exploring environmental variables in relation to skate presence in the haul.
You can clear your workspace and begin the next step separately, using the insights from data exploration to inform the analysis. - Species Distribution Analysis
Run extrainlafun.R to load some additional functions called in the main script.*
Run HighstatLibV13.R to load helper functions from Highland Statistics.Ltd ©, called in the main script.
Run *model_script.R to carry out the analysis for a survey, according to steps outlined in Zuur et al., 2017.
Run meshggplot.R after you have constructed your mesh in the main script (verify the mesh object is in your R space) to create a ggplot version for publications.
Access information
Other publicly accessible locations of the data:
- HH (haul) and HL (species) data for each survey are openly available through the Database of Trawl Surveys (DATRAS) download portal, managed by International Council for Exploration of the Seas (ICES).
- Environmental predictor variables used in the habitat suitability models included: depth (m), distance from coast (m), mean bottom temperature (°C) and mean bottom current velocity (ms-1; Table 2). Depth was obtained from the General Bathymetric Chart of the Oceans, at a resolution of 15 arc-seconds. Monthly averages of bottom temperature, northward current velocity, and eastward current velocity were obtained from the EU Copernicus Marine Service Information as raster files, at a resolution of 0.111° × 0.067°. Since a dataset for bottom current velocity was not available, the velocity at the seafloor was calculated using a bathymetry mask. Distance from coast was calculated for each occurrence point using the ‘Join attributes by nearest’ tool in QGIS. A coastline shapefile was obtained from the European Environment Agency. Fishing pressure data were obtained from Global Fishing Watch and represented "apparent fishing pressure" modelled using Automatic Identification System (AIS) data for commercial fishing vessels, monthly raster values were averaged across the time period.
Flapper skate catch records were obtained from ICES DATRAS (the Database of Trawl Surveys managed by International Council for the Exploration of the Sea). “HH” haul data were extracted for all the surveys covering the NE Atlantic shelf region for the years “2010-2023”, using the “icesDATRAS” package in R. HL “length-based” catch records were also extracted for the flapper skate (species code: 711846) for each survey. Only those surveys with an adequate number of presence records for skate were retained for further analysis, which included the Irish Anglerfish and Megrim Survey (IE-IAMS), Irish Groundfish Survey (IE-IGFS), the North Sea International Bottom Trawl Survey (NS-IBTS), and the Scottish West Coast Groundfish Survey (SCOWCGFS).
Environmental predictor variables used in the habitat suitability models included: depth (m), distance from coast (m), mean bottom temperature (°C), and mean bottom current velocity (ms-1; Table 2). Depth was obtained from the General Bathymetric Chart of the Oceans, at a resolution of 15 arc-seconds. Monthly averages of bottom temperature, northward current velocity, and eastward current velocity were obtained from the EU Copernicus Marine Service Information as raster files, at a resolution of 0.111° × 0.067°. Since a dataset for bottom current velocity was not available, the velocity at the seafloor was calculated using a bathymetry mask. Distance from coast was calculated for each occurrence point using the ‘Join attributes by nearest’ tool in QGIS. A coastline shapefile was obtained from the European Environment Agency.
Bottom temperature and bottom current velocity were extracted for each year covered by the occurrence data, the layers for each variable were then combined into raster stacks. The mean monthly averages of each variable across these time periods were generated by using the ‘calc’ function from the raster package (version 3.1-5, Hijmans, 2021). The averaged environmental data corresponding to each occurrence point were then extracted from the raster layers and joined to create a combined dataset of occurrences and habitat variables. Northward and eastward bottom current velocity were combined to form a combined current speed using the ‘uv2ds’ function from the rWind package (version 1.0.4, Fernández-López & Schliep, 2019). Extractions were carried out using the ‘extract’ function as part of the raster package. Data points that contained null values for environmental variables were dropped.
Predictions of the posterior mean of probability of presence of D. intermedius were generated within the INLA function, by including an empty prediction stack and a data frame containing location points populated with the matching environmental data. The location points were generated by using the ‘Create grid’ tool in QGIS, with a resolution of 10km, using the data from each survey as the grid extent. Environmental data were extracted for these points as before, and null values were removed where the grid overlapped with land.
