Data from: Evaluating three modelling frameworks for assessing changes in fin whale distribution in the Mediterranean Sea
Data files
May 26, 2026 version (10.86 MB total)
- df_bp_detection.csv (169.06 KB)
- df_bp_model.csv (10.68 MB)
- README.md (4.47 KB)
Abstract
Aim: Species distribution models help to characterise the habitat of highly migratory species by identifying species-habitat relationships, and they inform conservation and management plans. While Generalized Additive Models (GAMs) are commonly used in ecology, and particularly in the habitat modelling of marine mammals, there remains a debate between modelling habitat (presence/absence) and modelling density (number of individuals). Our study assesses the performance and predictive capabilities of GAMs compared to boosted regression trees (BRTs) for modelling both fin whale density and habitat suitability. To address the challenge of zero-inflated data, we also evaluate Hurdle Models, which treat presence/absence and density as a two-stage process.
Location: Fixed transects crossing the North Western Mediterranean Sea.
Time period: From 2008 to 2022, during the summer period.
Major taxa studied: Fin whale (Balaenoptera physalus)
Methods: Data were analysed using traditional line transect methodology, obtaining the Effective Area monitored. Based on existing literature, we selected various covariates, either static in nature, such as bathymetry and slope, or variable in time, e.g., SST, MLD, Chl concentration, EKE, and FSLE. We compared both the explanatory power and predictive skill of the different modelling techniques (GAMs, BRTs, and Hurdle Models).
Results: Our results show that all models performed well in distinguishing presences and absences, but while density and presence patterns for the fin whale were similar, their dependencies on environmental factors can vary depending on the chosen model. Bathymetry was the most important variable in all models, followed by SST, and the chlorophyll recorded two months before the sighting.
Main conclusions: This study underscores the role SDMs can play in marine mammal conservation efforts and emphasizes the importance of selecting appropriate modelling techniques. It also quantifies the relationship between environmental variables and fin whale distribution in an understudied area, providing a solid foundation for informed decision-making and spatial management.
Description of the data and file structure
These data were collected as part of the long-term monitoring program of the FLT Med Net project. Fin whale data were obtained during ferry-based surveys conducted in the north-western Mediterranean Sea between 2008 and 2022.
Ferry GPS tracks were used to define transects corresponding to single port-to-port trips. GPS points collected during off-effort periods or under unsuitable sea conditions (sea state > 4 on the Beaufort scale) were excluded.
Survey effort and sightings were spatially aggregated using a 5 km × 5 km grid, generated following the European Environment Agency (EEA) INSPIRE compliance guidelines. For each grid cell, daily survey effort (kilometres travelled) and daily fin whale abundance were calculated.
The dataset is intended to support analyses of detection probability, habitat suitability, and distribution modelling.
Data files
File: df_bp_detection.csv
File description: This file contains data used for detection probability analyses. Each row represents a single fin whale sighting. 'NA' means 'not available': for example, some ferry information was missing.
Variables
- COD_Effort: survey identifier including route information
- Date: date of the survey
- Species: scientific name of the detected species
- Best: best estimate of the number of individuals observed
- Sea.state: sea state according to the Beaufort scale (values 0–4)
- type: ferry category based on command deck height
- Type I: 12–15 m
- Type II: 20–22 m
- Type III: 25–29 m
- method: observation protocol
- B: binoculars used continuously during surveys
- A: binoculars used only for species identification
- distance: perpendicular distance from the transect line
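As a minimal sketch of how this file might be read, the Python below maps the literal string 'NA' to a missing value and summarises perpendicular distance per ferry type. The column names follow the variable list above. The comma delimiter, the date format, the way ferry types are encoded in the file, and the sample rows are all assumptions made for illustration; the sample rows are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def read_rows(handle):
    """Read a CSV into a list of dicts, mapping 'NA' and empty cells to None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({k: (None if v in ("NA", "") else v) for k, v in row.items()})
    return rows


def mean_distance_by_type(rows):
    """Mean perpendicular distance per ferry type, skipping rows with missing values."""
    grouped = defaultdict(list)
    for row in rows:
        if row["type"] is None or row["distance"] is None:
            continue
        grouped[row["type"]].append(float(row["distance"]))
    return {ferry_type: mean(values) for ferry_type, values in grouped.items()}


# Made-up rows for illustration only (hypothetical identifiers and values).
sample = io.StringIO(
    "COD_Effort,Date,Species,Best,Sea.state,type,method,distance\n"
    "X1,2015-07-01,Balaenoptera physalus,2,1,Type I,A,400\n"
    "X2,2015-07-02,Balaenoptera physalus,1,2,Type I,B,600\n"
    "X3,2015-07-03,Balaenoptera physalus,1,3,NA,A,900\n"
)
rows = read_rows(sample)
summary = mean_distance_by_type(rows)
print(summary)
```

To apply the same functions to the real file, open it with `open("df_bp_detection.csv", newline="")` and pass the handle to `read_rows`, after checking the actual delimiter and column encodings.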
File: df_bp_model.csv
File description: This file contains data used to build all statistical models. Each row represents one 5 km × 5 km grid cell on one survey day. 'NA' means 'not available': for example, not all the satellite data were available for every cell.
Variables
- COD_Effort: survey identifier including route information
- length: total kilometres travelled within the cell during that day
- sum_best: total number of individuals observed within the cell during that day
- date: date of the survey
- eke: mean eddy kinetic energy in the cell
- sst: mean sea surface temperature in the cell
- mld: mean mixed layer depth in the cell
- chl: mean chlorophyll concentration in the cell
- chl_1lag: mean chlorophyll concentration one month prior to the survey date
- chl_2lag: mean chlorophyll concentration two months prior to the survey date
- fsle: Finite-Size Lyapunov Exponents
- bath_mean: mean bathymetry (m)
- slope_max: maximum seabed slope (degrees)
- presence: indicator of fin whale presence in the cell
- 1 = presence
- 0 = absence
- areakm: effective surveyed area (km²), calculated using the effective strip width (ESW) from detection probability analyses
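The sketch below illustrates one way the columns above could be combined: a density response as `sum_best / areakm`, and a check that `presence` agrees with `sum_best`. It is not the authors' code. The function names are hypothetical, the two-sided formula (area = 2 × ESW × length) is the conventional line-transect form and is only assumed to match how `areakm` was derived, and the 1.0 km ESW and the sample cells are made-up values.

```python
def density_per_km2(sum_best, areakm):
    """Individuals per km2 of effectively surveyed area.

    Returns None when an input is missing or the area is not positive.
    """
    if sum_best is None or areakm is None or areakm <= 0:
        return None
    return sum_best / areakm


def presence_is_consistent(sum_best, presence):
    """True when presence == 1 exactly for cells with at least one individual."""
    return (sum_best > 0) == (presence == 1)


def effective_area_km2(length_km, esw_km, sides=2):
    """Conventional strip area; assumed, not confirmed, to match areakm."""
    return sides * esw_km * length_km


# Made-up cells for illustration only.
cells = [
    {"length": 5.0, "sum_best": 2, "presence": 1, "areakm": 10.0},
    {"length": 3.0, "sum_best": 0, "presence": 0, "areakm": 6.0},
]
densities = [density_per_km2(c["sum_best"], c["areakm"]) for c in cells]
consistent = [presence_is_consistent(c["sum_best"], c["presence"]) for c in cells]
print(densities, consistent)
```

Returning None for a missing or non-positive area keeps 'NA' cells out of the density response without silently turning them into zeros.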
Data processing
- Ferry GPS tracks were segmented into single port-to-port survey trips.
- Data collected during off-effort periods or sea state > 4 were removed.
- Daily survey effort was calculated as kilometres travelled per grid cell.
- Fin whale sightings were aggregated daily within each grid cell.
- Environmental variables were spatially and temporally matched to grid cells.
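The filtering and aggregation steps above can be sketched roughly as follows. This is an illustrative simplification, not the processing code used for the dataset. The record fields (`x_m`, `y_m`, `km`, `best`, `on_effort`, `sea_state`) are hypothetical, coordinates are assumed to be already projected to metres in the grid's coordinate system (the EEA reference grid is defined in ETRS89-LAEA, EPSG:3035), and each track segment is assigned to a single cell rather than being split at cell boundaries.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 5000   # 5 km x 5 km grid cells
MAX_SEA_STATE = 4    # records with sea state > 4 are excluded


def cell_index(x_m, y_m, cell_size_m=CELL_SIZE_M):
    """Column/row index of the grid cell containing a projected point."""
    return (math.floor(x_m / cell_size_m), math.floor(y_m / cell_size_m))


def aggregate_daily(segments):
    """Sum effort (km) and individuals per (date, cell), after filtering."""
    totals = defaultdict(lambda: {"length": 0.0, "sum_best": 0})
    for seg in segments:
        if not seg["on_effort"] or seg["sea_state"] > MAX_SEA_STATE:
            continue  # drop off-effort periods and unsuitable sea conditions
        key = (seg["date"], cell_index(seg["x_m"], seg["y_m"]))
        totals[key]["length"] += seg["km"]
        totals[key]["sum_best"] += seg["best"]
    return dict(totals)


# Made-up track segments for illustration only.
segments = [
    {"date": "2015-07-01", "x_m": 4_321_000, "y_m": 2_101_000, "km": 1.5,
     "best": 0, "on_effort": True, "sea_state": 2},
    {"date": "2015-07-01", "x_m": 4_323_500, "y_m": 2_102_000, "km": 2.0,
     "best": 2, "on_effort": True, "sea_state": 2},
    {"date": "2015-07-01", "x_m": 4_326_000, "y_m": 2_103_000, "km": 1.0,
     "best": 1, "on_effort": True, "sea_state": 5},
    {"date": "2015-07-01", "x_m": 4_327_000, "y_m": 2_103_000, "km": 1.0,
     "best": 0, "on_effort": False, "sea_state": 1},
]
daily = aggregate_daily(segments)
print(daily)
```

The first two segments fall in the same cell and are summed; the third is dropped for sea state and the fourth for being off effort.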
Methods
Surveys followed standardized ferry-based visual monitoring protocols as described in the associated publication. Detection probability analyses and habitat modelling were conducted using established statistical approaches implemented in R.
Software-specific information
- Primary software: R
- Additional software:
- QGIS (spatial processing)
- Python (data handling and preprocessing)
External data sources
- Oceanographic variables: Copernicus Marine Service
https://data.marine.copernicus.eu/
- Bathymetry: GEBCO
https://www.gebco.net/
Data use and access
The datasets provided here are processed data supporting the analyses presented in the associated study. Raw survey data are available from the corresponding author upon reasonable request.
