
Consistent concentrations of critically endangered Balearic shearwaters in UK waters revealed by at-sea surveys

Cite this dataset

Phillips, Jessica Ann et al. (2021). Consistent concentrations of critically endangered Balearic shearwaters in UK waters revealed by at-sea surveys [Dataset]. Dryad.


Aim: Europe’s only globally critically endangered seabird, the Balearic shearwater (Puffinus mauretanicus), is thought to have expanded its post-breeding range northwards into UK waters, though its distribution there is not yet well understood. This study aims to identify environmental factors associated with the species’ presence, map the probability of its presence across the western English Channel and southern Celtic Sea, and estimate the number of individuals in this area.

Location: The western English Channel and southern Celtic Sea

Methods: This study analyses strip transect data collected from vessel-based surveys in the western English Channel and southern Celtic Sea during the shearwater’s post-breeding period between 2013 and 2017. Using environmental data collected directly and from remote sensors, both Generalized Additive Models (GAMs) and a Random Forest (RF) machine learning model were used to determine shearwater presence at different locations.

Results: Both models indicated that oceanographic features were better predictors of shearwater presence than fish abundance. Seafloor aspect, sea surface temperature, depth, salinity, and maximum current speed were the most important predictors. Based on the timing of the surveys (mainly in October), it is probable that most of the sighted shearwaters were immatures.

Main conclusions: Areas with consistently high probabilities of shearwater presence were identified at the Celtic Sea front. Our estimates suggest that the study area in southwest Britain supports between 2% and 23% of the global population of Balearic shearwaters. This study provides the most complete understanding of Balearic shearwater distribution in UK waters available to date, information that will help inform any future UK conservation actions concerning this endangered species.

Usage notes


maximum_current_speed_Feb5_2020.rds is an ASCII file of simulated maximum depth-averaged horizontal tidal currents (m/s) during April 2016. Because tidal currents are persistent and predictable, these values are representative of those encountered in the study area during the surveys. Values were sourced from an existing Finite Volume Community Ocean Model (FVCOM) for western UK (Cazenave, Torres et al. 2016) and resampled at ~1 km resolution using bilinear interpolation. Coordinates are decimal degrees (WGS84). Values were capped at 3 m/s to prevent unrealistic predictions from the GAMs and Random Forest models, although very few locations in the study area have values > 3 m/s.
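The two processing steps described above, bilinear resampling and capping at 3 m/s, can be sketched in a few lines. This is a minimal, standard-library Python illustration, not the processing actually used to build the file: the function names `bilinear` and `cap_speed` and the 2x2 grid values are hypothetical.

```python
MAX_SPEED = 3.0  # cap (m/s) stated in the usage notes


def bilinear(grid, x, y):
    """Interpolate a grid (list of rows) at fractional column x and row y."""
    x0, y0 = int(x), int(y)
    # Clamp the neighbouring indices so edge cells can still be queried.
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def cap_speed(speed, limit=MAX_SPEED):
    """Constrain a current speed to at most `limit`."""
    return min(speed, limit)


# Made-up coarse grid of maximum current speeds (m/s), for illustration only.
coarse = [[0.5, 1.5],
          [2.5, 4.5]]

centre = bilinear(coarse, 0.5, 0.5)          # midpoint of the four cells
corner = cap_speed(bilinear(coarse, 1, 1))   # 4.5 m/s cell, capped
```

Here `centre` is the mean of the four cell values, and `corner` shows the cap being applied to a value above 3 m/s.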

stratification_index_Feb5_2020.rds is an ASCII file of the Hunter-Simpson Stratification Index (Simpson and Hunter 1974), obtained using the equation log10(h/u^3), where h is the water depth (m) and u is the maximum depth-averaged current speed (m/s). Current speed values are provided from the existing FVCOM for western UK (see above), whereas seabed depth was sourced from EMODNet. To provide values at ~1 km resolution, current speed was resampled at ~1 km resolution using bilinear interpolation, whereas depth was already provided at ~1 km resolution. Coordinates are decimal degrees (WGS84). Values < 1.9 indicate water columns likely to remain mixed in summer months, and values > 1.9 indicate water columns likely to become stratified. Values of ~1.9 indicate probable locations of tidal fronts between these water masses.
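As a worked illustration of the index, the sketch below evaluates log10(h/u^3) and applies the 1.9 threshold described above. It is a minimal Python example under stated assumptions: the function names, the `tolerance` used to decide what counts as "~1.9", and the depth and speed values are all hypothetical and are not taken from the dataset.

```python
import math

FRONT_VALUE = 1.9  # threshold given in the usage notes


def stratification_index(depth_m, max_speed_ms):
    """Hunter-Simpson index: log10(h / u^3), h in m and u in m/s."""
    return math.log10(depth_m / max_speed_ms ** 3)


def classify(index, tolerance=0.1):
    """Label a water column; `tolerance` is an assumed width for '~1.9'."""
    if abs(index - FRONT_VALUE) <= tolerance:
        return "front"
    return "mixed" if index < FRONT_VALUE else "stratified"


# Illustrative (made-up) depth and current-speed pairs.
deep_slow = stratification_index(100.0, 0.5)    # log10(800)
shallow_fast = stratification_index(30.0, 1.5)  # log10(30 / 3.375)
near_front = stratification_index(80.0, 1.0)    # log10(80)

labels = [classify(v) for v in (deep_slow, shallow_fast, near_front)]
```

Deep water with weak currents gives a high index (stratified), shallow water with strong currents gives a low index (mixed), and values close to 1.9 mark a probable tidal front.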

R code

step1 data preparation June27_2019.R reads the Balearic shearwater sightings files and calculates the GPS coordinates of all Balearic shearwaters sighted.

step2  data preparation continued June3.Rmd restricts sightings to those made during ‘on effort’ intervals of bird observations. It also removes sightings recorded while the boat was traveling above the acceptable speed range.
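The filtering logic described for step2 can be sketched as follows. This is an illustrative Python example only, not a translation of the R Markdown file: the record fields (`time`, `boat_speed`), the interval representation, and the speed limit of 12 are all hypothetical, since the actual column names and threshold are not stated here.

```python
def is_on_effort(sighting, effort_intervals):
    """True if the sighting time falls inside any (start, end) interval."""
    return any(start <= sighting["time"] <= end
               for start, end in effort_intervals)


def filter_sightings(sightings, effort_intervals, max_speed):
    """Keep sightings made on effort and at or below the speed limit."""
    return [s for s in sightings
            if is_on_effort(s, effort_intervals)
            and s["boat_speed"] <= max_speed]


# Made-up example: times in minutes since the start of a survey day.
effort_intervals = [(0, 60), (90, 150)]
sightings = [
    {"id": 1, "time": 30, "boat_speed": 9.0},    # on effort, speed fine
    {"id": 2, "time": 75, "boat_speed": 9.0},    # off effort
    {"id": 3, "time": 120, "boat_speed": 14.0},  # on effort, too fast
    {"id": 4, "time": 140, "boat_speed": 10.5},  # on effort, speed fine
]

kept = filter_sightings(sightings, effort_intervals, max_speed=12.0)
kept_ids = [s["id"] for s in kept]
```

Only sightings that pass both checks remain, so off-effort records and records made at excessive boat speed are dropped.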

step3 create dataset for prediction.Rmd creates a raster stack of all environmental and fish variables to be used for the predictive and explanatory Generalized Additive Models (GAMs) and Random Forest (RF) models.

GAM_1 explanatory GAM Aug26 2020.R identifies the combination of variables to retain that yields the best explanatory GAM, i.e. the explanatory GAM with the lowest Akaike Information Criterion (AIC).

GAM_2 predictive GAM June3 2020.R identifies the combination of variables to retain that yields the best predictive GAM, i.e. the predictive GAM with the lowest AIC.
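Selecting a model by lowest AIC, as both GAM scripts are described as doing, comes down to comparing AIC = 2k - 2 ln(L) across candidate variable sets. The Python sketch below shows only that comparison step, assuming the log-likelihood and parameter count of each fitted model are already known. The candidate variable sets and their numbers are invented for illustration, and for a GAM the parameter count would typically be the effective degrees of freedom rather than an integer.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fitted models: variables -> (log-likelihood, parameter count).
candidates = {
    ("depth",): (-120.0, 2),
    ("depth", "sst"): (-110.0, 3),
    ("depth", "sst", "salinity"): (-109.5, 4),
}

scores = {variables: aic(ll, k) for variables, (ll, k) in candidates.items()}

# The preferred model is the one with the lowest AIC.
best = min(scores, key=scores.get)
```

In this made-up case the two-variable model wins: adding salinity improves the log-likelihood slightly, but not enough to offset the penalty for the extra parameter.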

RF_1 explanatory RF June4 2020.Rmd creates an explanatory RF model, and ranks the variables by their importance.

RF_2 predictive RF Aug7 2020.Rmd creates a predictive RF model using the SuperLearner package.


Cazenave, P. W., R. Torres and J. I. Allen (2016). "Unstructured grid modelling of offshore wind farm impacts on seasonally stratified shelf seas." Progress in Oceanography 145: 25-41.

Simpson, J. H. and J. R. Hunter (1974). "Fronts in the Irish Sea." Nature 250(5465): 404-406.


Rhodes Trust

Natural Sciences and Engineering Research Council