Maxent species distribution modelling of 10 cetacean species in the northeastern Pacific from citizen science occurrence records
Data files
Aug 18, 2025 version files (11.87 MB total)
- owsn_data_final_generalized_es.csv (8.66 MB)
- owsn_enmeval.R (24.98 KB)
- OWSN_SDM_ensemble_rasters.zip (3.18 MB)
- README.md (8.10 KB)
Abstract
Aim: Citizen science is an important source of biodiversity information, particularly for gathering information on species distributions over time. However, there are challenges with spatial and species biases, and variation in effort in citizen science data. We aimed to investigate seasonal habitat suitability for cetacean species reported within the Pacific Northwest by applying species distribution models (SDMs) to opportunistic sightings data submitted to the Ocean Wise Sightings Network (hereafter OWSN; formerly the British Columbia Cetacean Sightings Network, BCCSN) over two decades (2002-2022).
Location: British Columbia, Washington State, South Alaska.
Taxon: Order Cetacea
Methods: We employed MaxEnt SDMs for the 10 cetacean species most frequently reported to the OWSN between 2002 and 2022. We thinned the dataset to account for spatial bias in sighting locations, selected the best-performing models based on the continuous Boyce Index (CBI), and further evaluated them against null models. Ensemble predictions were made using best-performing models on seasonal means of environmental variables across the study period to produce coast-wide maps of relative habitat suitability for each species.
Results: Across all 10 species, SDMs closely reflected the known seasonal species distribution across the Pacific Northwest. Summer habitat hotspots across all species included the continental shelf offshore of Vancouver Island and Haida Gwaii and the deep canyons of Queen Charlotte Sound. Winter hotspots encompassed nearshore waters within British Columbia and Washington, as well as much of Hecate Strait in the north and southern parts of Queen Charlotte Sound.
Main conclusions: Citizen science is an efficient mechanism for generating data on cetacean seasonal occurrence. By applying SDMs and accounting for spatial biases in sampling, opportunistic data can be used to investigate long-term trends in cetacean distribution, especially concerning the impacts of anthropogenic pressures such as climate change.
https://doi.org/10.5061/dryad.sn02v6xfv
Description of the data and file structure
This dataset contains the data and code required to reproduce seasonal Maxent species distribution models for 10 cetacean species in the northeastern Pacific, following analyses in Dares and Robinson (2025; https://doi.org/10.1111/jbi.15164). Occurrence records represent opportunistic sightings of Dall’s porpoise (Phocoenoides dalli), fin whale (Balaenoptera physalus), grey whale (Eschrichtius robustus), humpback whale (Megaptera novaeangliae), killer whale (Orcinus orca; all northeastern Pacific ecotypes), harbour porpoise (Phocoena phocoena), minke whale (Balaenoptera acutorostrata), Pacific white-sided dolphin (Lagenorhynchus obliquidens), and sperm whale (Physeter macrocephalus) reported by citizen scientists to the Ocean Wise Sightings Network between northern B.C. and Washington State from 2002-2022.
Files and variables
Contributors
Lauren E. Dares, Ocean Wise Conservation Association - Whales Initiative, Vancouver, Canada (*Now at Fisheries and Oceans Canada, Winnipeg, Canada) - lauren.dares@gmail.com
Chloe V. Robinson, Ocean Wise Conservation Association - Whales Initiative, Vancouver, Canada - chloe.robinson@ocean.org
Ocean Wise Sightings Network - Whales Initiative, Vancouver, Canada - sightings@ocean.org
Dataset Overview
This dataset contains data and code required to conduct species distribution models using the Maxent algorithm for 10 cetacean species in the northeastern Pacific, as described in Dares and Robinson (2025).
The sightings dataset (owsn_data_final_generalized_es.csv) comprises a subset of opportunistic sightings of Dall’s porpoise (Phocoenoides dalli), fin whale (Balaenoptera physalus), grey whale (Eschrichtius robustus), humpback whale (Megaptera novaeangliae), killer whale (Orcinus orca; all northeastern Pacific ecotypes), harbour porpoise (Phocoena phocoena), minke whale (Balaenoptera acutorostrata), Pacific white-sided dolphin (Lagenorhynchus obliquidens), and sperm whale (Physeter macrocephalus) reported by citizen scientists to the Ocean Wise Sightings Network (OWSN) between northern B.C. and Washington State from 2002-2022. The occurrence records contained in this dataset were first filtered for high confidence in species identification (confidence = “certain”), then divided into summer (May-September) and winter (October-April) seasons. Opportunistic sightings are spatially biased towards areas frequented by citizen scientists, so occurrence records for each species were spatially thinned so that retained sightings were at least 10 km apart, using the spThin package (v.0.2.0) in the R statistical software (v.4.0.4). Background data for Maxent modelling of each species were selected by collating occurrences of non-target species and re-running the spThin algorithm to ensure background data were subject to the same spatial bias as presences. Variability among thinned datasets was addressed by running one iteration of model fitting for each of five randomly selected thinned subsets. This dataset therefore contains all thinned presences across these subsets, along with their associated background data, denoted by the "target_sp" and "subset" fields. Please note that latitude and longitude coordinates have been rounded to the nearest 1 degree for species listed under Canada's Species at Risk Act (SARA; grey whale, fin whale, humpback whale, harbour porpoise, and killer whale), and to the nearest 0.01 degrees for all other species.
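The thinning step can be illustrated with a minimal Python sketch. This is not the spThin implementation and not the authors' R code: it only mimics the general idea of repeatedly dropping a randomly chosen record from among those with the most close neighbours, until no two retained records are closer than the thinning distance. The function names, the example coordinates, and the use of a haversine distance are all assumptions made for illustration.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_records(points, min_km=10.0, seed=0):
    """Drop records until every retained pair is at least min_km apart.

    Each pass counts, for every record, how many other records lie closer
    than min_km, then removes one record chosen at random from those with
    the highest count. A different seed can give a different thinned subset.
    """
    rng = random.Random(seed)
    kept = list(points)
    while True:
        counts = [
            sum(1 for j, q in enumerate(kept) if j != i and haversine_km(p, q) < min_km)
            for i, p in enumerate(kept)
        ]
        worst = max(counts, default=0)
        if worst == 0:
            return kept
        crowded = [i for i, c in enumerate(counts) if c == worst]
        kept.pop(rng.choice(crowded))


# Hypothetical sightings as (longitude, latitude): two tight clusters about 100 km apart.
points = [
    (-123.10, 49.30), (-123.12, 49.31), (-123.15, 49.29),
    (-124.50, 49.80), (-124.52, 49.81),
]
thinned = thin_records(points, min_km=10.0, seed=1)
print(len(thinned))  # one record survives from each cluster
```

Calling `thin_records` with several different seeds gives several alternative thinned subsets, which is the kind of variability the five subsets in this dataset are meant to capture.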
Environmental layers used in SDMs in Dares and Robinson (2025) included seasonal means of remotely sensed sea surface temperature and chlorophyll-a from the MODIS instrument aboard the Aqua satellite at 4 km x 4 km spatial resolution, and depth data from the GEBCO 2021 grid (doi:10.5285/c6612cbe-50b3-0cff-e053-6c86abc09f8f), resampled to 4 km x 4 km. Raw monthly data were obtained from NASA’s Ocean Biology Processing Group (oceancolor.gsfc.nasa.gov) and summarized for summer and winter seasons across the study period. Code to derive these layers for use in SDMs is available at github.com/ldares/owsn-sdm.
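As a rough illustration of what summarizing monthly layers into a seasonal mean involves, the Python sketch below averages small grids cell by cell. It is not the code from github.com/ldares/owsn-sdm. The grid values are invented, and skipping cells with no valid value (NaN) in a given month is an assumption about how gaps might be handled, not a documented detail of the original workflow.

```python
import math

SUMMER_MONTHS = {5, 6, 7, 8, 9}  # May-September, as defined in this README


def seasonal_mean(monthly_grids, months):
    """Cell-wise mean of all grids whose month falls in `months`.

    monthly_grids maps (year, month) to a 2-D list of floats of identical
    shape; NaN marks a cell with no valid value in that month (assumption).
    """
    selected = [grid for (_, month), grid in monthly_grids.items() if month in months]
    if not selected:
        raise ValueError("no monthly grids fall in the requested season")
    n_rows, n_cols = len(selected[0]), len(selected[0][0])
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            values = [g[r][c] for g in selected if not math.isnan(g[r][c])]
            row.append(sum(values) / len(values) if values else math.nan)
        result.append(row)
    return result


# Hypothetical 1 x 2 sea surface temperature grids (degrees C).
monthly_sst = {
    (2020, 4): [[8.0, 8.0]],        # April: winter season, ignored below
    (2020, 5): [[10.0, math.nan]],  # second cell missing in this month
    (2020, 6): [[12.0, 11.0]],
    (2021, 5): [[14.0, 13.0]],
}
summer_sst = seasonal_mean(monthly_sst, SUMMER_MONTHS)
print(summer_sst)  # [[12.0, 12.0]]
```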
Code to run Maxent species distribution models is contained in owsn_enmeval.R. SDMs were constructed in R using the Maxent algorithm contained in the dismo package (v.1.3-9), implemented in the ENMeval package (v.2.0.4) to compare model performance across feature classes and tuning arguments using the Continuous Boyce Index (CBI) as a performance metric. Cross-validation was performed using a hierarchical checkerboard pattern to partition occurrences into four groups. For each species, the highest-CBI models from the five thinned subsets that outperformed null models and did not exhibit significant spatial autocorrelation were combined as a mean ensemble to visualize relative habitat suitability across the northeastern Pacific in each season. Ensemble maps for all species in each season were combined into cumulative habitat suitability layers to evaluate coast-wide suitability for all cetacean species included in the study. The ensemble and cumulative layers can be found in OWSN_SDM_ensemble_rasters.zip.
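The ensemble step amounts to cell-wise arithmetic on prediction grids. The sketch below shows the idea in plain Python with invented values. It is not taken from owsn_enmeval.R, and combining species ensembles into a cumulative layer by summation is an assumption, since this README does not state the exact operation used.

```python
def cell_mean(grids):
    """Cell-wise mean of equally shaped 2-D grids (lists of rows)."""
    return [[sum(vals) / len(vals) for vals in zip(*rows)] for rows in zip(*grids)]


def cell_sum(grids):
    """Cell-wise sum of equally shaped 2-D grids (lists of rows)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*grids)]


# Hypothetical suitability predictions (1 x 2 grids) from the retained models
# of two species; species names and values are placeholders.
species_a_models = [[[0.25, 0.5]], [[0.75, 0.5]]]
species_b_models = [[[0.0, 1.0]], [[0.5, 0.5]]]

ensemble_a = cell_mean(species_a_models)        # [[0.5, 0.5]]
ensemble_b = cell_mean(species_b_models)        # [[0.25, 0.75]]
cumulative = cell_sum([ensemble_a, ensemble_b])
print(cumulative)  # [[0.75, 1.25]]
```

Only models that pass the CBI, null-model, and spatial autocorrelation checks described above would be passed to `cell_mean`, so the number of grids per species can be fewer than five.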
File: owsn_data_final_generalized_es.csv
Description: Comma-separated value table of cetacean occurrence records. Note the dataset includes five thinned subsets of occurrences for each species, along with accompanying background data derived from occurrences of non-target species used in each Maxent iteration.
Variables
- Longitude: Decimal degrees, WGS84. Rounded to nearest degree for species listed under Canada's Species at Risk Act, or 0.01 degrees for other species
- Latitude: Decimal degrees, WGS84. Rounded to nearest degree for species listed under Canada's Species at Risk Act, or 0.01 degrees for other species
- sightingsid: OWSN Database identifier for occurrence record
- sightingdate: Date sighting was made by citizen scientist
- species: Common name of cetacean species sighted
- confidence: Citizen scientist confidence in their species identification
- month: Month sighting was made
- year: Year sighting was made
- season: Season sighting was made, either summer (May-September) or winter (October-April)
- target_sp: Target species for which models will be fit using occurrence records as presences and non-target species occurrences as background data.
- dataset: Denotes which thinned dataset for each season was randomly chosen for the target_sp
- pres: Whether the record should be considered a presence (=1) or pseudoabsence/background (=0) in Maxent models
- subset: Denotes different thinned subsets used in separate models to investigate variability in model output among subsets
Missing values are denoted with blank cells.
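To show how these variables fit together, here is a minimal Python sketch that reads records in this layout and pulls out the presences and background points for one target species, season, and thinned subset. The column names come from the variable list above. The sample rows, the identifier and date formats, and the coding of `season`, `target_sp`, and `subset` values are assumptions for illustration and should be checked against the actual file.

```python
import csv
import io

# Hypothetical rows in the documented column layout; not real sightings.
sample_csv = """Longitude,Latitude,sightingsid,sightingdate,species,confidence,month,year,season,target_sp,dataset,pres,subset
-128.45,51.92,1001,2015-07-04,sperm whale,certain,7,2015,summer,sperm whale,3,1,1
-129,51,1002,2015-07-09,humpback whale,certain,7,2015,summer,sperm whale,3,0,1
-128.61,52.1,1003,2016-08-15,sperm whale,certain,8,2016,summer,sperm whale,3,1,2
,,1004,2017-06-01,sperm whale,certain,6,2017,summer,sperm whale,3,1,1
"""


def select_model_input(rows, target_sp, season, subset):
    """Split matching rows into presence and background (lon, lat) points."""
    presences, background = [], []
    for row in rows:
        key = (row["target_sp"], row["season"], row["subset"])
        if key != (target_sp, season, str(subset)):
            continue
        if not row["Longitude"] or not row["Latitude"]:
            continue  # missing values are stored as blank cells
        point = (float(row["Longitude"]), float(row["Latitude"]))
        (presences if row["pres"] == "1" else background).append(point)
    return presences, background


rows = list(csv.DictReader(io.StringIO(sample_csv)))
presences, background = select_model_input(rows, "sperm whale", "summer", 1)
print(presences)   # [(-128.45, 51.92)]
print(background)  # [(-129.0, 51.0)]
```

To read the deposited file instead of the inline sample, the `io.StringIO(sample_csv)` argument would be replaced with an open file handle for owsn_data_final_generalized_es.csv.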
File: owsn_enmeval.R
Description: R script necessary to reproduce ENMeval outputs for model evaluation and to generate seasonal ensemble predictions and cumulative suitability maps from best-performing Maxent models for each species.
File: OWSN_SDM_ensemble_rasters.zip
Description: Zipped file containing ensemble predictions of suitable habitat for each cetacean species in summer (May-September) and winter (October-April) seasons, and cumulative habitat suitability layers for all species. Layers are in raster (.tiff) format.
Code/software
Data analyses were completed in R v.4.0.4.
Required packages:
tidyverse v.2.0.0
raster v.3.6-20
dismo v.1.3-9
ENMeval v.2.0.4
spdep v.1.2-8
Access information
Data were derived from the following sources:
- Ocean Wise Sightings Network database of cetacean occurrence records - data available upon request (contact sightings@ocean.org)
Additional code used to derive the datasets contained in this repository is available from github.com/ldares/owsn-sdm
Opportunistic sightings of Dall’s porpoise (Phocoenoides dalli), fin whale (Balaenoptera physalus), grey whale (Eschrichtius robustus), humpback whale (Megaptera novaeangliae), killer whale (Orcinus orca; all northeastern Pacific ecotypes), harbour porpoise (Phocoena phocoena), minke whale (Balaenoptera acutorostrata), Pacific white-sided dolphin (Lagenorhynchus obliquidens), and sperm whale (Physeter macrocephalus) reported by citizen scientists to the Ocean Wise Sightings Network between northern B.C. and Washington State from 2002-2022 were obtained for species distribution modelling. Occurrence records were first filtered for confidence in species identification (confidence = "certain"), then divided into summer (May-September) and winter (October-April) seasons, and finally spatially thinned using the spThin package (v.0.2.0) in R (v.4.0.4) so that retained sightings were at least 10 km apart. Background data for Maxent modelling of each species were selected by collating occurrences of non-target species and re-running the spThin algorithm to ensure background data were subject to the same spatial bias as presences. Variability among thinned datasets was addressed by running one iteration of model fitting for each of five randomly selected thinned subsets. This dataset therefore contains all thinned presences across these subsets, along with their associated background data, denoted by the "target_sp" and "subset" fields.
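The first two preparation steps, keeping only confident identifications and assigning a season from the sighting month, can be sketched in a few lines of Python. The season boundaries follow the definitions given above. The record structure and the non-"certain" confidence label used in the example are hypothetical, and this is not the authors' R code.

```python
SUMMER_MONTHS = range(5, 10)  # May through September; all other months are winter


def season_for_month(month):
    """Return 'summer' or 'winter' for a calendar month number (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return "summer" if month in SUMMER_MONTHS else "winter"


def prepare(records):
    """Keep only 'certain' identifications and tag each with its season."""
    return [
        dict(rec, season=season_for_month(rec["month"]))
        for rec in records
        if rec["confidence"] == "certain"
    ]


# Hypothetical records; "probable" is a placeholder for any other confidence level.
records = [
    {"species": "sperm whale", "confidence": "certain", "month": 9},
    {"species": "sperm whale", "confidence": "probable", "month": 7},
    {"species": "fin whale", "confidence": "certain", "month": 10},
]
prepared = prepare(records)
print([(r["species"], r["season"]) for r in prepared])
# [('sperm whale', 'summer'), ('fin whale', 'winter')]
```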
