Data integration improves species distribution forecasts under novel ocean conditions
Data files
Version of Aug 12, 2025 (5.72 MB total):
- ModelPerformanceMetrics.csv (8.59 KB)
- ModelSpatialPredictions.nc (5.69 MB)
- README.md (4.50 KB)
- VariableResponseCurves.csv (14.80 KB)
Abstract
Accurate forecasts of species distributions in response to a changing climate are essential for proactive management and conservation decision-making. However, species distribution models (SDMs) often have limited capacity to produce robust forecasts under novel environmental conditions, partly due to limitations in model training data. Model-based approaches that leverage diverse types of data have advanced over the last decade, yet their forecasting skill, especially during episodic climatic events, remains uncertain. Here, we develop a suite of SDMs for a commercially important fishery species, albacore tuna (Thunnus alalunga), to evaluate forecast skill under marine heatwave conditions. We compare models that use different methods to leverage data sources (data pooling vs. joint likelihood) and to address spatial dependence (environmental and spatial effects vs. environmental-only) to assess their relative performance in predicting species distributions under novel environmental conditions. Our results indicate model performance declined across all model types as environmental novelty increased, as expected. However, joint-likelihood approaches were more resilient to novel conditions, demonstrating greater predictive skill and ecological realism than traditional SDMs. These results suggest that ecological forecasts under novel environmental conditions are more skillful with a model framework that accounts for unmeasured spatial and temporal variability and uses model-based data integration to explicitly leverage diverse data types. As access to diverse data sources continues to increase, maximizing their utility will be key for delivering accurate forecasts of species distributions and advancing proactive, climate-ready management and conservation strategies.
This repository contains model outputs to help recreate visualizations and interpret results. Specifically, the repo contains:
- Model performance outputs (i.e., AUC, MAE, Boyce, & Novelty metrics) from retrospective monthly forecast analysis
- Each model's spatial forecast prediction for an example month (September 2015)
- Response curves (i.e., marginal effects) from each model
Note: This repository contains only model-derived outputs, not the raw data, because of access restrictions on both datasets used in the study.
Specifically, logbook data for the U.S. albacore troll and pole-and-line fishery are confidential U.S. government data and are not publicly available. The raw data cannot be made public under the Magnuson–Stevens Fishery Conservation and Management Reauthorization Act of 2006, section 402(b), 16 U.S.C. 1881a. To request access to U.S. Highly Migratory Species albacore logbook data, please contact Charles Villafana (Charles.Villafana@noaa.gov). Additionally, the albacore archival tagging raw data are not publicly posted at the request of the American Fishermen’s Research Foundation (AFRF), which collaborated with the National Oceanic and Atmospheric Administration (NOAA) to implement the tagging program. However, these data are freely available for research purposes through AFRF and NOAA. Questions about how to obtain these data can be directed to Barbara Muhling (bmuhling@ucsc.edu).
Description of the data and file structure
ModelPerformanceMetrics.csv — refers to Figures 3 & 4
- Description: A comma-delimited file containing performance metrics for monthly forecasts, summarized by model type.
- Format: .csv
- Size: 8.59 KB
- Dimensions: 121 rows × 8 columns
- Variables:
  - AUC: Area under the receiver operating characteristic (ROC) curve
  - MAE: Mean absolute error
  - Boyce: Boyce index
  - Hellingers_SST: Hellinger distance (degree of novelty) of sea surface temperature (SST) in the forecast month
  - Hellingers_MLD: Hellinger distance (degree of novelty) of mixed layer depth (MLD) in the forecast month
  - month_year: Forecast month and year (format: mm/dd/yyyy)
  - model: Model type (HE: Habitat Envelope; GF: Gaussian Field; iSDM: integrated Species Distribution Model)
  - count: Number of data points used for evaluation that month (evenly split between presences and pseudo-absences)
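This README does not describe how the metrics were computed, so the following is only a minimal sketch of textbook definitions, not the authors' code. It assumes that AUC is the rank-based (Mann–Whitney) probability that a presence scores higher than a pseudo-absence, that MAE compares observed 0/1 labels with predicted suitability, and that novelty is the Hellinger distance between binned distributions of a covariate (SST or MLD) in the training data and in the forecast month. The helper names (auc_rank, mae, histogram, hellinger) and any bin edges you pass in are hypothetical, and the Boyce index is omitted.

```python
import math
from typing import Sequence


def auc_rank(presence: Sequence[float], absence: Sequence[float]) -> float:
    """AUC as P(random presence score > random pseudo-absence score).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied pairs count as half a win.
    """
    if not presence or not absence:
        raise ValueError("need at least one presence and one pseudo-absence")
    wins = 0.0
    for p in presence:
        for a in absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence) * len(absence))


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between 0/1 observations and predicted probabilities."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def histogram(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Proportion of values in each bin [edges[i], edges[i+1]).

    The last bin is also closed on the right; values outside the edges are dropped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            is_last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions on the same bins.

    H = sqrt(1 - sum_i sqrt(p_i * q_i)); 0 means identical, 1 means no overlap.
    """
    bhattacharyya = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bhattacharyya))  # clamp float round-off
```

To score novelty under these assumptions, bin the training-period SST and the forecast-month SST with the same edges and call hellinger(histogram(train_sst, edges), histogram(forecast_sst, edges)); values near 1 indicate forecast conditions largely outside the training range. The result depends on the bin edges chosen, which this README does not specify.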
VariableResponseCurves.csv — refers to Figure 5
- Description: A comma-delimited file containing the estimated marginal effects of each environmental variable, including the mean, standard deviation, and 95% credible interval bounds. Results are summarized by model type.
- Format: .csv
- Size: 13.22 KB
- Dimensions: 181 rows × 8 columns
- Variables:
  - env_value: Value or interval of the environmental variable at which the marginal effect is evaluated
  - mean: Posterior mean of the marginal effect
  - sd: Posterior standard deviation of the marginal effect
  - 0.025quant: Lower bound of the 95% credible interval (2.5th percentile)
  - 0.5quant: Posterior median (50th percentile)
  - 0.975quant: Upper bound of the 95% credible interval (97.5th percentile)
  - model: Model type (HE: Habitat Envelope; GF: Gaussian Field; iSDM: integrated Species Distribution Model)
  - Variable: Environmental variable (SST: sea surface temperature; MLD: mixed layer depth; Bathymetry)
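VariableResponseCurves.csv stores posterior summaries, not posterior draws. As an illustration of how each summary column relates to a posterior, the sketch below summarizes a hypothetical set of draws of the marginal effect at a single env_value. The draws, the function names, and the quantile rule (linear interpolation between order statistics, one common convention) are all assumptions; the authors' software may have computed these quantities differently, for example directly from an approximated posterior marginal rather than from draws.

```python
import math
from statistics import fmean, stdev
from typing import Sequence


def quantile(sorted_draws: Sequence[float], prob: float) -> float:
    """Linear-interpolation quantile of pre-sorted draws (0 <= prob <= 1)."""
    if not sorted_draws:
        raise ValueError("no draws")
    pos = prob * (len(sorted_draws) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] + frac * (sorted_draws[hi] - sorted_draws[lo])


def summarize_draws(draws: Sequence[float]) -> dict[str, float]:
    """Summarize draws of a marginal effect at one env_value.

    Keys mirror the summary columns of VariableResponseCurves.csv.
    Requires at least two draws (for the sample standard deviation).
    """
    s = sorted(draws)
    return {
        "mean": fmean(s),
        "sd": stdev(s),
        "0.025quant": quantile(s, 0.025),  # lower 95% credible bound
        "0.5quant": quantile(s, 0.5),      # posterior median
        "0.975quant": quantile(s, 0.975),  # upper 95% credible bound
    }
```

The 0.025quant and 0.975quant values bracket an equal-tailed 95% credible interval, so a response curve can be drawn as the mean (or median) line with a ribbon between those two columns.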
ModelSpatialPredictions.nc — refers to Figure 6
- Description: A NetCDF file containing spatial predictions from three models for the example month (September 2015), stored as stacked layers. Each layer represents predicted habitat suitability over a common spatial grid of the Northeast Pacific Ocean.
- Format: .nc
- Size: 5.69 MB
- Dimensions: 563 rows (latitude) × 840 columns (longitude) × 3 layers (z)
- Coordinate system: WGS84 (+proj=longlat +datum=WGS84 +no_defs)
- Resolution: 0.083° × 0.083° (1/12°)
- Extent: 179.9583, 249.9583, 10.04167, 56.95833 (xmin, xmax, ymin, ymax), in decimal degrees; longitude is given in degrees east on a 0–360° scale, so the grid spans roughly 180° to 110°W
- Variable:
  - variable: Habitat suitability predictions
- Units: Probability (0–1)
- Layers (z dimension):
  - z = 1: Habitat Envelope (HE) model
  - z = 2: Gaussian Field (GF) model
  - z = 3: integrated Species Distribution Model (iSDM)
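The grid geometry above is enough to locate cells without opening the file. The sketch below derives the cell size from the extent and dimensions (70° / 840 columns = 1/12°, which is why the resolution is listed as 0.083), converts 0–360° longitudes to the −180° to 180° convention, and maps a point to a (row, column) cell. It assumes row 0 is the northernmost row, the usual raster convention; the NetCDF file may store latitude in ascending order instead, so check its latitude coordinate before indexing. All constant and function names are hypothetical, and reading the file itself is not shown.

```python
# Grid geometry taken from this README (ModelSpatialPredictions.nc).
NCOLS, NROWS = 840, 563
XMIN, XMAX, YMIN, YMAX = 179.9583, 249.9583, 10.04167, 56.95833
LAYERS = {1: "HE", 2: "GF", 3: "iSDM"}  # z index (1-based) -> model

RES_X = (XMAX - XMIN) / NCOLS  # 70 / 840 = 1/12 degree
RES_Y = (YMAX - YMIN) / NROWS  # approximately 1/12 degree


def to_signed_lon(lon_east: float) -> float:
    """Convert longitude in 0-360 degrees east to -180..180 (negative = west)."""
    return ((lon_east + 180.0) % 360.0) - 180.0


def cell_center(row: int, col: int) -> tuple[float, float]:
    """(lon, lat) of a cell center, assuming row 0 is the northern edge."""
    if not (0 <= row < NROWS and 0 <= col < NCOLS):
        raise IndexError("cell outside grid")
    lon = XMIN + (col + 0.5) * RES_X
    lat = YMAX - (row + 0.5) * RES_Y
    return lon, lat


def cell_index(lon_east: float, lat: float) -> tuple[int, int]:
    """(row, col) of the cell containing a point given in 0-360 longitude."""
    col = int((lon_east - XMIN) // RES_X)
    row = int((YMAX - lat) // RES_Y)
    if not (0 <= row < NROWS and 0 <= col < NCOLS):
        raise IndexError("point outside grid extent")
    return row, col
```

For example, a point at 130°W, 35°N is 230° east on the 0–360° scale, so cell_index(230.0, 35.0) gives the cell to read from each of the three layers when comparing models at that location.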
