Recommendations for quantifying and reducing uncertainty in climate projections of species distributions
Cite this dataset
Brodie, Stephanie (2022). Recommendations for quantifying and reducing uncertainty in climate projections of species distributions [Dataset]. Dryad. https://doi.org/10.7291/D1JQ2K
Abstract
Projecting the future distributions of commercially and ecologically important species has become a critical approach for ecosystem managers to strategically anticipate change, but large uncertainties in projections limit climate adaptation planning. Although distribution projections are primarily used to understand the scope of potential change - rather than accurately predict specific outcomes - it is nonetheless essential to understand where and why projections can give implausible results and to identify which processes contribute to uncertainty. Here, we use a series of simulated species distributions, an ensemble of 252 species distribution models, and an ensemble of three regional ocean climate projections, to isolate the influences of uncertainty from earth system model spread and from ecological modeling. The simulations encompass marine species with different functional traits and ecological preferences to more broadly address resource manager and fishery stakeholder needs, and provide a simulated true-state with which to evaluate projections. We present our results relative to the degree of environmental extrapolation from historical conditions, which helps facilitate interpretation by ecological modelers working in diverse systems.
We found uncertainty associated with species distribution models can exceed uncertainty generated from diverging earth system models (up to 70% of total uncertainty by 2100), and that this result was consistent across species traits. Species distribution model uncertainty increased through time and was primarily related to the degree to which models extrapolated into novel environmental conditions but moderated by how well models captured the underlying dynamics driving species distributions. The predictive power of simulated species distribution models remained relatively high in the first 30 years of projections, in alignment with the time period in which stakeholders make strategic decisions based on climate information. By understanding sources of uncertainty, and how they change at different forecast horizons, we provide recommendations for projecting species distribution models under global climate change.
Methods
Summary
We used a combination of regional ocean climate projections and simulated species distributions (Leroy et al., 2016) to quantify sources of uncertainty in projections of spatially-explicit biomass for three species archetypes in the California Current System (CCS; 1985-2100; Figure 1). Species archetypes were simplified representations of three general groups of marine finfish found in the CCS that comprise ecologically and/or economically important fisheries and that might be expected to show variable patterns of redistribution under climate change based on their habitat preferences, population dynamics, and mobility characteristics: 1) a highly migratory species (HMS) designed to resemble North Pacific albacore; 2) a coastal pelagic species (CPS) designed to resemble northern anchovy; and 3) a groundfish species (GFS) designed to resemble sablefish. Species distribution models (SDMs; n=15; Figure 1) were then fitted to simulated biomass data for each archetype (training period 1985-2010) and projected from 2011-2100 using each of the three regional ocean climate models. Our framework resulted in 252 SDMs (15 SDM types, three species archetypes, three earth system models [ESMs], and two environmental parameter simulations; Figure 1). To address our study goal of assessing SDM performance and understanding sources of uncertainty in species distribution projections, we compared the output of SDM projections against simulated “observations” for 2011-2100 and quantified the uncertainty introduced by the climate projection (ESM uncertainty) versus the uncertainty introduced by the SDM structure (SDM uncertainty).
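The split between ESM uncertainty and SDM uncertainty can be illustrated with a simple variance partition. The Python sketch below is an assumption made for illustration and is not necessarily the procedure used in this study: it takes one projected value per ESM-SDM pair (for example, total biomass in a single year), measures ESM uncertainty as the variance of the ESM means and SDM uncertainty as the variance of the SDM means, and ignores any interaction term. The function name `partition_uncertainty` and the numbers are hypothetical.

```python
from statistics import mean, pvariance

def partition_uncertainty(proj):
    """Split ensemble spread into ESM and SDM fractions.

    `proj` maps (esm, sdm) -> projected value for one year.
    Returns (fraction_esm, fraction_sdm); interaction terms are ignored.
    """
    esms = sorted({esm for esm, _ in proj})
    sdms = sorted({sdm for _, sdm in proj})
    # Average over SDMs to isolate spread among ESMs, and vice versa.
    esm_means = [mean([proj[(e, s)] for s in sdms]) for e in esms]
    sdm_means = [mean([proj[(e, s)] for e in esms]) for s in sdms]
    var_esm = pvariance(esm_means)
    var_sdm = pvariance(sdm_means)
    total = var_esm + var_sdm
    if total == 0:
        return 0.0, 0.0
    return var_esm / total, var_sdm / total

# Hypothetical projected biomass for three ESMs and two SDM types
projections = {
    ("GFDL", "GAM"): 10.0, ("GFDL", "BRT"): 14.0,
    ("HAD", "GAM"): 12.0, ("HAD", "BRT"): 16.0,
    ("IPSL", "GAM"): 11.0, ("IPSL", "BRT"): 15.0,
}
frac_esm, frac_sdm = partition_uncertainty(projections)
```

In this made-up example the SDM types differ more than the ESMs do, so most of the spread is attributed to the SDMs.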
Environmental Covariates from Regional Ocean Projections
Environmental covariates used in species distribution simulations were obtained from regional ocean projections (Pozo Buil et al., 2021) forced by three ESMs from phase 5 of the Coupled Model Intercomparison Project (CMIP5) archive: Geophysical Fluid Dynamics Laboratory (GFDL) ESM2M, Hadley Centre HadGEM2-ES (HAD), and Institut Pierre Simon Laplace (IPSL) CM5A-MR. These ESMs, hereafter referred to as GFDL, HAD, and IPSL, span the approximate range of potential changes in physical and biogeochemical conditions across all CMIP5 models (Pozo Buil et al., 2021). ESMs were downscaled using the Regional Ocean Modeling System (ROMS) coupled with a biogeochemical model (NEMUCSC) (Fiechter et al., 2018, 2021) based on the North Pacific Ecosystem Model for Understanding Regional Oceanography (NEMURO) (Kishi et al., 2007). The ROMS domain spans the CCS from 30-48°N and from the coast to 134°W at 0.1° horizontal resolution with 42 terrain-following vertical layers (Figure 2). Each downscaled ESM used the Representative Concentration Pathway (RCP) 8.5 climate change scenario. While we only examined RCP 8.5, it should be noted that using RCPs 2.6 and 4.5 would result in only minor differences in the spread of future environmental change for the variables and ESMs examined here. Specifically, uncertainty in biogeochemical change among the chosen ESMs in RCP 8.5 envelops the uncertainty among RCPs 2.6 and 4.5, while for temperature, GFDL and HAD represent opposite ends of the spectrum for the projected magnitude of warming in the CMIP5 ensemble (Drenkard et al., 2021; Pozo Buil et al., 2021). As such, we do not explore scenario uncertainty. Environmental covariates used in species distribution simulations were sea surface temperature (SST; °C), bottom temperature (BT; °C), bottom oxygen (BO; mmol m-3), mixed layer depth (MLD; m), surface chlorophyll-a (Chl-a; mg m-3), and zooplankton concentration integrated over 50 m (zoo_50; mmol N m-2) and 200 m (zoo_200; mmol N m-2).
These environmental covariates were averaged over spring months (March-May) annually (1985-2100) to encompass the seasonal period when ocean productivity is most influential on the long-term population dynamics of most marine fishes in the CCS.
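The seasonal averaging step can be sketched in a few lines of Python. This is a minimal illustration that assumes monthly covariate values are available for a single grid cell; the temporal resolution of the underlying ROMS output is not stated here, and the function name `spring_means` and the SST values are hypothetical.

```python
from statistics import mean

SPRING_MONTHS = (3, 4, 5)  # March-May, as stated in the text

def spring_means(monthly):
    """Average monthly values over March-May for each year.

    `monthly` maps (year, month) -> value for one grid cell and covariate.
    Years missing any spring month are skipped rather than partially averaged.
    """
    out = {}
    for year in sorted({y for y, _ in monthly}):
        values = [monthly[(year, m)] for m in SPRING_MONTHS if (year, m) in monthly]
        if len(values) == len(SPRING_MONTHS):
            out[year] = mean(values)
    return out

# Hypothetical SST values (°C) for a single grid cell
sst = {
    (1985, 2): 11.0, (1985, 3): 12.0, (1985, 4): 13.0, (1985, 5): 14.0,
    (1986, 3): 12.5, (1986, 4): 13.5,  # May missing, so 1986 is skipped
}
spring_sst = spring_means(sst)
```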
Operating Models: Simulated Species Biomass
Biomass distributions for three species archetypes were simulated on the ROMS grid for each year and each ESM from 1985-2100. Simulations were run using the ‘virtualspecies’ R package (Leroy et al., 2016) that is specifically designed to reflect real-world ecological properties and species-environment relationships (Meynard et al., 2019). We refer to these simulated species distributions as ‘operating models’. Species simulations used a two-step process. First, habitat suitability was calculated based on environmental data and specified species’ habitat preferences (Table S1). Environmental preferences used to force species distributions varied among species archetypes based on representative life histories (see Supplementary Material). The domain for the HMS archetype was set to the entire CCS, whereas the CPS and GFS archetypes were reduced to inshore waters to reflect the CPS archetype’s preference for pelagic waters over the continental shelf and slope, and the GFS archetype’s preference for demersal shelf and slope habitats (Leeuwis et al., 2019; Stierhoff et al., 2020).
Second, total habitat suitability was calculated and converted to presence-absence using a logistic function (which specifies at what suitability value the species becomes present). When species were present, biomass was estimated from a log-normal distribution, and when species were absent biomass was set to zero. Biomass at each grid cell was multiplied by the habitat suitability of that same grid cell to provide habitat-informed biomass. For the CPS and GFS archetypes, an additional biomass multiplier was used to encompass population-level dynamics (Figure S1; see supplementary methods) (Punt et al., 2016). Specifically, CPS biomass was made to reflect the boom-bust population dynamics that are common in coastal pelagic species in the CCS, while GFS biomass integrated a 20-year phase shift between low and high recruitment, as has been observed for sablefish (Haltuch et al., 2019). Simulated data were generated for each grid cell (HMS = 21912 grid cells; CPS & GFS = 4012 grid cells) once per year for 116 years (1985-2100). Detailed methods for the simulation are provided in the Supplementary Material, and R code is provided on GitHub (https://github.com/stephbrodie1/Projecting_SDMs).
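The second step can be illustrated for a single grid cell with a short Python sketch. It is not the authors' R code (that is in the GitHub repository above): the function names, the logistic parameters `alpha` and `beta`, and the log-normal parameters are all hypothetical values chosen for illustration, and the population-level multiplier for the CPS and GFS archetypes is omitted.

```python
import math
import random

def presence_probability(suitability, beta=0.5, alpha=-0.05):
    """Logistic conversion of habitat suitability (0-1) to P(presence).

    beta is the suitability at which the probability equals 0.5;
    a negative alpha makes the probability rise with suitability.
    """
    return 1.0 / (1.0 + math.exp((suitability - beta) / alpha))

def simulate_biomass(suitability, rng, meanlog=2.0, sdlog=0.5):
    """Simulate habitat-informed biomass for one grid cell."""
    present = rng.random() < presence_probability(suitability)
    if not present:
        return 0.0  # absent cells carry zero biomass
    raw = rng.lognormvariate(meanlog, sdlog)  # log-normal biomass draw
    return raw * suitability  # scale by the cell's habitat suitability

rng = random.Random(42)  # fixed seed so the sketch is reproducible
suitabilities = [0.0, 0.2, 0.5, 0.8, 1.0]  # hypothetical grid cells
biomass = [simulate_biomass(s, rng) for s in suitabilities]
```

Cells with suitability well below `beta` are almost always absent, and biomass in occupied cells is highest where suitability is highest.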
Estimation Models: Species Distribution Models
We parameterized a series of SDMs to estimate the relationship between simulated species biomass and covariates (Figure 1). Because these are fitted to data from an operating model, we refer to these SDMs as ‘estimation models’. Multiple approaches were tested to explore how decisions about model type and parameterization influence model accuracy and predictive performance (Brodie et al., 2020). We used four types of SDMs: generalized additive models (GAM), generalized linear mixed models (GLMM), boosted regression trees (BRT), and multilayer perceptron models (MLP; a type of artificial neural network model) (Table S2). Parameterization options included various combinations of environmental (E), spatial (S), and temporal (T) covariates (Figure 1; see supplementary methods). Spatial and temporal covariates can act as proxies for unobserved or unmeasured processes that drive species distributions, and were included here given their common use in SDMs (typically called spatiotemporal models) (Brodie et al., 2020). We expected spatiotemporal SDMs with no environmental covariates to perform poorly over the projection period. We constructed all SDMs as delta (hurdle) models, where the probability of occurrence (binomial) and positive biomass (log-normal) were estimated as separate processes. All SDMs were trained on data from 1985-2010, where only 500 random samples per year (2% of available data) were used for fitting (n=13 000). Random samples included both presences and absences sampled across the entire domain. No SDM validation or model selection was required as our simulation experiment is designed to explore a range of model parameterizations.
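In a delta (hurdle) model, the two components are typically combined into one biomass prediction by multiplying the probability of occurrence by the expected biomass given presence. The Python sketch below shows that combination for a log-normal positive component, whose mean is exp(meanlog + sdlog^2 / 2). The function name and inputs are hypothetical, and whether the study applied this log-normal mean correction is an assumption that is not stated in the text.

```python
import math

def hurdle_expected_biomass(p_occurrence, meanlog, sdlog):
    """Expected biomass from a delta (hurdle) model.

    p_occurrence: predicted probability of presence (binomial component)
    meanlog, sdlog: predicted mean and standard deviation of log biomass
                    given presence (log-normal component)
    """
    if not 0.0 <= p_occurrence <= 1.0:
        raise ValueError("p_occurrence must lie in [0, 1]")
    # Mean of a log-normal distribution: exp(mu + sigma^2 / 2)
    positive_mean = math.exp(meanlog + 0.5 * sdlog ** 2)
    return p_occurrence * positive_mean

# Hypothetical predictions for one grid cell
expected = hurdle_expected_biomass(p_occurrence=0.3, meanlog=2.0, sdlog=0.5)
```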
Fitted SDMs were then used to predict species biomass on projected environmental data, for every year and grid cell in the domain. Only 500 randomly sampled grid cells per year (2011-2100) were used for testing purposes (n=45 000), to match the resolution of samples used to train models. Importantly, not all environmental covariates used to simulate species biomass (see Operating Models above) were included in the fitted SDMs. Specifically, we used chlorophyll-a as a proxy for prey fields (zooplankton) to approximate real-world conditions where imperfect information is available for estimating species’ habitat preferences. In addition to the 15 SDM parameterizations listed in Figure 1, we examined SDMs that only contained a single covariate of temperature (either surface or bottom temperature depending on the archetype). This experiment was done to test how under-parameterized models that miss key environmental drivers of species distributions perform, and the degree to which this approach decreases model fit and increases projection uncertainty. We refer to these SDMs as ‘temperature-only’ models (Figure 1).
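The training and testing sample sizes follow directly from drawing 500 grid cells per year. The Python sketch below reproduces that arithmetic for the HMS domain (21912 grid cells); sampling without replacement within a year, the seeds, and the function name `sample_cells` are assumptions made for illustration.

```python
import random

def sample_cells(n_cells, years, per_year=500, seed=0):
    """Draw `per_year` distinct grid-cell indices for each year."""
    rng = random.Random(seed)
    return {year: rng.sample(range(n_cells), per_year) for year in years}

N_CELLS_HMS = 21912  # HMS domain size given in the text

train = sample_cells(N_CELLS_HMS, range(1985, 2011), seed=1)  # 1985-2010
test = sample_cells(N_CELLS_HMS, range(2011, 2101), seed=2)   # 2011-2100

n_train = sum(len(cells) for cells in train.values())  # 26 years x 500
n_test = sum(len(cells) for cells in test.values())    # 90 years x 500
```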
Usage notes
.rds and raster files can be opened in R statistical software.
Funding
National Oceanic and Atmospheric Administration, Award: NA17OAR4310108
National Oceanic and Atmospheric Administration, Award: NA17OAR4310268
National Aeronautics and Space Administration, Award: 80NSSC19K0187