# Winter-run Chinook salmon resource selection function 2020

## Citation

Dudley, Peter (2021), Winter-run Chinook salmon resource selection function 2020, Dryad, Dataset, https://doi.org/10.7291/D1SD4D

## Abstract

We use historical aerial redd surveys of Sacramento River winter-run Chinook salmon, coupled with 2D hydraulic modeling, and apply multivariable polynomial logistic regression to calculate spawning resource selection functions (RSFs) based on water velocity and depth and to examine their interactions with water temperature. Our methods produced univariate and multivariate RSFs in which both velocity preference and depth preference interact with temperature. Preferred depth increased and preferred velocity decreased with increasing temperature.

## Methods

**1 Redd location data**

The winter-run redd aerial survey takes place from mid-April to mid-August and averages 12 surveys annually (Appendix S1) (Killam et al. 2014). The survey uses a helicopter to achieve a more accurate count than is possible from a fixed-wing aircraft. A trained observer marked spawning site locations on a map during each flight from 1990 to 2017 (Appendix S2). When redds are clustered near each other, they are represented on the survey maps as a single point location (hereafter site). We georeferenced these locations in ArcGIS (ESRI 2015), constructed a point layer by placing a point at the center of each marked spawning site, and assigned the relevant metadata. As the Sacramento River has regulated flows and the aerial surveys occur frequently (on average every 9.5 days), we assumed that conditions on the detection date were representative of conditions when the spawner chose the site and created the redd. Winter-run redd observations on the Sacramento River extend from Keswick Dam to the C St Bridge (73 river kilometers downstream) near Tehama, California (Fig. 1).

To account for both the error in the recorded location of each redd and the practice of listing multiple redds at a single site, we created 200 different data sets and allowed the exact locations of the redds to vary between sets. Each set began with the same georeferenced redd point locations. We then constructed a circular buffer with a radius of 36.5 m around each point. This accuracy estimate comes from errors measured in helicopter surveys of plants that used similar map-based recording of locations (Rebbeck et al. 2015). We increased the buffer radius by a distance equal to the radius of a circle of area equal to the area of a Chinook redd (8.4 m^{2}) (Newton and Brown 2004; Gallagher and Gallagher 2005; Giovannetti and Brown 2008; Riebe et al. 2014) multiplied by the number of redds counted at the current site (Fig. 2). We then randomly distributed a number of points inside each circle equal to the redd count at that site and used these points as the redd locations for the analysis. We repeated this procedure to construct the 200 data sets.
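The randomization described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the site coordinates and counts are hypothetical, and the sketch assumes the radius increment is the radius of a circle whose area equals 8.4 m^{2} times the redd count, which is one reading of the text.

```python
import math
import random

REDD_AREA_M2 = 8.4    # area of one Chinook redd (from the text)
BASE_RADIUS_M = 36.5  # location-error buffer radius (from the text)

def buffer_radius(redd_count):
    """Base buffer plus the radius of a circle with the area of redd_count redds.

    Assumption: the increment is sqrt(redd_count * area / pi).
    """
    return BASE_RADIUS_M + math.sqrt(redd_count * REDD_AREA_M2 / math.pi)

def scatter_redds(x, y, redd_count, rng):
    """Draw redd_count points uniformly inside the buffer around site (x, y)."""
    radius = buffer_radius(redd_count)
    points = []
    for _ in range(redd_count):
        # The square root of a uniform draw gives uniform density over the disk.
        r = radius * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return points

rng = random.Random(1)
# Hypothetical sites as (x, y, redd count) in projected metres.
sites = [(1000.0, 2000.0, 3), (1500.0, 2100.0, 1)]
datasets = [
    [p for (x, y, n) in sites for p in scatter_redds(x, y, n, rng)]
    for _ in range(200)
]
```

Each of the 200 lists holds one simulated location per counted redd, so sites with more redds contribute more points and occupy a slightly larger buffer.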

To create points where redds were never present (absence points), we created a regular grid bounded by the wetted area of the Sacramento River at the typical flow during the winter-run spawning season (300 m^{3}/s). We assumed all of this area was accessible for spawning; however, substrate composition may make some areas unusable. We placed 80000 uniformly distributed points over this domain, resulting in a point spacing of 7.3 m. We removed any absence points that fell inside any of the circular buffers constructed around the redd sites. While the presence points have creation dates, each absence point required a creation date in order to assign a flow (and thus a depth and velocity) and a temperature value. We assigned the creation month and day based on a draw from a normal distribution whose mean and standard deviation come from historical redd observation times. We assigned each absence point a year from 1990 to 2017, weighted by the number of redds observed each year. The number of absence points at any given time was therefore proportional to the number of redds observed on the river. This process resulted in a set of 37221 potential absence points. We used a unique random set of these absence points (equal in number to the presence points) in each of the 200 data sets (see below).
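The date assignment for absence points can be illustrated with a short Python sketch. The yearly redd counts, the mean, and the standard deviation below are hypothetical placeholders. Drawing a day of year and clamping it to 1–365 is an assumption made for the example, since the text only specifies a normal draw for month and day.

```python
import random
from datetime import date, timedelta

def assign_creation_date(redds_per_year, mean_doy, sd_doy, rng):
    """Pick a year weighted by redd counts and a day of year from a normal draw."""
    years = sorted(redds_per_year)
    weights = [redds_per_year[y] for y in years]
    year = rng.choices(years, weights=weights, k=1)[0]
    # Assumption: work in day-of-year units and clamp to a valid day.
    doy = min(max(round(rng.gauss(mean_doy, sd_doy)), 1), 365)
    return date(year, 1, 1) + timedelta(days=doy - 1)

rng = random.Random(7)
# Hypothetical counts; a year with no observed redds receives no absence points.
redds_per_year = {1990: 10, 1991: 0, 1992: 30}
absence_dates = [
    assign_creation_date(redds_per_year, 170.0, 20.0, rng) for _ in range(1000)
]
```

Weighting the year draw by redd counts is what keeps the temporal distribution of absence points proportional to that of the observed redds.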

**2 Modeled redd depth and velocity**

To determine velocity and depth at each point in the data set, we constructed a 2D hydraulic model that used 0.5 m resolution sonar-derived bathymetry from the U.S. Bureau of Reclamation (Bradley and Greimann 2020). This bathymetry domain extended from Keswick Dam to 0.5 km upstream of the confluence with Cow Creek (Fig. 1). Because the majority of redd sightings (98.8%) occurred within this domain, we ignored all redds downstream. We used this bathymetry along with the flow model HEC-RAS 2D (Hydrologic Engineering Center 2016) to create velocity and depth rasters for the Sacramento River within this domain. We used HEC-GeoRAS to adjust the bathymetry to account for the Anderson-Cottonwood Irrigation District Flashboard Dam in Redding. Our hydraulic model domain consisted of a 20 m resolution grid over the river channel. In computation, HEC-RAS 2D uses both the resolution of the domain grid and the resolution of the underlying raster (0.5 m) through a high-resolution subgrid model (Casulli 2009). As our initial Manning’s N values, we used values from a previously calibrated 1D HEC-RAS model of the Sacramento River. We ran calibration simulations, adjusting the Manning’s N values until the wetted area of the river matched satellite imagery taken at known flows. We ran the model for flows from 50 to 800 m^{3}/s in steps of 50 m^{3}/s and from 800 to 2400 m^{3}/s in steps of 200 m^{3}/s. We chose these values based on historical flows during the winter-run spawning window (April – July). During this window, 97% of historical flow values were below 800 m^{3}/s, justifying the coarser resolution above 800 m^{3}/s. Each flow simulation resulted in two 0.5 m resolution rasters: one of water depth and one of vertically averaged water velocity. We assumed the minimum reasonable resolution for sampling redd site depths and velocities was the diameter of a redd (3.26 m). Thus, for computational efficiency, we used cubic splines to rescale the rasters to 3.26 m resolution.

We used the River Assessment for Forecasting Temperature (RAFT) (Pike et al. 2013; Daniels et al. 2018) to assign flow and temperature values to redd locations based on creation date. RAFT is a one-dimensional model that simulates river temperature and flow in the longitudinal direction. For this analysis, we ran RAFT on a daily time step with a 2 km spatial resolution. When reconstructing historical river temperatures, RAFT assimilates gauge data using a version of the ensemble Kalman filter to improve model performance (Evensen 2009), with root mean square errors often below 0.5 °C (Pike et al. 2013; Daniels et al. 2018). Using the 2 km resolution RAFT data, we assigned each presence and absence point a flow and temperature based on its creation date and river kilometer. We then took the flow assigned to each point and linearly interpolated between the HEC-RAS rasters (both depth and velocity) to calculate the depth and velocity at that point on the creation date. This procedure resulted in a data set with velocity, depth, and temperature at each presence and absence point for its creation date. We removed the small fraction (~1%) of presence points in each data set that were not wetted. Having equal numbers of presence and absence points can improve model predictive accuracy (Hattab et al. 2013), so in each of the 200 data sets we used random sampling without replacement to reduce the number of absence points from the more than 37000 available to a number equal to the number of presence points.
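The interpolation between simulated flows can be sketched for a single raster cell in Python. The flow list follows the simulation steps given in the text; the function name and the depth values are hypothetical.

```python
from bisect import bisect_right

# Simulated flows in m^3/s: 50 to 800 by 50, then 1000 to 2400 by 200.
FLOWS = list(range(50, 801, 50)) + list(range(1000, 2401, 200))

def interpolate_at_flow(flow, values_by_flow):
    """Linearly interpolate a cell value between the two bracketing simulated flows."""
    if not FLOWS[0] <= flow <= FLOWS[-1]:
        raise ValueError("flow outside the simulated range")
    hi = min(bisect_right(FLOWS, flow), len(FLOWS) - 1)
    lo = hi - 1
    weight = (flow - FLOWS[lo]) / (FLOWS[hi] - FLOWS[lo])
    return (1 - weight) * values_by_flow[FLOWS[lo]] + weight * values_by_flow[FLOWS[hi]]

# Hypothetical depths (m) at one cell for each simulated flow.
depth_by_flow = {f: 0.002 * f for f in FLOWS}
depth_at_325 = interpolate_at_flow(325, depth_by_flow)  # halfway between 300 and 350
```

The same function applies to velocity. In practice the lookup would run over the whole raster rather than one cell at a time.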

**3 Statistical models**

We conducted all statistical analyses in a frequentist framework using generalized linear models (GLMs) in R (R Core Team 2015). The code used the dplyr and ggplot2 packages and was run using RStudio (RStudio Team 2015; Wickham 2016; Wickham et al. 2018). We conducted a quadratic logistic regression (Bernoulli response with a logit link function) on velocity and depth separately to obtain an RSF for each. These models had the form

$$\mathit{Spawn}_i \sim \mathrm{Bernoulli}(p_i)$$

$$E(\mathit{Spawn}_i) = p_i$$

$$\mathrm{logit}(p_i) = b_0 + b_1 x_i + b_2 x_i^2 \qquad \text{(eqn 1)}$$

In this model, $p_i$ is the probability of use or spawning, the $b$’s are fitted parameters, and $x_i$ represents the value of the variable of interest at the $i^{th}$ data point (either velocity or depth). The quadratic form allowed the RSF to be both peaked and asymmetrical. This shape was important because we expected the RSF to have an optimum value about which it need not be symmetrical.
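As an illustration of how equation 1 yields a preferred value, the sketch below evaluates the RSF and locates its peak in Python. The coefficients are hypothetical, not fitted values from this study. The peak formula follows from setting the derivative of the quadratic to zero and is a maximum only when $b_2$ is negative.

```python
import math

def rsf(x, b0, b1, b2):
    """Equation 1: inverse logit of a quadratic in x."""
    eta = b0 + b1 * x + b2 * x * x
    return 1.0 / (1.0 + math.exp(-eta))

def preferred_value(b1, b2):
    """Value of x where the quadratic, and hence the RSF, peaks."""
    if b2 >= 0:
        raise ValueError("no interior maximum unless b2 is negative")
    return -b1 / (2.0 * b2)

# Hypothetical coefficients for a velocity RSF (x in m/s).
b0, b1, b2 = -2.0, 6.0, -5.0
peak = preferred_value(b1, b2)
peak_probability = rsf(peak, b0, b1, b2)
```

Because velocity and depth cannot be negative, the portion of the curve over the physically meaningful range need not look symmetric even though the quadratic itself is.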

As temperature may interact with depth and velocity in site selection, we examined the effect of temperature with the following model

$$\mathit{Spawn}_i \sim \mathrm{Bernoulli}(p_i)$$

$$E(\mathit{Spawn}_i) = p_i$$

$$\mathrm{logit}(p_i) = b_0 + b_1 v_i + b_2 v_i^2 + b_3 d_i + b_4 d_i^2 + b_5 T_i + b_6 v_i d_i + b_7 T_i v_i + b_8 d_i T_i \qquad \text{(eqn 2)}$$

where $T_i$, $v_i$, and $d_i$ represent temperature, velocity, and depth values respectively for the $i^{th}$ data point. We then checked to see if either of the temperature interaction terms ($b_7$ and $b_8$) were non-zero across the 200 data sets. As either velocity or depth may be correlated with temperature in this system, an apparent credible interaction between either velocity or depth and temperature on site selection could arise irrespective of winter-run behavior. To assess this potential, we fit an equation using only depth and velocity

$$\mathit{Spawn}_i \sim \mathrm{Bernoulli}(p_i)$$

$$E(\mathit{Spawn}_i) = p_i$$

$$\mathrm{logit}(p_i) = b_0 + b_1 v_i + b_2 v_i^2 + b_3 d_i + b_4 d_i^2 + b_5 v_i d_i \qquad \text{(eqn 3)}$$

We used this model to construct a simulated data set where selection of redd site location was only based on depth and velocity. We constructed this data set by taking all presence and absence velocity, depth, and temperature data and assigning them simulated presence or absence values probabilistically based on the velocity and depth derived RSF. We conducted the same multivariable analysis with temperature on this simulated data set, and compared the values for the interaction terms ($b_7$ and $b_8$) from the actual and simulated data sets.
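The construction of the simulated data set can be sketched in Python as a Bernoulli draw per point from the equation 3 RSF. The coefficients and records below are hypothetical. The point of the sketch is that temperature is carried along unchanged but plays no part in the simulated choice.

```python
import math
import random

def rsf_eq3(v, d, b):
    """Equation 3: selection probability from velocity and depth only."""
    eta = b[0] + b[1] * v + b[2] * v * v + b[3] * d + b[4] * d * d + b[5] * v * d
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_presence(records, b, rng):
    """Keep each record's (v, d, T) and redraw presence as a Bernoulli trial."""
    simulated = []
    for v, d, temp in records:
        spawn = 1 if rng.random() < rsf_eq3(v, d, b) else 0
        simulated.append((v, d, temp, spawn))
    return simulated

rng = random.Random(3)
b = [-3.0, 5.0, -4.0, 2.0, -0.5, 0.1]  # hypothetical b0..b5
records = [(0.6, 1.5, 11.2), (1.8, 0.4, 13.0), (0.3, 3.0, 12.1)]  # (v, d, T)
simulated = simulate_presence(records, b, rng)
```

Fitting equation 2 to data generated this way shows how large $b_7$ and $b_8$ can appear purely through correlation between temperature and the hydraulic variables.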

After running the models described by equation 3, we ran a version without the interaction terms. We checked the penalized AIC of these eight models to decide on a single model for use in a visualization tool of spawning habitat. To evaluate the predictive power of the selected most parsimonious model (the model with the lowest AIC), we conducted a 4-fold partitioning of each data set: we divided each data set into four groups, used three to calibrate the model, and used the remaining fourth to test it. This method resulted in 800 tests of the model, from which we constructed receiver operating characteristic (ROC) curves and calculated the area under the curve (AUC) for each (Murtaugh 1996; Fielding and Bell 1997; Boyce et al. 2002).
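A minimal Python sketch of the partitioning and AUC calculation follows. It assumes a random assignment of rows to folds and uses the pairwise (Mann-Whitney) definition of AUC. The function names are illustrative, and the model-fitting step is left out.

```python
import random

def four_fold_splits(n, rng, k=4):
    """Yield (train_idx, test_idx) pairs for a random k-fold partition of n rows."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def auc(labels, scores):
    """Probability that a random presence outscores a random absence (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

splits = list(four_fold_splits(10, random.Random(0)))
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

Four folds on each of the 200 data sets give the 800 test sets mentioned above, each yielding one AUC.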

To check for spatial correlation, we examined a variogram of the residuals from our model. We divided the river into segments to remove spatial correlation and ran a generalized linear mixed model using the lme4 package in R (Bates et al. 2015), with river section as a random effect, to compare with our non-spatial model.

Finally, we constructed a Shiny application (Chang et al. 2019) using the mean coefficients of the lowest AIC model to show RSF values for a user-selected section of the Sacramento River given a user-selected flow and temperature. The flow and temperature are treated as constant over the calculation and display domain. The application also allows users to change between metric and imperial units (to make it accessible to both scientists and U.S. managers); visualize historical redd locations; map depth and velocity rasters; plot histograms of depth, velocity, and RSF values; save images; and download the velocity, depth, and RSF data.

## References

Bates, D., Mächler, M., Bolker, B., and Walker, S. 2015. Fitting Linear Mixed-Effects Models Using lme4. J. Stat. Softw. **67**(1): 1–48. doi:10.18637/jss.v067.i01.

Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A. 2002. Evaluating resource selection functions. Ecol. Modell. **157**(2–3): 281–300. doi:10.1016/S0304-3800(02)00200-4.

Bradley, D.N., and Greimann, B. 2020. Sacramento River Gravel Augmentation Study.

Casulli, V. 2009. A high-resolution wetting and drying algorithm for free-surface hydrodynamics. Int. J. Numer. Methods Fluids **60**: 391–408.

Chang, W., Cheng, J., Allaire, J., Xie, Y., and McPherson, J. 2019. shiny.

Daniels, M.E., Sridharan, V.K., John, S.N., and Danner, E.M. 2018. NOAA Technical Memorandum NMFS: Calibration and Validation of Linked Water Temperature Models for the Shasta Reservoir and the Sacramento River from 2000 to 2015. doi:10.7289/V5/TM-SWFSC-597.

ESRI. 2015. ArcGIS Desktop. Environmental Systems Research Institute, Redlands, CA.

Evensen, G. 2009. Data assimilation: the ensemble Kalman filter. 2nd edition. Springer Science & Business Media, Heidelberg.

Fielding, A.H., and Bell, J.F. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. **24**(1): 38–49. doi:10.1017/S0376892997000088.

Gallagher, S.P., and Gallagher, C.M. 2005. Discrimination of chinook salmon, coho salmon, and steelhead redds and evaluation of the use of redd data for estimating escapement in several unregulated streams in Northern California. N. Am. J. Fish. Manag. **25**(1): 284–300. doi:10.1577/M04-016.1.

Giovannetti, S., and Brown, M.R. 2008. Adult spring Chinook salmon monitoring in Clear Creek, California: 2007 Annual Report. (September).

Hattab, T., Ben Rais Lasram, F., Albouy, C., Sammari, C., Romdhane, M.S., Cury, P., Leprieur, F., and Le Loc’h, F. 2013. The Use of a Predictive Habitat Model and a Fuzzy Logic Approach for Marine Management and Planning. PLoS One **8**(10).

Hydrologic Engineering Center. 2016. Hydrologic Engineering Center’s River Analysis System. Hydrologic Engineering Center.

Killam, D., Johnson, M., and Revnak, R. 2014. Chinook Salmon Populations of the Upper Sacramento River Basin In 2014.

Murtaugh, P.A. 1996. The Statistical Evaluation of Ecological Indicators. Ecol. Appl. **6**(1): 132–139.

Newton, J.M., and Brown, M.R. 2004. Adult spring Chinook salmon monitoring in Clear Creek, California 1999-2002. Red Bluff, CA.

Pike, A., Danner, E., Boughton, D., Melton, F., Nemani, R., Rajagopalan, B., and Lindley, S.T. 2013. Forecasting river temperatures in real time using a stochastic dynamics approach. Water Resour. Res. **49**(9): 5168–5182. doi:10.1002/wrcr.20389.

R Core Team. 2015. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available from www.R-project.org.

Rebbeck, J., Kloss, A., Bowden, M., Coon, C., Hutchinson, T.F., Iverson, L., and Guess, G. 2015. Aerial detection of seed-bearing female Ailanthus altissima: A cost-effective method to map an invasive tree in forested landscapes. For. Sci. **61**(6): 1068–1078. doi:10.5849/forsci.14-223.

Riebe, C.S., Sklar, L.S., Overstreet, B.T., and Wooster, J.K. 2014. Optimal reproduction in salmon spawning substrates linked to grain size and fish length. Water Resour. Res. **50**: 1–21. doi:10.1002/2012WR013085.

RStudio Team. 2015. RStudio: Integrated Development for R. RStudio, Inc., Boston. Available from http://www.rstudio.com/.

Wickham, H. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, New York. Available from https://ggplot2.tidyverse.org.

Wickham, H., François, R., Henry, L., and Müller, K. 2018. dplyr: A Grammar of Data Manipulation. Available from https://cran.r-project.org/package=dplyr.

## Funding

Bureau of Reclamation