Spatial survival analysis accounts for femalebiased breeding dispersal and provides realistic estimates of true annual survival in migratory warblers
Data files
Aug 17, 2024 version files 167.11 KB

HoodedWarblerLocationDispersalData.csv

README.md

RFemaleData.csv

RMaleData.csv
Abstract
Breeding dispersal — betweenseason change in breeding location — is usually femalebiased in birds and creates problems in accurately estimating annual survival, as conventional CormackJollySeber (CJS) survival models cannot discriminate between mortality and undetected emigration. Recently, spatial CJS (sCJS) models have been developed that use data on breeding dispersal within a population to account for undetected emigration and provide corrected estimates of true annual survival, a development that promises to advance avian conservation initiatives that require accurate estimates of annual survival. Using a 14year dataset on a colorbanded population of Hooded Warblers (Setophaga citrina) in northwest Pennsylvania, I examined femalebiased breeding dispersal and performance of a sCJS model in estimating true annual survival of females and males. I also compared my findings to published literature on other migratory North American warblers, a group with many species of high conservation concern. Breeding dispersal in the Hooded Warbler study population is strongly femalebiased, with median dispersal distances of 151 m for females (n = 227) and 51 m (n = 336) for males. Although most individuals disperse short distances, the observed pattern of breeding dispersal within the study site was best modeled using a heavytailed Cauchy dispersal kernel, a model that indicates the presence of a substantial tail of undetected longdistance dispersal, particularly in females. Using the Cauchy model, sCJS analysis yielded realistic estimates of Hooded Warbler true annual survival, 0.61 for both sexes, and resolved ambiguities evident in much lower estimates of apparent annual survival, 0.45 for females and 0.54 for males, derived from conventional CJS analysis. Because longdistance breeding dispersal is widespread in migratory warblers and especially frequent in females, analyses of warbler survival should employ sCJS methods whenever possible, as estimates of apparent annual survival derived from conventional CJS methods will be poor underapproximations of true annual survival.
README: Spatial survival analysis accounts for femalebiased breeding dispersal and provides realistic estimates of true annual survival in migratory warblers
https://doi.org/10.5061/dryad.8w9ghx3wg
Description of the data and file structure
Field Methods, Mapping, and Breeding Dispersal Distance
The study was conducted at Hemlock Hill Field Station in northwest Pennsylvania, USA (41.8°N, 79.9°W). Hemlock Hill consists of 173 ha of primarily beechmaplehemlock forest located within a rural landscape of forest fragments, agricultural land, and abandoned fields; it has been the site of a longterm field study of Hooded Warblers since 2010 (Mumme 2018, Lignac and Mumme 2023, Mumme 2023). Intensive efforts are made each year to capture and color band all resident breeding adults and to census the entire study population. Data for the current study come from 340 females and 333 males that were colorbanded on the study site 2010–2022 and resighted or recaptured there 2011–2023.
Each year I used GPS to map the location, recorded as decimal latitude and longitude, of each resident breeding adult, based on the mean of its nest locations and the location that I judged to be the approximate center of its breeding territory based on resighting locations; including the approximate center of the breeding territory was necessary for mapping the locations of unmated males and a small number of mated pairs for which no nests were found. Seven females changed mates and breeding territories in the middle of a given nesting season, and for these females I mapped their location using their initial breeding territory and nests initiated with their original mate. Total sample size was 1236 locationyears, 567 for females and 669 for males (Figure 1).
For colorbanded breeding females and males that returned to the study site in a subsequent year, I calculated their breeding dispersal distance as the distance in meters between their locations in year x and year x + 1 (214 females, 326 males), or in a few cases where individuals were absent 1–2 years before returning, year x *+ 2 (12 females, 9 males) or year x + 3 (1 female, 1 male); inclusion of individuals absent 1–2 years is unlikely to alter the results, as their breeding dispersal distances did not differ from those of birds returning in consecutive years, for both females (*t = 0.3, df = 225, P = 0.73) and males (t = 1.5, df = 334, P = 0.13). The total sample size available for analysis was 227 breeding dispersal events for females and 336 for males. I tested for sex differences in breeding dispersal distance using a general linear mixed model (GLMM) with an identity link function in which sex was considered a fixed effect and an individual’s unique identification code was entered as a random effect to account for multiple measures of breeding dispersal distance obtained from the same individual. To normalize the data, breeding dispersal distances were logtransformed prior to analysis. Statistical analysis was conducted using JMP Pro 17.0 (SAS Institute, Cary, North Carolina, USA).
Identification of Appropriate Dispersal Model
I used the simulation capabilities of the spatial CJS (sCJS) model of Schaub and Royle (2014) to identify the most appropriate dispersal kernel with which to model Hooded Warbler breeding dispersal. Schaub and Royle (2014) model dispersal as change in location along both the x and y axes in a twodimensional equidistant grid. For clarity, I refer to these changes as xy dispersal distances to distinguish them from breeding dispersal distance described in the previous section; while breeding dispersal distances are always positive, xy dispersal distances can include negative values, reflecting dispersal movements that result in decreases in either longitude (x), latitude (y), or both. Under most circumstances, the distributions of xy dispersal distance along each axis would be expected to be symmetrical and centered around 0, and can be modeled using distributions with these characteristics (Schaub and Royle 2014).
I first represented the 173 ha of the study area within a 20 x 18 equidistant xy grid where each grid unit = 100 m. I then replotted the 1236 bird locationyears according to this grid system (Figure 1) so that the 563 observed breeding dispersal distances (227 for females, 336 for males) could also be expressed as xy dispersal distances. I then used the simulation capabilities of the model of Schaub and Royle (2014) to simulate dispersal within and outside of this gridded study area (Figure 1) using three different theoretical dispersal kernels: (1) a random normal distribution, (2) a t distribution with df = 2, and (3) a Cauchy distribution. The three models can also be viewed as three different t distributions with decreasing degrees of freedom and increasingly heavy tails, as the normal and Cauchy distributions are equivalent to t distributions with df = ∞ and df = 1, respectively. Although the model of Schaub and Royle (2014) allows dispersal kernels on the two axes to have different values of the scale parameter σ (σ1 and σ2 for the x and y axes, respectively), the observed distribution of x and y dispersal distances within the Hooded Warbler study area did not differ significantly for either females (KolmogorovSmirnov D = 0.070, P = 0.63) or males (KolmogorovSmirnov D = 0.065, P = 0.47). I therefore set σ1 = σ2 = σ in all dispersal simulations.
Using the three dispersal kernel models, I then generated simulated xy dispersal data that could be compared to observed xy dispersal data for each sex, searching for the combination of kernel model and σ that would best fit the observed xy dispersal data, as determined by the KolmogorovSmirnov twosample test. Although each simulation was designed to generate about 6,600 total simulated dispersal events, the number of simulated xy dispersal events that occurred within the study area — the ones that could be compared to observed values — varied with both dispersal kernel and σ. The simulations (n = 56) focused on regions of σspace that minimized the KolmogorovSmirnov D statistic, and maximized the corresponding Pvalue, for each of the three tested dispersal kernels.
Estimates of Detection Probability and Annual Survival
I estimated detection probability (p) and apparent annual survival (ɸ) via conventional CJS analysis using R version 4.3.2 (R Core Team 2023) and package marked version 1.28 (Laake et al. 2013). For this analysis, histories of individual females and males were combined into a single dataset and analyzed with a 4parameter model with separate estimates of p and ɸ for each sex. I then used the Bayesian spatial CJS (sCJS) model of Schaub and Royle (2014) to estimate the same parameters correcting for undetected emigration. The sCJS model was run using JAGS version 4.3.2 (Plummer 2003) and the R package r2jags 0.71 (Su and Yajima 2021). For the sCJS analysis I used the dispersal kernel model identified in the previous section and ran the sexes separately. Vague priors (random uniform distribution between 0 and 1) were used for p and ɸ. Because preliminary models encountered convergence issues in estimating the critical dispersal scale parameters σ1 and σ2, particularly in females, more specific priors — random uniform distribution between 0.001 and 2 — were employed for σ1 and σ2 in the final models; these more specific priors are justified, as the simulations described in the previous section indicated that their posterior estimated values would likely fall comfortably within that range. The Markov chain Monte Carlo (MCMC) settings specified three chains, each with 420,000 iterations with the first 20,000 discarded, for a total of 1.2 million saved iterations. No convergence issues were encountered in the final models, with all Rhat values less than 1.02 for females and 1.01 for males.
Files and variables
File: HoodedWarblerLocationDispersalData.csv
Description: This file has 1236 records and 14 columns, with missing values indicated by blank cells.
Variables
 HOWA_ID: Unique identifier for each individual bird.
 Sex: F for female, M for male.
 Year: Specific breeding season for this particular record.
 Lat(yearx): Latitude for the mean location of this bird for the specified year.
 Long(yearx): Longitude for the mean location of this bird for the specified year.
 Lat(x+1): Latitude for the mean location of this bird if it was present in the study area in the following year (or in a few cases if it was present 2 or 3 years later).
 Long(x+1): Longitude for the mean location of this bird if it was present in the study area in the following year (or in a few cases if it was present 2 or 3 years later).
 Distance: Distance in meters from the bird’s location in the specified year and the subsequent year.
 xGridCoordinate: x grid coordinate (see figure 1 of the published paper) for the bird’s mean location in the specified year. Each grid unit = 100m.
 yGridCoordinate: y grid coordinate (see figure 1 of the published paper) for the bird’s mean location in the specified year. Each grid unit = 100m.
 xCoorNextYear: x grid coordinate for the mean location of this bird the next year it was present in the study area.
 yCoorNextYear: y grid coordinate for the mean location of this bird the next year it was present in the study area.
 xChange: Calculated as (xCoorNextYear  xGridCoordinate). This is the x dispersal distance, in grid coordinate units; multiply by 100 for x dispersal distance in meters.
 yChange: Calculated as (yCoorNextYear  yGridCoordinate). This is the y dispersal distance, in grid coordinate units; multiply by 100 for y dispersal distance in meters.
File: RFemaleData.csv
Description: This file has 340 records and 43 columns, with missing values indicated by blank cells. It provides the complete histories and locations of 340 individual females within the study area. The data in this file constitute the primary data inputs for the sCJS analysis of female survival.
Variables
Y.in.1  Y.in.14 (columns AN): These columns show the history of 340 individual females by year, with Y.in.1 representing 2010 and Y.in.14 representing 2023. Values of 1 indicate that the female was present in the study area that year, and values of 0 indicate that the female was absent or undetected that year.
G.in.1  G.in.14 (columns OAB): These columns indicate the x grid coordinate of the female in the designated year (G.in.1 = 2010, and G.in.14 =2023).
G.in.15  G.in.28 (columns ACAP): These columns indicate the y grid coordinate of the female in the designated year (G.in.15 = 2010, and G.in.28 =2023).
F (column AQ): This column indicates the year (occasion) in which the female was initially captured and colorbanded, with 1 = 2010 and 13 = 2022.
File: RMaleData.csv
Description: This file has 333 records and 43 columns, with missing values indicated by blank cells. It provides the complete histories and locations of 333 individual males within the study area. The data in this file constitute the primary data inputs for the sCJS analysis of male survival.
Variables
This file has the same variables and format as the file RFemaleData.csv described above.
Code/software
RSimulationCode.txt
I used the R code in the file RSimulationCode.txt, modified from the simulation capabilities of the spatial CJS (sCJS) model of Schaub and Royle (2014), to identify the most appropriate dispersal kernel with which to model Hooded Warbler breeding dispersal. Schaub and Royle (2014) model dispersal as change in location along both the x and y axes in a twodimensional equidistant grid. For clarity, I refer to these changes as xy dispersal distances to distinguish them from breeding dispersal distance described in the previous section; while breeding dispersal distances are always positive, xy dispersal distances can include negative values, reflecting dispersal movements that result in decreases in either longitude (x), latitude (y), or both. Under most circumstances, the distributions of xy dispersal distance along each axis would be expected to be symmetrical and centered around 0, and can be modeled using distributions with these characteristics (Schaub and Royle 2014).
I first represented the 173 ha of the study area within a 20 x 18 equidistant xy grid where each grid unit = 100 m. I then replotted the 1236 bird locationyears according to this grid system (Figure 1) so that the 563 observed breeding dispersal distances (227 for females, 336 for males) could also be expressed as xy dispersal distances. I then used the simulation capabilities of the model of Schaub and Royle (2014) to simulate dispersal within and outside of this gridded study area (Figure 1) using three different theoretical dispersal kernels: (1) a random normal distribution, (2) a t distribution with df = 2, and (3) a Cauchy distribution. The three models can also be viewed as three different t distributions with decreasing degrees of freedom and increasingly heavy tails, as the normal and Cauchy distributions are equivalent to t *distributions with df = ∞ and df = 1, respectively. Although the model of Schaub and Royle (2014) allows dispersal kernels on the two axes to have different values of the scale parameter σ (σ1 and σ2 for the *x and y axes, respectively), the observed distribution of x and y dispersal distances within the Hooded Warbler study area did not differ significantly for either females (KolmogorovSmirnov D = 0.070, P = 0.63) or males (KolmogorovSmirnov D = 0.065, P = 0.47). I therefore set σ1 = σ2 = σ in all dispersal simulations.
Using the three dispersal kernel models, I then generated simulated xy dispersal data that could be compared to observed* xy* dispersal data for each sex, searching for the combination of kernel model and σ that would best fit the observed xy dispersal data, as determined by the KolmogorovSmirnov twosample test. Although each simulation was designed to generate about 6,600 total simulated dispersal events, the number of simulated xy dispersal events that occurred within the study area — the ones that could be compared to observed values — varied with both dispersal kernel and σ. The simulations (n = 56) focused on regions of σspace that minimized the KolmogorovSmirnov D statistic, and maximized the corresponding Pvalue, for each of the three tested dispersal kernels.
RAnalysisCode.txt
I used the R code in the file RAnalysisCode.txt, modified from Schaub and Royle (2014), to do the spatial CJS analysis of male and female survival. I first estimated detection probability (p) and apparent annual survival (ɸ) via conventional CJS analysis using R version 4.3.2 (R Core Team 2023) and package marked version 1.28 (Laake et al. 2013). For this analysis, histories of individual females and males were combined into a single dataset and analyzed with a 4parameter model with separate estimates of p and ɸ for each sex. I then used the Bayesian spatial CJS (sCJS) model specified in RAnalysisCode.txt to estimate the same parameters correcting for undetected emigration. The sCJS model was run using JAGS version 4.3.2 (Plummer 2003) and the R package r2jags 0.71 (Su and Yajima 2021). For the sCJS analysis I used the Cauchy dispersal kernel model identified in the previous section and ran the sexes separately. Vague priors (random uniform distribution between 0 and 1) were used for p and ɸ. Because preliminary models encountered convergence issues in estimating the critical dispersal scale parameters σ1 and σ2, particularly in females, more specific priors — random uniform distribution between 0.001 and 2 — were employed for σ1 and σ2 in the final models; these more specific priors are justified, as the simulations described in the previous section indicated that their posterior estimated values would likely fall comfortably within that range. The Markov chain Monte Carlo (MCMC) settings specified three chains, each with 420,000 iterations with the first 20,000 discarded, for a total of 1.2 million saved iterations. No convergence issues were encountered in the final models, with all Rhat values less than 1.02 for females and 1.01 for males.
Methods
Field Methods, Mapping, and Breeding Dispersal Distance
The study was conducted at Hemlock Hill Field Station in northwest Pennsylvania, USA (41.8°N, 79.9°W). Hemlock Hill consists of 173 ha of primarily beechmaplehemlock forest located within a rural landscape of forest fragments, agricultural land, and abandoned fields; it has been the site of a longterm field study of Hooded Warblers since 2010 (Mumme 2018, Lignac and Mumme 2023, Mumme 2023). Intensive efforts are made each year to capture and color band all resident breeding adults and to census the entire study population. Data for the current study come from 340 females and 333 males that were colorbanded on the study site 2010–2022 and resighted or recaptured there 2011–2023.
Each year I used GPS to map the location, recorded as decimal latitude and longitude, of each resident breeding adult, based on the mean of its nest locations and the location that I judged to be the approximate center of its breeding territory based on resighting locations; including the approximate center of the breeding territory was necessary for mapping the locations of unmated males and a small number of mated pairs for which no nests were found. Seven females changed mates and breeding territories in the middle of a given nesting season, and for these females I mapped their location using their initial breeding territory and nests initiated with their original mate. Total sample size was 1236 locationyears, 567 for females and 669 for males (Figure 1).
For colorbanded breeding females and males that returned to the study site in a subsequent year, I calculated their breeding dispersal distance as the distance in meters between their locations in year x and year x + 1 (214 females, 326 males), or in a few cases where individuals were absent 1–2 years before returning, year x + 2 (12 females, 9 males) or year x + 3 (1 female, 1 male); inclusion of individuals absent 1–2 years is unlikely to alter the results, as their breeding dispersal distances did not differ from those of birds returning in consecutive years, for both females (t = 0.3, df = 225, P = 0.73) and males (t = 1.5, df = 334, P = 0.13). The total sample size available for analysis was 227 breeding dispersal events for females and 336 for males. I tested for sex differences in breeding dispersal distance using a general linear mixed model (GLMM) with an identity link function in which sex was considered a fixed effect and an individual’s unique identification code was entered as a random effect to account for multiple measures of breeding dispersal distance obtained from the same individual. To normalize the data, breeding dispersal distances were logtransformed prior to analysis. Statistical analysis was conducted using JMP Pro 17.0 (SAS Institute, Cary, North Carolina, USA).
Identification of Appropriate Dispersal Model
I used the simulation capabilities of the spatial CJS (sCJS) model of Schaub and Royle (2014) to identify the most appropriate dispersal kernel with which to model Hooded Warbler breeding dispersal. Schaub and Royle (2014) model dispersal as change in location along both the x and y axes in a twodimensional equidistant grid. For clarity, I refer to these changes as xy dispersal distances to distinguish them from breeding dispersal distance described in the previous section; while breeding dispersal distances are always positive, xy dispersal distances can include negative values, reflecting dispersal movements that result in decreases in either longitude (x), latitude (y), or both. Under most circumstances, the distributions of xy dispersal distance along each axis would be expected to be symmetrical and centered around 0, and can be modeled using distributions with these characteristics (Schaub and Royle 2014).
I first represented the 173 ha of the study area within a 20 x 18 equidistant xy grid where each grid unit = 100 m. I then replotted the 1236 bird locationyears according to this grid system (Figure 1) so that the 563 observed breeding dispersal distances (227 for females, 336 for males) could also be expressed as xy dispersal distances. I then used the simulation capabilities of the model of Schaub and Royle (2014) to simulate dispersal within and outside of this gridded study area (Figure 1) using three different theoretical dispersal kernels: (1) a random normal distribution, (2) a t distribution with df = 2, and (3) a Cauchy distribution. The three models can also be viewed as three different t distributions with decreasing degrees of freedom and increasingly heavy tails, as the normal and Cauchy distributions are equivalent to t distributions with df = ∞ and df = 1, respectively. Although the model of Schaub and Royle (2014) allows dispersal kernels on the two axes to have different values of the scale parameter σ (σ1 and σ2 for the x and y axes, respectively), the observed distribution of x and y dispersal distances within the Hooded Warbler study area did not differ significantly for either females (KolmogorovSmirnov D = 0.070, P = 0.63) or males (KolmogorovSmirnov D = 0.065, P = 0.47). I therefore set σ1 = σ2 = σ in all dispersal simulations.
Using the three dispersal kernel models, I then generated simulated xy dispersal data that could be compared to observed xy dispersal data for each sex, searching for the combination of kernel model and σ that would best fit the observed xy dispersal data, as determined by the KolmogorovSmirnov twosample test. Although each simulation was designed to generate about 6,600 total simulated dispersal events, the number of simulated xy dispersal events that occurred within the study area — the ones that could be compared to observed values — varied with both dispersal kernel and σ. The simulations (n = 56) focused on regions of σspace that minimized the KolmogorovSmirnov D statistic, and maximized the corresponding Pvalue, for each of the three tested dispersal kernels.
Estimates of Detection Probability and Annual Survival
I estimated detection probability (p) and apparent annual survival (ɸ) via conventional CJS analysis using R version 4.3.2 (R Core Team 2023) and package marked version 1.28 (Laake et al. 2013). For this analysis, histories of individual females and males were combined into a single dataset and analyzed with a 4parameter model with separate estimates of p and ɸ for each sex. I then used the Bayesian spatial CJS (sCJS) model of Schaub and Royle (2014) to estimate the same parameters correcting for undetected emigration. The sCJS model was run using JAGS version 4.3.2 (Plummer 2003) and the R package r2jags 0.71 (Su and Yajima 2021). For the sCJS analysis I used the dispersal kernel model identified in the previous section and ran the sexes separately. Vague priors (random uniform distribution between 0 and 1) were used for p and ɸ. Because preliminary models encountered convergence issues in estimating the critical dispersal scale parameters σ1 and σ2, particularly in females, more specific priors — random uniform distribution between 0.001 and 2 — were employed for σ1 and σ2 in the final models; these more specific priors are justified, as the simulations described in the previous section indicated that their posterior estimated values would likely fall comfortably within that range. The Markov chain Monte Carlo (MCMC) settings specified three chains, each with 420,000 iterations with the first 20,000 discarded, for a total of 1.2 million saved iterations. No convergence issues were encountered in the final models, with all Rhat values less than 1.02 for females and 1.01 for males.