Data from: Does a decision support tool designed to depict West Nile Virus risk explain variation in ruffed grouse (Bonasa umbellus) use of managed forests?
Data files
Jul 07, 2025 version files (47.41 KB total)
- Goldman-et-al-25-data.csv (40.19 KB)
- README.md (7.21 KB)
Abstract
Infectious diseases are commonly cited as significant contributors to wildlife population declines. It is, therefore, important to investigate the extent to which tools designed to mitigate the effects of infectious diseases explain wildlife responses to habitat management. Ruffed grouse (Bonasa umbellus) populations have experienced long-term declines throughout their eastern range. These declines are strongly correlated with the reduced availability of early successional forests and, in more recent decades, the mosquito-borne pathogen West Nile Virus (Flaviviridae, Flavivirus; WNV). Efforts to increase the amount of early successional forests have intensified in Pennsylvania over the past twenty years, especially across northern hardwood and mixed oak timber stands. Additionally, a decision support tool for predicting WNV risk (the Grouse Priority Area Siting Tool, G-PAST) was developed to help inform where ruffed grouse habitat creation would be most effective by minimizing contact between grouse and WNV-carrying mosquitoes. Forest type is also known to influence ruffed grouse space use and demography. Thus, monitoring ruffed grouse response to habitat creation through the lens of predicted WNV risk and forest community type (northern hardwood vs. mixed oak) may provide managers with further insight regarding strategies for ruffed grouse population recovery. From 2021 to 2023, we deployed autonomous recording units (paired with an autonomous classifier) in 305 regenerating timber harvests (7–16 years old) across Pennsylvania. Survey locations were stratified by WNV risk level (low vs. high) and forest type. Overall, ruffed grouse occupancy (Ψ = 0.75) in Pennsylvania was most influenced by landscape connectivity (+), % mixed oak (-), and short woody stem density (-) but not G-PAST predicted WNV risk.
Thus, land managers aiming to conserve Pennsylvania’s ruffed grouse should focus their attention on aspects of landscape connectivity and forest type when implementing grouse habitat management. Managers in Pennsylvania can expect, on average, high ruffed grouse occupancy (~75% of sites) in 7–16-year-old northern hardwood or mixed oak stands, regardless of WNV risk predicted by G-PAST. Our results demonstrate that successful outcomes for forest management that targets ruffed grouse will be driven by landscape characteristics, forest type, and within-stand vegetation. Future work correlating longer-term patterns (e.g., dynamic occupancy) or demographic rates with G-PAST predictions may provide additional insight to help further guide grouse conservation efforts in Pennsylvania.
[https://doi.org/10.1002/wlb3.01389]
Description of the data and file structure
This "Goldman_etal_2025_grouse_readme.txt" file was generated on 10 June 2025 by Jacob Goldman.
GENERAL INFORMATION
1. Title of Dataset: Does a decision support tool designed to depict West Nile Virus risk explain variation in ruffed grouse (Bonasa umbellus) use of managed forests?
2. Author Information
A. Corresponding Author Contact Information
Name: Jacob Goldman
Institution: Indiana University of Pennsylvania
Address: 570 S. 11th St., Indiana, PA 15705
Email: jake.goldman.45@gmail.com
3. Date of data collection (single date, range, approximate date):
April 2021 through August 2023
4. Geographic location of data collection:
Commonwealth of Pennsylvania, United States
5. Information about funding sources that supported the collection of the data:
This research was funded by grants awarded to J.L. Larkin from the United States Department of Agriculture-Natural Resource Conservation Service Conservation Effects Assessment Project, The Richard King Mellon Foundation, Pennsylvania Game Commission, and The Ruffed Grouse Society. Additional support was provided by funding from the Gordon and Betty Moore Foundation awarded to J. Kitzes and a McIntire-Stennis Capacity Grant (#KY009043) awarded to D.J. McNeil.
SHARING/ACCESS INFORMATION
1. Licenses/restrictions placed on the data:
None to report
2. Links to publications that cite or use the data:
None to report
3. Links to other publicly accessible locations of the data:
None to report
4. Links/relationships to ancillary data sets:
None to report
5. Was data derived from another source? No
A. If yes, list source(s): Not applicable
6. Recommended citation for this dataset:
Goldman et al. (2025), Does a decision support tool designed to depict West Nile Virus risk explain variation in ruffed grouse (Bonasa umbellus) use of managed forests?, Dryad, Dataset, 10.1002/wlb3.01389
DATA & FILE OVERVIEW
1. File List:
File: Goldman-et-al-25-data.csv
Description: This file contains all data needed to replicate analyses presented in Goldman et al. 2025
2. Relationship between files, if important:
Not applicable
3. Additional related data collected that was not included in the current data package:
None
4. Are there multiple versions of the dataset? No
A. If yes, name of file(s) that was updated:
i. Why was the file updated? Not applicable
ii. When was the file updated? Not applicable
METHODOLOGICAL INFORMATION
1. Description of methods used for collection/generation of data:
See the following sections from Goldman et al. (2025), Methods:
> Study area and sampling locations
> Autonomous Recording Units
> Vegetation sampling
> Remotely sensed data
> Recording processing and verification
2. Methods for processing the data:
See the following sections from Goldman et al. (2025), Methods:
> Statistical analysis
3. Instrument- or software-specific information needed to interpret the data:
See the following sections from Goldman et al. (2025), Methods:
> Study area and sampling locations
> Autonomous Recording Units
> Vegetation sampling
> Remotely sensed data
> Recording processing and verification
> Statistical analysis
4. Standards and calibration information, if appropriate:
None to report
5. Environmental/experimental conditions:
See Goldman et al. (2025), Methods.
6. Describe any quality-assurance procedures performed on the data:
See Goldman et al. (2025), Methods.
7. People involved with sample collection, processing, analysis and/or submission:
JG, JLL, and DJM conceived the ideas and designed the study; JG collected
and analyzed the data; JLL and DJM secured funding for the study;
all authors wrote and edited the paper and gave final approval for publication.
DATA-SPECIFIC INFORMATION FOR: Goldman-et-al-25-data.csv
Files and variables
File: Goldman-et-al-25-data.csv
Description:
Variables
- name: point_id; description: category; location identity (i.e., site)
- name: year; description: year during which data was collected
- name: juliandate_b1; description: ordinal date of the first day of the first ruffed grouse survey week
- name: juliandate_b2; description: ordinal date of the first day of the second ruffed grouse survey week
- name: juliandate_b3; description: ordinal date of the first day of the third ruffed grouse survey week
- name: juliandate_b4; description: ordinal date of the first day of the fourth ruffed grouse survey week
- name: juliandate_b5; description: ordinal date of the first day of the fifth ruffed grouse survey week
- name: occu_wk1; description: binary; detection of ruffed grouse during survey week 1 (1 = detected, 0 = not detected)
- name: occu_wk2; description: binary; detection of ruffed grouse during survey week 2 (1 = detected, 0 = not detected)
- name: occu_wk3; description: binary; detection of ruffed grouse during survey week 3 (1 = detected, 0 = not detected)
- name: occu_wk4; description: binary; detection of ruffed grouse during survey week 4 (1 = detected, 0 = not detected)
- name: occu_wk5; description: binary; detection of ruffed grouse during survey week 5 (1 = detected, 0 = not detected)
- name: pres_abs; description: binary; detection of ruffed grouse at any point during weeks 1-5 (1 = detected, 0 = not detected)
- name: basal_m2_ha; description: continuous; basal area (m²/hectare)
- name: avg_stem_dens; description: total stem density (# stems/m²)
- name: stems_less_1.5m; description: stem density for stems <1.5m tall (# stems/m²)
- name: stems_greater_1.5m; description: stem density for stems >1.5m tall (# stems/m²)
- name: stand_age; description: age of managed forest stand in years
- name: mins_b1; description: minutes of recording time during survey week 1
- name: mins_b2; description: minutes of recording time during survey week 2
- name: mins_b3; description: minutes of recording time during survey week 3
- name: mins_b4; description: minutes of recording time during survey week 4
- name: mins_b5; description: minutes of recording time during survey week 5
- name: rec_duration; description: number of hours that the ARU was programmed to record
- name: connx5km; description: continuous; average landscape connectivity value for a 5km buffer surrounding each survey location
- name: wnv_high_500m; description: continuous; percentage of pixels belonging to the WNV high-risk category within a 500m buffer
- name: p_mo; description: continuous; percentage of pixels classified as mixed oak forest within a 500m buffer
- name: p_nh; description: continuous; percentage of pixels classified as northern hardwood forest within a 500m buffer
- NOTE: cells containing "N/A" indicate that data were not available
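The variable list above can be read with a few lines of standard-library Python. The sketch below is illustrative only: the two sample rows are invented (they are not records from Goldman-et-al-25-data.csv), only a subset of the documented columns is shown, and the helper names `parse_cell`, `load_rows`, and `pres_abs_is_consistent` are hypothetical. It treats "N/A" as missing and checks that `pres_abs` agrees with the five weekly detection columns.

```python
import csv
import io

# Hypothetical two-row excerpt using a subset of the documented column names.
# The values are invented for illustration and are not taken from the dataset.
SAMPLE = """point_id,year,occu_wk1,occu_wk2,occu_wk3,occu_wk4,occu_wk5,pres_abs,basal_m2_ha
A01,2021,0,1,0,0,1,1,4.6
B07,2022,0,0,N/A,0,0,0,N/A
"""

WEEK_COLS = [f"occu_wk{i}" for i in range(1, 6)]


def parse_cell(value):
    """Return None for the README's "N/A" marker, otherwise a float."""
    return None if value.strip() == "N/A" else float(value)


def load_rows(handle):
    """Read the CSV and convert every column except point_id to numbers."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {"point_id": raw["point_id"]}
        for key, value in raw.items():
            if key != "point_id":
                row[key] = parse_cell(value)
        rows.append(row)
    return rows


def pres_abs_is_consistent(row):
    """True when pres_abs is 1 exactly if any surveyed week has a detection."""
    weeks = [row[col] for col in WEEK_COLS if row[col] is not None]
    expected = 1.0 if any(week == 1.0 for week in weeks) else 0.0
    return row["pres_abs"] == expected


rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["point_id"], pres_abs_is_consistent(row))
```

To use the sketch on the real file, replace `io.StringIO(SAMPLE)` with an open file handle for Goldman-et-al-25-data.csv.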
Code/software
No code available
Access information
Other publicly accessible locations of the data:
- None
Data was derived from the following sources:
- See Methods
Study Area and Site Selection
During 2021–2023, we deployed autonomous recording units (ARUs) to survey for the presence of ruffed grouse in regenerating timber harvests across seven Pennsylvania State Forest districts and 23 State Game Lands (Figure 1). These sites spanned 29 counties within the Appalachian Plateaus and Ridge-and-Valley physiographic regions and occurred in forests grouped into two general classes based on species composition: regenerating mixed oak forests and regenerating northern hardwood forests. Regenerating mixed oak forests in Pennsylvania were composed of species such as northern red oak (Quercus rubra), chestnut oak (Q. montana), black oak (Q. velutina), white oak (Q. alba), hickories (Carya spp.), sassafras (Sassafras albidum), black gum (Nyssa sylvatica), mountain laurel (Kalmia latifolia), blueberry (Vaccinium spp.), and black huckleberry (Gaylussacia baccata; Fike 1999). Regenerating northern hardwood forests in Pennsylvania were dominated by species such as aspen (Populus spp.), black birch (Betula lenta), yellow birch (B. alleghaniensis), American beech (Fagus grandifolia), maples (Acer spp.), witch-hazel (Hamamelis virginiana), and serviceberry (Amelanchier spp.; Fike 1999). All timber harvests selected for survey were treated with overstory removal and had regenerated for 7–16 years prior to the survey. Stands averaged 41.6 ha (SD: 46.8, range: 2.4–307 ha, mixed oak average: 26.2 ha, northern hardwood average: 24.8 ha) and were at elevations between 189 and 879 m above sea level. We used the forest grouping layer from the US Forest Service’s Forest Inventory and Analysis dataset in ArcGIS Pro to identify the percentage of the area surrounding each survey location that was either “oak-hickory” (hereafter, mixed oak) or “maple-beech-birch” (hereafter, northern hardwood; USDA Forest Service 2021).
First, we created a 500-m buffer around each survey location and used the Summarize Categorical Raster tool in ArcGIS Pro to calculate the number of pixels within the buffer for both mixed oak and northern hardwood forest types. A 500-m (79 ha) extent was chosen based upon published estimates of male grouse home ranges in deciduous forests, which ranged from 11.3 to 84 ha (Fearer and Stauffer 2003, Whitaker et al. 2007, Thompson and Fritzell 1989). We elected to use a buffer size of 500 m to best ensure we considered the largest area potentially used by grouse throughout the previous year. In preliminary analyses, we also experimented with summarizing forest types at smaller spatial scales (e.g., 200 m), but the classification was unchanged in >95% of cases. Ultimately, we used a 500-m buffer and calculated the percentages of pixels within the buffer that were mixed oak and northern hardwood forest types, which were then included in our statistical analyses as continuous variables. We deployed ARUs at semi-random locations within each study stand (hereafter “points”). We generated these random points using the “Create Random Points” tool in ArcGIS Pro (ESRI 2023. ArcGIS Pro Release 3.1.2. Redlands, CA: Environmental Systems Research Institute). All points were at least 500 m apart (range: 500 m – 19 km) to ensure points were spatially independent (Ralph et al. 1995) and to minimize chances of observing single individuals at multiple locations (average ruffed grouse movement during the breeding season is ~270 m; Thompson and Fritzell 1989). Recent work by Lapp et al. (2023) demonstrated that the ARU model we used did not detect grouse beyond 200 m. To limit sampling of surrounding unharvested forest, we generated random points at least 50 m from a mature forest/harvest edge.
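The buffer arithmetic in this section is easy to check by hand. The sketch below is not the ArcGIS Pro workflow the authors used. It shows the same three calculations in plain Python: the area of a 500-m circular buffer, the percentage of buffer pixels in one forest class, and the minimum spacing among a set of points. The function names, pixel counts, and coordinates are hypothetical.

```python
import math


def buffer_area_ha(radius_m):
    """Area of a circular buffer in hectares (1 ha = 10,000 m^2)."""
    return math.pi * radius_m ** 2 / 10_000


def percent_cover(class_pixels, total_pixels):
    """Percentage of buffer pixels that belong to one forest class."""
    if total_pixels <= 0:
        raise ValueError("buffer contains no pixels")
    return 100.0 * class_pixels / total_pixels


def min_pairwise_distance(points):
    """Smallest Euclidean distance (m) between any two projected (x, y) points."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(points)
        for b in points[i + 1:]
    )


# A 500-m radius gives roughly 78.5 ha, matching the 79 ha quoted in the text.
print(round(buffer_area_ha(500), 1))

# Hypothetical pixel counts: 30 mixed oak pixels out of 120 in the buffer.
print(percent_cover(30, 120))

# Hypothetical projected coordinates (m); the spacing rule requires >= 500 m.
print(min_pairwise_distance([(0, 0), (600, 0), (0, 800)]) >= 500)
```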
Autonomous Recording Units
During all three years of field sampling, we surveyed points using “AudioMoth” ARUs (Open Acoustic Devices models 1.0.0, 1.1.0, and 1.2.0; Hill et al. 2019). We programmed ARUs to record at a sample rate of 32 kHz and medium gain for 2 hours (0630–0830; 2021) or 1.5 hours (0630–0800; 2022–2023) each morning. Each survey location was sampled for a single season. All ARUs were deployed before 15 April and recovered after 19 May, which covered peak ruffed grouse drumming activity in our study region (Rusch et al. 2020). Each ARU was housed in a resealable plastic bag along with two 1-g desiccant packs to prevent moisture damage. We used a zip-tie to attach each ARU to a sapling at a height of approximately 1.5 m, oriented with the microphone facing a random direction.
Vegetation Sampling
We conducted vegetation sampling at each ruffed grouse survey point at the time of ARU recovery. Vegetation sampling was intended to quantify two habitat components: 1) tree basal area (m²/ha) and 2) woody regeneration (stems/m²). Using the ARU location as plot center, we randomly selected with replacement the direction (0°, 120°, or 240°) of two 35-m transects. At plot center and at the end of each of the two transects, we used a 10-factor wedge prism (Burkhardt 2019) to estimate tree basal area. The three basal area values were averaged to obtain a site-level estimate. We quantified the density of woody regeneration (stems <10 cm diameter at breast height [DBH] and >0.5 m tall) within two 1 × 10 m plots that were randomly placed along the two 35-m transects. We counted all woody stems within each plot and recorded species identity and height category (short = 0.5–1.5 m in height; tall = ≥1.5 m in height) for each stem. We averaged the woody stem counts from the two plots to generate site-level estimates for short and tall woody stem density.
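As a rough illustration of how the two vegetation metrics could be derived from field tallies, the sketch below averages prism counts into basal area and plot counts into stem density. It rests on an assumption the text does not state: that the "10-factor" prism has a basal area factor of 10 ft²/acre, which is then converted to m²/ha. The function names and tallies are hypothetical.

```python
# Assumption (not stated in the text): the "10-factor" prism has a basal area
# factor of 10 ft^2/acre. 1 ft^2 = 0.09290304 m^2 and 1 acre = 0.40468564224 ha.
FT2_PER_ACRE_TO_M2_PER_HA = 0.09290304 / 0.40468564224


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def basal_area_m2_ha(tree_counts, baf_ft2_acre=10.0):
    """Average prism tallies across sample points and convert to m^2/ha."""
    per_point = [
        count * baf_ft2_acre * FT2_PER_ACRE_TO_M2_PER_HA for count in tree_counts
    ]
    return mean(per_point)


def stem_density(plot_counts, plot_area_m2=10.0):
    """Mean stems per m^2 across the 1 x 10 m regeneration plots."""
    return mean([count / plot_area_m2 for count in plot_counts])


# Hypothetical tallies: "in" trees at plot center and at two transect ends.
print(round(basal_area_m2_ha([2, 0, 1]), 2))

# Hypothetical short-stem counts from the two regeneration plots.
print(stem_density([12, 8]))
```

The same `stem_density` call would be made separately for short and tall stems.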
Remotely Sensed Data
We used the G-PAST spatial layer in ArcGIS Pro to assign each survey point a predicted WNV risk level. In G-PAST, areas of “high WNV risk” occur at low elevations (<488 m) and/or have wet soil types (i.e., poorly drained, somewhat poorly drained, or very poorly drained soils), while areas of “low WNV risk” occur at high elevations (≥488 m) and have dry soil types (i.e., excessively drained, somewhat excessively drained, well drained, or moderately drained soils). To generate a % high WNV risk value for each survey location, we used the Buffer tool in ArcGIS Pro to generate a 500-m buffer around each survey point and then used the Summarize Categorical Raster tool to calculate the number of pixels belonging to the high-risk category inside each buffer. The number of pixels belonging to the high WNV risk category was then divided by the total number of pixels within the 500-m buffer to generate a value for % WNV high risk for each point. G-PAST, developed by the Pennsylvania Game Commission (PGC), divides the Pennsylvania landscape into four categories: priority 1, priority 2, priority 3, and non-priority. Based on consultation with PGC biologists, we interpreted areas within any of the priority levels (1–3) as having a lower incidence of WNV-carrying mosquitoes than non-priority areas (which are non-priority for habitat management because of their predicted WNV risk). Because ruffed grouse occurrence patterns have been linked to landscape-scale forest connectivity (Porter and Jarzyna 2013), we assigned each survey location a landscape connectivity value, extracted from The Nature Conservancy’s Local Connectedness data layer (Anderson et al. 2023) in ArcGIS Pro. This layer describes the amount and arrangement of connected natural forests and shrublands and excludes anthropogenic land covers such as human development, infrastructure, and agriculture (Anderson et al. 2023).
We averaged local connectedness values within a 5-km buffer around each survey location. We chose this extent based on existing understanding of ruffed grouse natal dispersal distances (Rusch et al. 2020) and because connectedness may be high at a given survey location yet differ in the adjacent area and across the wider landscape.
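The elevation and soil-drainage rule described in this section can be written as a small classifier. The sketch below is only an illustration of that rule as worded here. G-PAST itself is a pre-built spatial layer, and the authors summarized it in ArcGIS Pro rather than reclassifying pixels. The function names and pixel values are hypothetical.

```python
# Drainage classes listed in the text as "wet" soil types.
WET_SOILS = {"poorly drained", "somewhat poorly drained", "very poorly drained"}
ELEVATION_THRESHOLD_M = 488


def wnv_risk(elevation_m, drainage_class):
    """Classify one pixel as 'high' or 'low' WNV risk using the quoted rule.

    High risk: elevation below 488 m and/or a wet soil type.
    Low risk: elevation of at least 488 m and a dry soil type.
    """
    if elevation_m < ELEVATION_THRESHOLD_M or drainage_class in WET_SOILS:
        return "high"
    return "low"


def percent_high_risk(pixels):
    """Percentage of (elevation, drainage) pixels in a buffer classed as high."""
    high = sum(1 for elev, soil in pixels if wnv_risk(elev, soil) == "high")
    return 100.0 * high / len(pixels)


# Hypothetical buffer of four pixels; two meet the high-risk rule.
buffer_pixels = [
    (600, "well drained"),
    (300, "well drained"),
    (700, "very poorly drained"),
    (500, "well drained"),
]
print(percent_high_risk(buffer_pixels))
```

The resulting percentage corresponds conceptually to the `wnv_high_500m` column in the data file.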
Recording Processing and Verification
We used the automated detection method developed by Lapp et al. (2023) to locate ruffed grouse drumming events in ARU recordings. The detector analyzes audio files in 60-second non-overlapping segments, searching for the accelerating sequence of low-frequency pulses that characterizes ruffed grouse drumming (Lapp et al. 2023). We provide a sample analysis script in an open-source GitHub repository (https://github.com/kitzeslab/ruffed_grouse_occupancy_2024). We sorted the detections by sequence length (number of pulses) from highest to lowest, then extracted 10 seconds of audio centered on each detected sequence for the five longest detected sequences from each day, for each point. We reviewed these detections by listening to the audio and visually inspecting a spectrogram to determine the presence of ruffed grouse drumming. If we confirmed the presence of ruffed grouse drumming for a given day and point, we assigned a detection score of “1” for that point-day and did not review further detections from that point-day. We assigned a detection score of “0” to every point-day for which our review process did not produce verified detections of ruffed grouse drumming or for which the automated detector returned no detections. The final output of this automated classifier and human validation process was a binary detection matrix for each point and each day from 15 April to 19 May. We compressed daily detection histories into 7-day (1-week) blocks, yielding five week-long secondary sampling occasions for occupancy modeling.
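The final compression step, from 35 daily scores to five weekly occasions, can be sketched as follows. This is a minimal illustration, not the authors' script. It assumes the first block starts on 15 April for every point, whereas the actual start dates for each point are given in the `juliandate_b1` to `juliandate_b5` columns. The function names are hypothetical.

```python
from datetime import date, timedelta


def survey_length_days(year):
    """Number of days from 15 April to 19 May, inclusive."""
    return (date(year, 5, 19) - date(year, 4, 15)).days + 1


def weekly_history(daily, block_days=7):
    """Collapse a daily 0/1 detection list into per-block maxima.

    A block scores 1 if drumming was verified on any day in that block.
    """
    if len(daily) % block_days != 0:
        raise ValueError("daily history must divide evenly into blocks")
    return [
        max(daily[start:start + block_days])
        for start in range(0, len(daily), block_days)
    ]


def block_start_ordinal(year, block_index, block_days=7):
    """Ordinal date of a block's first day, assuming block 0 starts 15 April."""
    first_day = date(year, 4, 15) + timedelta(days=block_index * block_days)
    return first_day.timetuple().tm_yday


# Hypothetical point with verified drumming on the 10th and 35th survey days.
daily = [0] * survey_length_days(2021)
daily[9] = 1
daily[34] = 1
print(weekly_history(daily))
print([block_start_ordinal(2021, i) for i in range(5)])
```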
Statistical Analysis
To estimate occupancy probability for ruffed grouse in early successional forests across Pennsylvania, we used single-season occupancy models (MacKenzie et al. 2018) fit using the R package unmarked (Fiske and Chandler 2011; R Core Team 2022). A single-season occupancy model has four assumptions (MacKenzie et al. 2006): 1) site closure, 2) homogeneous occupancy among sites unless otherwise modeled, 3) homogeneous detection among secondary sampling occasions unless otherwise modeled, and 4) independent detection histories among sites. As with many point count studies, there exists potential for violation of closure, but the rate of this violation is often difficult to estimate (Rota et al. 2009). We attempted to minimize the risk of violating closure by sampling over a brief window (i.e., 5 weeks; MacKenzie et al. 2002) during the period when survival is highest (Mangelinckx et al. 2018) and males are largely sedentary (Thompson and Fritzell 1989). We addressed assumptions 2–3 by modeling detection and occupancy using covariates (detailed below). Finally, we ensured sites were independent (assumption 4) by separating each by at least 500 m (detailed above). We tested for correlations among all covariates, calculating pairwise Pearson’s correlation coefficients with the cor() function in R. Variable pairs with |r| > 0.6 were considered correlated (Sokal and Rohlf 1969). If two variables were correlated, we excluded the one expected to have the least potential influence on ruffed grouse ecology from further analyses (Table 1). We included the following covariates to estimate ruffed grouse detection probability: (1) average ordinal date of the survey week, (2) daily recording duration, and (3) woody stem density.
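For readers unfamiliar with occupancy models, the sketch below writes out the single-season likelihood in its simplest form, with one constant occupancy probability (psi) and one constant detection probability (p). It is deliberately simpler than the covariate models the authors fit with the R package unmarked, and it is not their code. The function names and detection histories are hypothetical.

```python
import math


def site_likelihood(history, psi, p):
    """Probability of one site's 0/1 detection history under constant psi and p.

    An occupied site yields each detection with probability p. A site with no
    detections may also simply be unoccupied, with probability 1 - psi.
    """
    occupied = psi * math.prod(p if y == 1 else 1.0 - p for y in history)
    unoccupied = (1.0 - psi) if not any(history) else 0.0
    return occupied + unoccupied


def neg_log_likelihood(histories, psi, p):
    """Negative log-likelihood summed over independent sites."""
    return -sum(math.log(site_likelihood(h, psi, p)) for h in histories)


def naive_occupancy(histories):
    """Proportion of sites with at least one detection (ignores missed birds)."""
    return sum(1 for h in histories if any(h)) / len(histories)


# Hypothetical five-week histories for four points.
histories = [
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
print(naive_occupancy(histories))
print(neg_log_likelihood(histories, psi=0.8, p=0.4))
```

The all-zero history is what separates occupancy from naive presence: it contributes both "occupied but never detected" and "not occupied" to the likelihood, which is why modeled occupancy can exceed the raw proportion of sites with detections.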
We included the following covariates to estimate ruffed grouse occupancy: (1) basal area, (2) short woody stem density, (3) tall woody stem density, (4) predicted WNV risk (% of “high” WNV risk pixels within 500 m), (5) percent mixed oak forest (% of mixed oak forest type pixels within 500 m), (6) landscape connectivity within a 5-km radius, (7) the interaction between % mixed oak and predicted WNV risk, and (8) stand age. Using these covariates, we constructed a single “full” global model and examined parameter estimates to ensure convergence. We began with a global model because starting our sub-setting procedure from a model that did not converge would be inappropriate. Using the dredge function in R (package MuMIn; Bartoń 2023), we created all possible subsets of variables present in the full global model (n = 160 models). We ranked all models using Akaike’s Information Criterion adjusted for small sample size (AICc; Burnham and Anderson 2002). Models with a ΔAICc <2.0 compared to our top model were considered competing. For top models, we examined β coefficient 95% confidence intervals and interpreted those overlapping 0 to have weak effects (though 85% confidence intervals may be considered sufficient; Arnold 2010). For the top model, we calculated the Brier score and Area Under the Curve (AUC) to assess goodness of fit. Brier scores vary from 0 to 1, and values closest to 0 indicate predictions that are consistent with realized outcomes. Conversely, AUC values, which also vary from 0 to 1, are “best” closer to 1, whereby greater values indicate better model performance. We calculated both metrics using 10-fold cross-validation with a 75%/25% training/testing split for each of the 10 replicates (McNeil et al. 2023). All continuous variables were scaled to have a mean of 0 and a standard deviation of 1.0 using the scale() function in Program R.
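Two pieces of this paragraph lend themselves to a short worked example: the size of the candidate model set and the Brier score. The sketch below enumerates subsets of the seven occupancy main effects and allows the interaction only when both of its parent terms are present. Under that reading, with the detection covariates held fixed, the count comes to 160. This is an inference that happens to match the reported number, not a description the text confirms. The term labels and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical labels for the seven occupancy main effects listed above.
MAIN_EFFECTS = [
    "basal_area", "short_stems", "tall_stems", "wnv_risk",
    "pct_mixed_oak", "connectivity", "stand_age",
]
INTERACTION = ("pct_mixed_oak", "wnv_risk")


def candidate_models(main_effects, interaction):
    """All main-effect subsets, plus the interaction where both parents occur."""
    models = []
    for size in range(len(main_effects) + 1):
        for subset in combinations(main_effects, size):
            models.append(subset)
            if all(term in subset for term in interaction):
                models.append(subset + (":".join(interaction),))
    return models


def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


models = candidate_models(MAIN_EFFECTS, INTERACTION)
print(len(models))  # 2**7 subsets + 2**5 subsets that also carry the interaction

# Hypothetical held-out predictions and outcomes for four points.
print(brier_score([0.9, 0.8, 0.3, 0.6], [1, 1, 0, 1]))
```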
We also present per-variable sums of model weights calculated using the sw() function in the MuMIn package.
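The ranking quantities named in this section follow standard formulas (Burnham and Anderson 2002) and can be sketched without any modeling library. The code below is an illustration, not a reimplementation of MuMIn. The function names, log-likelihoods, parameter counts, and term labels are all hypothetical, and the sample size of 305 is borrowed from the number of surveyed harvests purely as an example.

```python
import math


def aicc(log_lik, k, n):
    """AIC with the small-sample correction: -2lnL + 2k + 2k(k+1)/(n-k-1)."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aicc_values):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (value - best)) for value in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]


def competing(aicc_values, threshold=2.0):
    """Indices of models with delta-AICc below the threshold."""
    best = min(aicc_values)
    return [i for i, value in enumerate(aicc_values) if value - best < threshold]


def sum_of_weights(models, weights):
    """Per-variable sum of the weights of every model containing that variable."""
    totals = {}
    for terms, weight in zip(models, weights):
        for term in terms:
            totals[term] = totals.get(term, 0.0) + weight
    return totals


# Hypothetical three-model set: (terms, log-likelihood, parameter count).
fits = [
    ({"connectivity", "pct_mixed_oak"}, -410.2, 6),
    ({"connectivity"}, -412.0, 5),
    ({"stand_age"}, -418.5, 5),
]
scores = [aicc(log_lik, k, n=305) for _, log_lik, k in fits]
weights = akaike_weights(scores)
print(competing(scores))
print(sum_of_weights([terms for terms, _, _ in fits], weights))
```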