Data and code from: Addressing widespread detection heterogeneity in avian occupancy modeling using passive acoustic surveys
Data files
Dec 11, 2025 version files (253.58 MB total):
- data_field.zip (4.82 KB)
- data_sims_tau_90m_120m.zip (27.50 MB)
- data_sims.zip (10.88 MB)
- README.md (17.88 KB)
- results_all_sim_results_summary.zip (80.91 MB)
- results_field.zip (14.58 KB)
- results_sims_tau_90m_120m.zip (68.96 MB)
- results_sims.zip (65.25 MB)
- scripts.zip (38.61 KB)
Abstract
This repository includes 90 simulated avian monitoring datasets, each consisting of data for 160 occupied sites, as well as the results of field passive acoustic monitoring (PAM) surveys for American Woodcock (Scolopax minor), Wood Thrush (Hylocichla mustelina), and Eastern Whip-poor-will (Antrostomus vociferus). We used three occupancy modeling approaches (single-season static occupancy models, the Royle-Nichols abundance model, and zero-inflated beta-binomial models) to estimate occupancy in each dataset. We include model fitting results for each monitoring dataset under a variety of survey intensities (i.e., subset to a varying number of visits to each site) and, for the simulated data, varying underlying occupancy rates (20%, 50%, or 80% occupancy). This repository also includes the scripts used to generate simulated data, fit models, and generate the figures for the main manuscript and the supplemental information. These data and scripts can be used to investigate the scenarios under which animal behavior violates the assumptions of occupancy models. They can also be used to understand how assumption violations affect the accuracy of occupancy estimation under a variety of survey design and modeling choices.
Dataset DOI: 10.5061/dryad.djh9w0wdr
Description of the data and file structure
This repository includes data, results, and code for the manuscript "Addressing widespread detection heterogeneity in avian occupancy modeling using passive acoustic surveys."
In particular, this repository includes simulated data, field data, occupancy model fitting results, and results of tests for detection heterogeneity and goodness-of-fit. It also includes the R scripts needed to reproduce the analysis.
Files and variables
Data files
File: data_sims.zip
Description: Simulated bird detection histories for effective detection radius (tau) = 60 m, as analyzed in the main body of the manuscript.
Files:
- sims/scenario_names.RDS (list): scenario names used when generating and analyzing simulations, e.g., density-0.1__movement-0.5.
- sims/sims_density-<density>__movement-<movement>.rds: 30 files of simulated detection histories for effective detection radius = 60 m. Each file name follows the given wildcard format, corresponding to a combination of bird density and bird movement. Each file contains a list with the following data structure:
  - A list element for each of 100 simulations
  - Within each list element, two lists:
    - detection_histories: a list of detection histories for each of 160 sites. Each detection history contains a table with one row per visit at that site and two columns:
      - num_songs_detected: number of songs detected at the site on this visit
      - num_individs_detected: number of individuals detected at the site on this visit
    - parameters_used: a list of parameters used for the simulation, including:
      - num_visits: 20 for all simulations
      - single_bird: whether only a single detectable bird was used on the landscape
      - density: if single_bird not used, the density in birds/ha
      - vocal_rate: sound production rate in songs/minute, identical for all simulations
      - move_rate: movement rate in moves/minute, identical for all simulations
      - move_sd: movement standard deviation
      - allow_overlap: whether to allow birds' paths to overlap (TRUE)
      - initial_location: whether to set all individuals at the same location, FALSE for all simulations
For more information on simulation methodology, see the main manuscript.
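As a minimal sketch of how this nested structure might be traversed in R, the example below builds a small hypothetical stand-in object and collapses one simulation into a sites-by-visits matrix of 0/1 detections. The helper names make_mock_sim and to_detection_matrix are illustrative and are not part of the repository scripts. The sketch assumes each detection history is a data frame (or a matrix with column names) holding the two columns described above.

```r
# Hypothetical stand-in for the contents of one simulation file.
# With the real data, this would be replaced by something like:
#   sims <- readRDS("data/sims/sims_density-<density>__movement-<movement>.rds")
make_mock_sim <- function(n_sites = 3, n_visits = 4) {
  list(
    detection_histories = lapply(seq_len(n_sites), function(i) {
      # Site 1 has no detections; the other sites have songs on visits 2 and 4
      data.frame(
        num_songs_detected    = c(0, 2, 0, 5)[seq_len(n_visits)] * (i > 1),
        num_individs_detected = c(0, 1, 0, 2)[seq_len(n_visits)] * (i > 1)
      )
    }),
    parameters_used = list(num_visits = n_visits, density = 0.1)
  )
}
sims <- list(make_mock_sim(), make_mock_sim())

# Collapse one simulation into a sites x visits matrix of 0/1 detections,
# treating any visit with at least one detected song as a detection.
to_detection_matrix <- function(sim) {
  rows <- lapply(sim$detection_histories, function(dh) {
    as.integer(dh[, "num_songs_detected"] > 0)
  })
  do.call(rbind, rows)
}

y <- to_detection_matrix(sims[[1]])
dim(y)      # sites x visits
rowSums(y)  # number of visits with a detection at each site
```

A matrix in this form (one row per site, one column per visit) is the layout that occupancy-modeling functions typically expect as a detection history.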
File: data_sims_tau_90m_120m.zip
Description: Simulated bird detection histories for effective detection radius (tau) = 90 m or 120 m, as analyzed in the manuscript's supplemental material.
Files:
- sims/scenario_names.RDS (list): scenario names used when generating and analyzing simulations, e.g., density-0.1__movement-0.5.
- sims/sims_density-<density>__movement-<movement>__tau-<tau>.rds: simulated detection histories for effective detection radius (tau) = 90 m (30 files) and 120 m (30 files). Each file name follows the given wildcard format, corresponding to a combination of bird density, bird movement, and tau. Each file has an identical data structure to the simulation files described above under data_sims.zip.
File: data_field.zip
Description: Detection histories derived from field passive acoustic monitoring data. For more information on data collection procedures, see the main text and supplemental material for each file.
Files:
- field/amwo_annots.csv: Passive acoustic monitoring detection history table for American Woodcock (Scolopax minor). Each row corresponds to a single passive acoustic recorder. Columns:
  - card_id (string): Unique identifier for the sampling unit
  - Date columns (e.g., 20210415, 20210416, ...): Sampling dates in YYYYMMDD format. Cells contain numeric detections for each date and site (1: detection verified on that date; 0: no detection verified on that date)
- field/ewpw_annots_dela.csv: Passive acoustic monitoring detection history table for Eastern Whip-poor-will (Antrostomus vociferus). Each row corresponds to a single passive acoustic recorder. Columns:
  - card (string): Unique identifier for the sampling unit
  - Date columns (e.g., 2022-06-07, 2022-06-08, ...): Sampling dates. Cells contain numeric detections for each date and site (1: detection verified on that date; 0: no detection verified on that date)
- field/woth_annots_ohiopyle_laurelridge.csv: Passive acoustic monitoring detection history table for Wood Thrush (Hylocichla mustelina). Each row corresponds to a single passive acoustic recorder. Columns:
  - site (string): Name of the field site where the point is located
  - point (string): Unique point identifier within a site (e.g., RK1143)
  - det1, det2, ..., det10 (integer 0/1): Visit numbers 1 through 10. Cells contain numeric detections for each detection period and site (1: detection verified in that period; 0: no detection verified in that period)
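The sketch below shows one way a detection history table in this layout could be read into R and summarized. It uses a tiny hypothetical table in the format described for field/amwo_annots.csv (the values are invented for illustration); with the real data, the text argument would be replaced by the file path. The check.names = FALSE setting is used so that date column names such as 20210415 are not altered on import.

```r
# Hypothetical three-recorder table in the documented amwo_annots.csv layout.
csv_text <- "card_id,20210415,20210416,20210417
A01,0,1,1
A02,0,0,0
A03,1,0,0"

amwo <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

# Every column except the identifier is treated as one visit
visit_cols <- setdiff(names(amwo), "card_id")
y <- as.matrix(amwo[, visit_cols])

# Number of dates with a verified detection at each recorder
detections_per_site <- rowSums(y, na.rm = TRUE)

# Naive occupancy: proportion of recorders with at least one detection
naive_occupancy <- mean(detections_per_site > 0)
naive_occupancy
```

For the Wood Thrush file, the same approach would apply after excluding both the site and point columns instead of card_id.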
Results files
File: results_all_sim_results_summary.zip
Description: Unzips to all_sims_results_summary.rds, a table containing detection heterogeneity tests, occupancy model fits, and goodness-of-fit tests for simulated data from data_sims.zip and data_sims_tau_90m_120m.zip. Contains one row for each combination of:
- tau level for the simulation (0.6, 0.9, and 1.2, corresponding to 60 m, 90 m, and 120 m)
- density and movement scenario (of 30 combinations) for the simulation
- simulation index (of 100)
- occupancy level (0.2, 0.5, or 0.8) the simulation was subset to
- number of visits (2-20) the simulation was subset to
Contains the following variables in tabular format:
- ft_pval_heterogeneity: The Fisher's test p-values for detection heterogeneity, including all occupied sites, regardless of whether or not a detection was registered at the site
- est_p_detection: The detection probability estimated from a Binomial distribution fit to detections from occupied sites for each of the 100 simulations for each of 19 visit lengths (2-20 visits)
- psi_bin_ests: Occupancy probability psi estimated by the basic occupancy model
- psi_bin_ests_upper: Upper 95% CI limit for psi_bin_ests
- psi_bin_ests_lower: Lower 95% CI limit for psi_bin_ests
- p_bin_ests: Detection probability p estimated by the basic occupancy model
- p_bin_ests_upper: Upper 95% CI limit for p_bin_ests. The remaining variables with suffix _upper follow this pattern.
- p_bin_ests_lower: Lower 95% CI limit for p_bin_ests. The remaining variables with suffix _lower follow this pattern.
- psi_rn_ests: Occupancy probability psi estimated by the RN model
- psi_rn_ests_upper: Upper 95% CI limit for psi_rn_ests
- psi_rn_ests_lower: Lower 95% CI limit for psi_rn_ests
- r_rn_ests: Individual detection rate parameters r estimated by the RN model
- r_rn_ests_upper
- r_rn_ests_lower
- lam_rn_ests: Density parameters lambda estimated by the RN model
- lam_rn_ests_upper
- lam_rn_ests_lower
- psi_zibb_ests: Occupancy probabilities psi estimated by the ZIBB model
- psi_zibb_ests_upper
- psi_zibb_ests_lower
- alpha_zibb_ests: Detection history shape parameters alpha estimated by the ZIBB model
- alpha_zibb_ests_upper
- alpha_zibb_ests_lower
- beta_zibb_ests: Detection history shape parameters beta estimated by the ZIBB model
- beta_zibb_ests_upper
- beta_zibb_ests_lower
- p_vals_bin: p-value for the Fisher's exact test for goodness-of-fit of the basic model, including only occupied sites
- p_vals_rn: p-value for the Fisher's exact test for goodness-of-fit of the Royle-Nichols (RN) model, including only occupied sites
- p_vals_zibb: p-value for the Fisher's exact test for goodness-of-fit of the zero-inflated beta-binomial (ZIBB) model, including only occupied sites
- p_vals_bin_mb: p-value for the Fisher's exact test for the MacKenzie-Bailey style goodness-of-fit of the basic model, including both occupied and unoccupied sites
- p_vals_rn_mb: p-value for the Fisher's exact test for the MacKenzie-Bailey style goodness-of-fit of the Royle-Nichols (RN) model, including both occupied and unoccupied sites
- p_vals_zibb_mb: p-value for the Fisher's exact test for the MacKenzie-Bailey style goodness-of-fit of the zero-inflated beta-binomial (ZIBB) model, including both occupied and unoccupied sites
- scenario_name: name for this scenario, in format density-<density>__movement-<movement>__tau-<tau>
- sim_num: simulation index
- occ_level: occupancy level (0.2, 0.5, or 0.8)
- num_visits: Number of visits subset to (2-20)
- density: birds/ha density used in the simulation
- movement: movement standard deviation used in the simulation
- tau: effective detection radius used in the simulation
- scenario_sim_name: combination of scenario_name and sim_num in format: density-<density>__movement-<movement>__tau-<tau>__sim-<sim_num>
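As a hedged illustration of how the summary table might be queried, the sketch below filters a small hypothetical data frame that uses column names from the variable list above, then compares the mean occupancy estimate of each model against the nominal occupancy level. The numeric values are invented for illustration, and the sketch assumes the real object loads as a data frame. It also uses occ_level as a stand-in for true occupancy, which is an approximation.

```r
# Hypothetical rows using documented column names. With the real data:
#   res <- readRDS("results/all_sim_results_summary.RDS")
res <- data.frame(
  tau           = c(0.6, 0.6, 0.6, 0.9),
  occ_level     = c(0.5, 0.5, 0.2, 0.5),
  num_visits    = c(5, 5, 5, 5),
  psi_bin_ests  = c(0.40, 0.44, 0.18, 0.47),
  psi_rn_ests   = c(0.52, 0.50, 0.21, 0.50),
  psi_zibb_ests = c(0.49, 0.51, 0.20, 0.50)
)

# Select one design: tau = 0.6, 50% occupancy, 5 visits.
# Near-equality is used to avoid floating-point surprises on real data.
near <- function(x, target) abs(x - target) < 1e-8
sub <- res[near(res$tau, 0.6) & near(res$occ_level, 0.5) & res$num_visits == 5, ]

# Mean estimate minus nominal occupancy, per model
psi_cols <- c("psi_bin_ests", "psi_rn_ests", "psi_zibb_ests")
mean_bias <- colMeans(sub[, psi_cols]) - 0.5
mean_bias
```

The same pattern could be repeated over num_visits or occ_level to trace how each model's estimates change with survey effort.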
File: results_sims.zip
Description: Unzips to a folder, sims, containing simulation analysis results for the tau=60 m simulations and an additional RDS file.
Files:
- sims/results_summary.RDS: identical to the RDS file described in results_all_sim_results_summary.zip except it contains only the tau = 60 m subset of the simulations.
- sims/fits__<occupancy>-occupancy__density-<density>__movement-<movement>.rds: 90 files of model-fitting results, one for each of three occupancy levels applied to each of the 30 simulation scenarios. Contains all data present in the summary file described above, with the following data structure:
  - Arrays containing values for each of the 100 simulations for each of 19 visit lengths (2-20 visits) (shape: 100x19):
    - ft_pval_heterogeneity
    - est_p_detection
    - p_vals_bin
    - p_vals_rn
    - p_vals_zibb
    - p_vals_bin_mb
    - p_vals_rn_mb
    - p_vals_zibb_mb
  - Arrays containing the predicted value, lower 95% CI boundary, and upper 95% CI boundary for each of the 100 simulations for each of 19 visit lengths (2-20 visits) (shape: 100x19x3):
    - psi_bin_ests
    - p_bin_ests
    - psi_rn_ests
    - r_rn_ests
    - lam_rn_ests
    - psi_zibb_ests
    - alpha_zibb_ests
    - beta_zibb_ests
  - Additional data not included in the summary file described above:
    - max_num_visits (integer): The maximum number of visits any site could have (20 for all)
    - realized_occ (double): The true proportion of simulated occupied sites
    - d_i (character): The number of sites with i detections across the full detection history, for i = 0 (no detections) to 20 (detections for all periods)
    - title: The scenario name
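The sketch below illustrates how an array of the documented 100x19x3 shape might be indexed. It uses a randomly filled hypothetical array and rests on two assumptions that are not confirmed by this README: that column j along the second dimension corresponds to j + 1 visits (since visits run from 2 to 20), and that the third dimension is ordered as predicted value, lower bound, upper bound. The helper visit_index is illustrative only.

```r
# Hypothetical array in the documented shape: simulations x visit lengths x 3
set.seed(1)
psi_bin_ests <- array(runif(100 * 19 * 3), dim = c(100, 19, 3))

# ASSUMPTION: position j on the second dimension holds results for j + 1 visits
visit_index <- function(num_visits) {
  stopifnot(num_visits >= 2, num_visits <= 20)
  num_visits - 1
}

# All 100 simulations at 10 visits.
# ASSUMPTION: the three columns are predicted value, lower bound, upper bound.
at_10_visits <- psi_bin_ests[, visit_index(10), ]
dim(at_10_visits)
```

Before relying on either assumption, it would be worth checking the dimnames of the real arrays or the indexing used in 2_occupancy_estimation.R.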
File: results_sims_tau_90m_120m.zip
Description: Unzips to a folder, sims_tau_90m_120m, containing simulation analysis results for the tau = 90 m and 120 m simulations and an additional RDS file.
Files:
- sims/hightau_results_summary.RDS: identical to the RDS file described in results_all_sim_results_summary.zip except it contains only the tau = 90 m and 120 m subsets of the simulations.
- sims_tau_90m_120m/fits__<occupancy>-occupancy__density-<density>__movement-<movement>__tau-<tau>.rds: Data structure in these 180 files is identical to that of the 90 sims/fits_ files described above under results_sims.zip.
File: results_field.zip
Description: Unzips to a folder, field, containing three files, each with a data structure similar to the sims/fits_ files described above under results_sims.zip.
Files:
- field/amwo_results.rds: Analysis results for the amwo_annots.csv data from data_field.zip
- field/ewpw_results.rds: Analysis results for the ewpw_annots_dela.csv data from data_field.zip
- field/woth_results.rds: Analysis results for the woth_annots_ohiopyle_laurelridge.csv data from data_field.zip

Interpretations of the data names in these files are identical to those described under the results_all_sim_results_summary.zip header, except where denoted. Each file contains the following:
- Vectors containing values for each subset of visits (length: max_num_visits - 1):
  - ft_pval_heterogeneity: The Fisher's test p-values for detection heterogeneity, including only sites that had at least one detection
  - est_p_detection: The detection probability estimated from a Binomial distribution fit to detections from sites with at least one detection for each of the visit lengths
  - p_vals_bin: p-value for the Fisher's exact test for goodness-of-fit of the basic model, including only sites that had at least one detection
  - p_vals_rn: p-value for the Fisher's exact test for goodness-of-fit of the RN model, including only sites that had at least one detection
  - p_vals_zibb: p-value for the Fisher's exact test for goodness-of-fit of the ZIBB model, including only sites that had at least one detection
  - p_vals_bin_mb
  - p_vals_rn_mb
  - p_vals_zibb_mb
- Arrays containing the predicted value, lower 95% CI boundary, and upper 95% CI boundary for each subset of visits (from two visits to max_num_visits - 1 visits) (shape: 3 x (max_num_visits - 1)):
  - psi_bin_ests
  - p_bin_ests
  - psi_rn_ests
  - r_rn_ests
  - lam_rn_ests
  - psi_zibb_ests
  - alpha_zibb_ests
  - beta_zibb_ests
- Additional data not included in the summary file described above:
  - max_num_visits (integer): The maximum number of visits any site could have
  - naive_occs (double, length: max_num_visits): The proportion of sites with at least one detection for each subset of visit lengths
  - d_i (character): The number of sites with i detections across the full detection history, for i = 0 (no detections) to max_num_visits - 1 (detections for all periods)
  - title: The scenario name
Code/software
File: scripts.zip
Description: Expands into a folder, scripts, which includes R scripts required to simulate data, fit models to data, analyze the results, and create the figures for the main manuscript and supplemental information. Scripts can be run using R 4.3.0.
Files:
- scripts/1_bsims_simulate.R: script to generate simulated data
  - Needed packages: bSims, future.apply, plotrix
- scripts/2_occupancy_estimation.R: script to estimate occupancy from simulated and field data
  - Needed packages: unmarked, VGAM, future.apply, fitdistrplus
- scripts/3_make_figures.R: script to create figures from results
  - Needed packages: cowplot, ggpubr, grid, gridExtra, scales, ggplot2, stringr, dplyr, tidyr, VGAM
- scripts/4_conceptual_figure.R: script to create the conceptual figure, Figure 1
  - Needed packages: ggplot2, gridExtra, VGAM, unmarked
Usage: These scripts can be used to either reproduce the analysis from scratch or pick up the analysis from any step based on the rest of the data included in this Dryad repository. The steps below describe the folder structure expected once the analysis is fully reproduced.
Step 1: Set up data directory - Original script: 1_bsims_simulate.R
This step is required if you wish to reproduce data simulation or fit occupancy models directly to the data. If you are only interested in exploring the results, proceed to Step 2.
The field data must be downloaded from the repository, but the simulated data can be generated using 1_bsims_simulate.R. However, the simulation process is very time-consuming and was originally parallelized on a high-performance computing cluster. Thus, if you do not wish to change the simulation parameters, we recommend downloading the original simulated data and organizing the data/ directory using the following process.
- Create a folder called data in the same directory as the scripts
- Download each of the .zip files in this repository starting with the word data and place them in the data folder
- Unzip each file so that the corresponding subfolders are created:
  - data/sims/: folder containing simulated data with tau = 60 m.
  - data/sims_tau_90m_120m/: folder containing simulated data with tau = 90 m and 120 m.
  - data/field/: folder containing field data.
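The unzipping steps above could be scripted in R along the following lines. This is a sketch only: the helper unzip_matching is hypothetical and not part of the repository scripts, and the location of the downloaded archives is an assumption to adjust as needed. The demonstration at the end runs against an empty temporary folder, so it creates the destination directory without extracting anything.

```r
# Unzip every archive in `zip_dir` whose file name starts with `prefix`
# into `dest_dir`, creating `dest_dir` if needed.
# Returns (invisibly) the names of the archives that were extracted.
unzip_matching <- function(zip_dir, prefix, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zips <- list.files(
    zip_dir,
    pattern = paste0("^", prefix, ".*\\.zip$"),
    full.names = TRUE
  )
  for (z in zips) {
    unzip(z, exdir = dest_dir)
  }
  invisible(basename(zips))
}

# Demonstration in a temporary folder that contains no archives
demo_root <- file.path(tempdir(), "occupancy_demo")
found <- unzip_matching(
  zip_dir  = demo_root,
  prefix   = "data",
  dest_dir = file.path(demo_root, "data")
)
length(found)  # no archives are present in the demo folder
```

With the real downloads placed in the data folder, a call such as unzip_matching("data", "data", "data") would be expected to produce the subfolders listed above.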
Step 2: Set up results directory - Original script: 2_occupancy_estimation.R
The results can be generated by setting up the data/ directory as described above and running 2_occupancy_estimation.R. However, the model fitting process is very time-consuming. Thus, if you only wish to access the original results of the paper, without reproducing the model-fitting process, we recommend setting up the results directory using files from Dryad as follows:
- Create a folder called results in the same directory as the scripts
- Download the field results: download results_field.zip, place it in the results folder, then unzip the file to create the field subfolder
- Download the simulated results using one of the following sets of instructions:
  - Downloading the summary file (recommended for speed):
    - Download results_all_sim_results_summary.zip from Dryad.
    - Unzip this file to create all_sim_results_summary.RDS within the top-level results folder.
  - Downloading full results files and recreating the summary file:
    - Download the other two .zip files starting with results_sim
    - Place the downloads in the results folder
    - Unzip each file so that the corresponding subfolders are created
    - Change the RELOAD variable in 3_make_figures.R to be RELOAD <- TRUE to recreate the summary file for the simulations
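Before running the figure scripts, it may help to confirm that the folders described in Steps 1 and 2 are in place. The sketch below is one hypothetical way to do so. The helper check_layout is not part of the repository, and the list of expected paths is assembled from the names given in this README (using the summary-file option for the simulated results), so it may need adjusting.

```r
# Report which of the expected data/results paths exist under `root`.
# The path list is taken from the README text and is an assumption,
# not an authoritative specification of what the scripts require.
check_layout <- function(root = ".") {
  expected <- c(
    "data/sims",
    "data/sims_tau_90m_120m",
    "data/field",
    "results/field",
    "results/all_sim_results_summary.RDS"
  )
  data.frame(
    path    = expected,
    present = file.exists(file.path(root, expected)),
    stringsAsFactors = FALSE
  )
}

# Demonstration: a temporary project root containing only data/field
demo_root <- file.path(tempdir(), "layout_demo")
dir.create(file.path(demo_root, "data", "field"),
           showWarnings = FALSE, recursive = TRUE)
layout <- check_layout(demo_root)
layout
```

Calling check_layout() from the folder that holds the scripts would list any paths that are still missing.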
Step 3: Create figures - Original scripts: 3_make_figures.R and 4_conceptual_figure.R
Figure 1 can be generated using 4_conceptual_figure.R. Figures 2-7 and all supplemental figures can be generated by 3_make_figures.R, using data in the results directory.
