Data for: Considerations for fitting occupancy models to data from eBird and similar volunteer-collected data
Data files
Jul 19, 2023 version files 23.44 MB
- README.md
- SimResults_suppl.csv
- SimResults.csv
Abstract
An occupancy model uses data structured as sets of repeated visits to each of many sites in order to estimate the actual probability of occupancy (i.e., the proportion of occupied sites) after correcting for imperfect detection, using the information contained in the sets of repeated observations. We explore the conditions under which preexisting, volunteer-collected data from the citizen science project eBird can be used for fitting occupancy models. The data archived here are used to explore two ways in which single-visit records could be used in occupancy models. First, we use the empirical data contained within this archive to assess the potential for space-for-time substitution: aggregating single-visit records from different locations within a region into pseudo-repeat visits. The archived data illustrate that the locations chosen for data collection by observers were not always representative of the habitat in the surrounding area, which would lead to biased estimates of occupancy probabilities when using space-for-time substitution. Second, we created a large set of simulated data (the output from these simulations is contained in this archive) that we used to explore the utility of including single-visit records to supplement sets of repeated-visit data.
Methods
The seven data files archived are of three different types:
Raw data: records of bird watching events in which observations of wild birds were made by volunteer participants in eBird and entered into eBird's database. These data have not undergone subsequent processing, aside from being associated with identifier variables that can be used to connect records of observations made at the same locations or by the same observers.
Combination and derivation from primary data: information from eBird observation events (location, date, time, observer effort) is combined with summaries of landcover in the immediate area surrounding each location of an eBird observation. The landcover information is derived from the MODIS MCD12Q1v006 data product, with estimates of landcover percentages calculated by summarizing information from multiple pixels within the immediate area around each location of an eBird observation.
Simulation output: records of parameter estimates and their precision from occupancy models fit to large numbers of sets of simulated data, with the goal of understanding how variation in conditions (the conditions examined in the paper with which the data are associated) affected the accuracy and precision with which detection and occupancy probabilities were estimated. The simulation study was needed because true detection probability and true occupancy probability are never known for empirical data.
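The logic of such a simulation study can be illustrated with a minimal Python sketch. It is not the simulation design or code behind the archived files: it assumes constant occupancy probability (`psi`) and detection probability (`p`) across sites and visits, uses illustrative values (5000 sites, 5 visits, `psi = 0.6`, `p = 0.4`), and replaces a proper optimizer with a coarse grid search over the standard single-season occupancy likelihood. All function and variable names are hypothetical.

```python
import math
import random
from collections import Counter


def simulate_histories(n_sites, n_visits, psi, p, rng):
    """Simulate the number of detections at each site (constant psi and p)."""
    counts = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        # A species can only be detected at a site it actually occupies.
        detections = sum(
            1 for _ in range(n_visits) if occupied and rng.random() < p
        )
        counts.append(detections)
    return counts


def log_likelihood(psi, p, count_freq, n_visits):
    """Occupancy log-likelihood, dropping the constant binomial coefficients."""
    total = 0.0
    for d, n in count_freq.items():
        if d > 0:
            # Detected at least once: the site must be occupied.
            site_lik = psi * p**d * (1 - p) ** (n_visits - d)
        else:
            # Never detected: occupied but always missed, or truly unoccupied.
            site_lik = psi * (1 - p) ** n_visits + (1 - psi)
        total += n * math.log(site_lik)
    return total


def fit_grid(counts, n_visits, step=0.01):
    """Grid-search estimates of (psi, p); a stand-in for a real optimizer."""
    freq = Counter(counts)
    grid = [i * step for i in range(1, int(round(1 / step)))]
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda pair: log_likelihood(pair[0], pair[1], freq, n_visits),
    )


# Illustrative (assumed) settings; the true values are known only because
# the data are simulated.
TRUE_PSI, TRUE_P, N_SITES, N_VISITS = 0.6, 0.4, 5000, 5

rng = random.Random(1)
counts = simulate_histories(N_SITES, N_VISITS, TRUE_PSI, TRUE_P, rng)

# Naive occupancy ignores imperfect detection and so underestimates psi.
naive = sum(1 for d in counts if d > 0) / len(counts)
psi_hat, p_hat = fit_grid(counts, N_VISITS)

print(f"naive occupancy: {naive:.3f}")
print(f"estimated psi:   {psi_hat:.2f} (true {TRUE_PSI})")
print(f"estimated p:     {p_hat:.2f} (true {TRUE_P})")
```

Because the true values are set by the simulation, the estimates can be compared against them directly, which is the comparison that empirical data do not allow.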
Usage notes
All data files can be viewed as plain text within a text editor, although the archived tables are better viewed within computer programs designed to display and manipulate tabular data. The archived data are almost all comma-separated value (.csv) tables, with one tab-delimited (.txt) table. We read these data into the statistical software R, and some of the data tables follow the R convention of using the letters "NA" to denote missing values.
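For readers working outside R, the "NA" convention needs explicit handling. The following is a minimal Python sketch using only the standard library. The helper name `read_table` and the column names in the sample are hypothetical stand-ins and do not reflect the actual columns of the archived tables.

```python
import csv
import io


def read_table(handle, delimiter=","):
    """Read a delimited table into a list of dicts, mapping "NA" to None."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        {key: (None if value == "NA" else value) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical stand-in for an archived table; the column names are invented.
sample = io.StringIO("site_id,estimate,se\nA,0.52,0.04\nB,NA,NA\n")
rows = read_table(sample)

for row in rows:
    print(row)
```

With a real file, the same helper could be called on `open(path, newline="")`, passing `delimiter="\t"` for the tab-delimited table. Values are returned as strings, so numeric columns still need converting after the missing values are handled.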