Integrating presence-only and detection/non-detection data to estimate distributions and expected abundance of difficult-to-monitor species on a landscape-scale
Cite this dataset
Twining, Joshua et al. (2024). Integrating presence-only and detection/non-detection data to estimate distributions and expected abundance of difficult-to-monitor species on a landscape-scale [Dataset]. Dryad. https://doi.org/10.5061/dryad.ghx3ffbwf
Abstract
Estimating species distribution and abundance is foundational to effective management and conservation. Using an integrated species distribution model that combines presence-only data from various sources with detection/non-detection data from structured surveys, we estimated the distribution and expected abundance of difficult-to-monitor mammals of management concern across New York State, namely coyotes, bobcats, and black bears. Three distinct landscape-scale camera trap surveys provided detection/non-detection data over nine years from 2013 to 2021, and we augmented those data with incidental records of our focal species from public repositories. We used an inhomogeneous Poisson point process to construct an integrated model that fit both data types simultaneously. We demonstrate a simple application of the spatial point density of all species records in the accessed public databases, often referred to as the “magic covariate”, to inform the thinning process and account for unknown spatial sampling in the presence-only data. Using this approach, we examine habitat associations and provide spatially explicit estimates of expected abundance across the entirety of New York State for all three focal species.
As expected, coyotes were the most widely distributed and abundant species, with a strong positive association with agricultural land uses. Bobcats exhibited low expected abundance throughout the state and showed positive associations with deciduous forest and forest edge, and a negative association with road density. Finally, we observed considerable spatial variation in abundance of black bears with expected numbers increasing in association with various forest cover and composition covariates and decreasing with crop cover. We present insights into habitat associations and provide management implications for each of the species of interest.
Our integrated modelling method allows managers to use citizen sightings combined with detection/non-detection surveys to estimate robust indices of abundance for both high- and low-density species, and for both widespread and patchily distributed species. Through comparison with previous studies, we highlight how broad-scale programs, such as the statewide efforts to estimate species distributions undertaken here, can benefit substantively from integrated models that leverage additional data (here, incidental records) from a larger region of space, and thus capture more landscape heterogeneity than is plausible within formalized surveys.
README: Integrating presence-only and detection/non-detection data to estimate distributions and expected abundance of difficult-to-monitor species on a landscape-scale.
https://doi.org/10.5061/dryad.ghx3ffbwf
Summary
These are the data, MCMC samplers, and processing and run scripts for an inhomogeneous Poisson point process model that integrates detection/non-detection data and presence-only data to estimate the expected abundance of species. The model integrates these two data types by assuming they share the same underlying data-generating process (an inhomogeneous Poisson point process). The models are adapted from Koshkina et al. (2017) [Methods in Ecology and Evolution] but diverge from the original specification to accommodate t primary sampling periods (survey years). The model fits temporally varying, mean-centered random year effects on all three parameters in the model (detection probability, p; mean expected abundance, λ; and thinning, b).
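The shared-intensity idea described above can be illustrated with a minimal Python sketch. This is not the repository's nimble code (which is written in R); it is a simplified likelihood under stated assumptions: grid cells have unit area, presence-only counts in each cell are Poisson with mean λ·b, a surveyed cell holds at least one individual with probability 1 − exp(−λ), and detection probability p is constant. Random year effects are omitted, and all function and argument names are hypothetical.

```python
import math


def log_poisson_pmf(k, mu):
    """Log probability of observing count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def integrated_loglik(beta, alpha, p, x, w, po_counts, histories):
    """Joint log-likelihood of presence-only counts and detection histories.

    beta      : (intercept, slope) for log expected abundance, log(lambda)
    alpha     : (intercept, slope) for logit thinning probability, logit(b)
    p         : per-occasion detection probability at an occupied cell
    x, w      : per-cell habitat covariate and sampling-effort covariate
    po_counts : presence-only record count in each cell
    histories : {cell index: [0/1/None, ...]} for surveyed cells (None = NA)
    """
    lam = [math.exp(beta[0] + beta[1] * xi) for xi in x]
    b = [1.0 / (1.0 + math.exp(-(alpha[0] + alpha[1] * wi))) for wi in w]

    # Presence-only part: thinned Poisson counts with mean lambda * b.
    ll = sum(log_poisson_pmf(k, l * t) for k, l, t in zip(po_counts, lam, b))

    # Detection/non-detection part: driven by the same lambda.
    for i, hist in histories.items():
        obs = [y for y in hist if y is not None]
        psi = 1.0 - math.exp(-lam[i])  # P(at least one individual in the cell)
        n_det = sum(obs)
        lik = psi * p ** n_det * (1.0 - p) ** (len(obs) - n_det)
        if n_det == 0:
            lik += 1.0 - psi  # never detected: the cell may simply be empty
        ll += math.log(lik)
    return ll


# Illustrative values only (not taken from the dataset).
ll = integrated_loglik(
    beta=(0.0, 0.5), alpha=(0.0, 1.0), p=0.4,
    x=[-1.0, 0.0, 1.0], w=[0.2, -0.3, 1.1],
    po_counts=[0, 1, 3],
    histories={0: [0, 0, None], 2: [1, 0, 1]},
)
print(round(ll, 3))
```

Because both likelihood components depend on the same λ, the presence-only records inform expected abundance in cells that were never surveyed by camera trap, while the structured surveys help separate abundance from sampling effort.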
The working directory
Below you will find descriptions of each folder in this repository and files contained within them.
The data folder (./data)
This folder has seven files. NAs in all datasets below refer to missing data (NA - not available).
1. allNY_10kmgrid_alllandscapecovs_final.csv
This file contains all of the summarized spatial covariate data used in the analysis. Each row is a different 10 km² pixel in New York State, each column is a covariate, and each cell is a value.
The covariates used in the analysis.
Covariate | Description | Source |
---|---|---|
plot_id | Site identifier | User generated (sequence 1-14008) |
x | The x coordinate of the site in UTM NAD83 | Universal Transverse Mercator |
y | The y coordinate of the site in UTM NAD83 | Universal Transverse Mercator |
Deciduous | Proportion of a 10 km² grid cell made up of deciduous forest | NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus) |
Coniferous | Proportion of a 10 km² grid cell made up of coniferous forest | NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus) |
Mixed | Proportion of a 10 km² grid cell made up of mixed forest | NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus) |
Pasture | Proportion of a 10 km² grid cell made up of pasture | NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus) |
Cultivated.Crops | Proportion of a 10 km² grid cell made up of cultivated crops | NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus) |
road_density | Mean number of km of road per km² in each grid cell | Calculated from primary and secondary roads raster provided by the NYSDEC |
elevation | Mean elevation (m) of the 10 km² grid cell | Calculated from Digital Elevation Models of New York State provided by the NYSDEC |
forest_edge | Edge density of combined class of all forest | NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus) |
PO_join_count_truncated | Density of presence-only points (all mammals from online repositories from 2013-2021) | Calculated from all online repositories, hosted on GitHub (/data/Allcompiled_POdata_NYMS_noduplicates) |
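The PO_join_count_truncated covariate summarizes how many presence-only records of any mammal fall in each grid cell. A minimal sketch of that kind of calculation follows, in Python rather than the repository's R. The square-grid lookup, the function names, and the truncation cap are all assumptions for illustration; the README does not state how the authors performed the spatial join or where they truncated the counts.

```python
from collections import Counter


def cell_of(x, y, cell_size, x_min=0.0, y_min=0.0):
    """Return the (column, row) index of the square grid cell containing (x, y)."""
    return (int((x - x_min) // cell_size), int((y - y_min) // cell_size))


def truncated_point_counts(points, cell_size, cap=None, x_min=0.0, y_min=0.0):
    """Count points per grid cell, optionally truncating counts at `cap`.

    points : iterable of (x, y) projected coordinates
    cap    : hypothetical upper limit applied to each cell's count
    """
    counts = Counter(cell_of(x, y, cell_size, x_min, y_min) for x, y in points)
    if cap is None:
        return dict(counts)
    return {cell: min(n, cap) for cell, n in counts.items()}


# Illustrative coordinates only (not taken from the dataset).
records = [(100.0, 100.0), (200.0, 300.0), (1500.0, 100.0), (150.0, 150.0)]
print(truncated_point_counts(records, cell_size=1000.0, cap=2))
```

Truncating the counts limits the influence of a few heavily reported cells on the thinning component of the model.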
2. allNY_2016-2018_bearPAdata_pixelID_10kmgrid_final.csv
This file contains all the detection/non-detection data for black bears collected during summer surveys from 2016-2018 in the southern tier of New York State. Each row is a site, each column is an occasion, each cell is a detection/non-detection record.
The columns in this csv are:
Column | Description |
---|---|
FID_allNY_10km | The 10 km² grid cell ID the site was in |
Station | The site ID |
o1 | Detection/non-detection data (0/1) for first occasion |
o2 | Detection/non-detection data (0/1) for second occasion |
o3 | Detection/non-detection data (0/1) for third occasion |
o4 | Detection/non-detection data (0/1) for fourth occasion |
o5 | Detection/non-detection data (0/1) for fifth occasion |
year | The year the sampling was conducted in |
x | The x coordinate in UTM NAD83 |
y | The y coordinate in UTM NAD83 |
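As a minimal sketch of how a file with this layout could be read into per-site detection histories, the following Python example parses the occasion columns and treats NA as missing. It is not the repository's formatting script (which is in R); the function names and the sample rows are hypothetical and only mirror the column names listed above.

```python
import csv
import io

OCCASION_COLS = ["o1", "o2", "o3", "o4", "o5"]


def read_histories(handle, occasion_cols=OCCASION_COLS):
    """Parse detection/non-detection rows; 'NA' or empty cells become None."""
    sites = []
    for rec in csv.DictReader(handle):
        history = [
            None if rec[col] in ("NA", "") else int(rec[col])
            for col in occasion_cols
        ]
        sites.append({
            "cell": rec["FID_allNY_10km"],
            "station": rec["Station"],
            "year": int(rec["year"]),
            "history": history,
        })
    return sites


def ever_detected(history):
    """1 if detected on any sampled occasion, 0 if never, None if never sampled."""
    observed = [y for y in history if y is not None]
    return max(observed) if observed else None


# Made-up rows for illustration (not taken from the dataset).
SAMPLE = """FID_allNY_10km,Station,o1,o2,o3,o4,o5,year,x,y
101,A01,0,1,0,NA,NA,2017,350000,4700000
102,A02,0,0,0,0,0,2018,353000,4703000
"""

for site in read_histories(io.StringIO(SAMPLE)):
    print(site["station"], site["history"], ever_detected(site["history"]))
```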
3. allNY_2016_2018_bearPOrecord_pixelID_10kmgrid_noweeklydups_final.csv
This file contains all the presence-only data used in this analysis for black bears (maximum 1 per 10 km2 pixel per week) between the years 2016-2018 both from public online repositories and iSeeMammals, a citizen science bear monitoring project run by Cornell University.
The columns in this csv are:
Column name | Description |
---|---|
FID_allNY_10km | The 10 km² grid cell ID the site was in |
latitude | The latitude of the detection (WGS 84) |
longitude | The longitude of the detection (WGS 84) |
Month | The month of the year of the detection |
Day | The day of the year of the detection |
Year | The year the sampling was conducted in |
Date | The date of the detection in MM/DD/YYYY format |
Time | The time of the detection in HH:MM:SS format |
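The rule of at most one record per 10 km² pixel per week can be sketched as below, in Python rather than the repository's R. The README does not say how a week was delimited, so using ISO calendar weeks is an assumption, as are the function name and the sample records; only the FID_allNY_10km and Date column names and the MM/DD/YYYY format come from the table above.

```python
from datetime import datetime


def thin_to_weekly(records):
    """Keep the earliest record for each (grid cell, ISO year, ISO week)."""
    def parse(rec):
        return datetime.strptime(rec["Date"], "%m/%d/%Y").date()

    seen = set()
    kept = []
    for rec in sorted(records, key=parse):
        iso = parse(rec).isocalendar()
        key = (rec["FID_allNY_10km"], iso[0], iso[1])  # cell, ISO year, ISO week
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Made-up records for illustration (not taken from the dataset).
records = [
    {"FID_allNY_10km": "12", "Date": "06/05/2017"},
    {"FID_allNY_10km": "12", "Date": "06/07/2017"},  # same cell, same week
    {"FID_allNY_10km": "13", "Date": "06/07/2017"},  # different cell
    {"FID_allNY_10km": "12", "Date": "06/12/2017"},  # same cell, next week
]
for rec in thin_to_weekly(records):
    print(rec["FID_allNY_10km"], rec["Date"])
```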
4. NY_blackbears_ordinaldates.csv
This file contains the sampling dates (in ordinal format) for each sampling occasion from the summer surveys from 2016-2018. Each row is a site, each column is an occasion.
The columns in this csv are:
Column name | Description |
---|---|
Station | Site identifier |
year | The year of sampling |
utm_x | The x coordinate of the site in UTM NAD 83 |
utm_y | The y coordinate of the site in UTM NAD 83 |
date1 | The ordinal date for the first occasion |
date2 | The ordinal date for the second occasion |
date3 | The ordinal date for the third occasion |
date4 | The ordinal date for the fourth occasion |
date5 | The ordinal date for the fifth occasion |
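For readers unfamiliar with the format, the sketch below shows one way to convert a calendar date to an ordinal date in Python. It assumes that "ordinal date" here means day of the year (1 to 365 or 366), which the README does not spell out; the function name is hypothetical.

```python
from datetime import datetime


def day_of_year(date_string, fmt="%m/%d/%Y"):
    """Convert a calendar date string to its day of the year (1 = 1 January).

    Note: this differs from Python's date.toordinal(), which counts days
    from year 1 rather than from the start of the current year.
    """
    return datetime.strptime(date_string, fmt).timetuple().tm_yday


print(day_of_year("06/05/2017"))
print(day_of_year("03/01/2016"))  # 2016 is a leap year
```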
5. allNY_2013-2021_coyotePArecords_pixelID_10kmgrid.csv
This file contains all the detection/non-detection data for coyotes collected during winter surveys from 2013-2021 in the southern tier of New York State. Each row is a site, each column is an occasion, each cell is a detection/non-detection record.
The columns in this .csv are:
Column | Description |
---|---|
FID_allNY_10km | The 10 km² grid cell ID the site was in |
Station | The site ID |
o1 | Detection/non-detection data (0/1) for first occasion |
o2 | Detection/non-detection data (0/1) for second occasion |
o3 | Detection/non-detection data (0/1) for third occasion |
o4 | Detection/non-detection data (0/1) for fourth occasion |
o5 | Detection/non-detection data (0/1) for fifth occasion |
year | The year the sampling was conducted in |
x | The x coordinate in UTM NAD83 |
y | The y coordinate in UTM NAD83 |
6. juliandays_allNY_2013-2021.csv
This file contains the sampling dates (in ordinal format) for each sampling occasion for the winter surveys from 2013-2021. Each row is a site, each column is an occasion.
The columns in this .csv are:
Column name | Description |
---|---|
Station | Site identifier |
year | The year of sampling |
julianset | The ordinal date for the first occasion |
juliancheck1 | The ordinal date for the second occasion |
juliancheck2 | The ordinal date for the third occasion |
7. allNY_2013-2021_bobcatPArecords_pixelID_10kmgrid.csv
This file contains all the detection/non-detection data for bobcats collected during winter surveys from 2013-2021 in the southern tier of New York State. Each row is a site, each column is an occasion, each cell is a detection/non-detection record.
The columns in this .csv are:
Column name | Description |
---|---|
FID_allNY_10km | The 10 km² grid cell ID the site was in |
sitename | The site ID |
o1 | Detection/non-detection data (0/1) for first occasion |
o2 | Detection/non-detection data (0/1) for second occasion |
o3 | Detection/non-detection data (0/1) for third occasion |
year | The year the sampling was conducted in |
x | The x coordinate in UTM NAD83 |
y | The y coordinate in UTM NAD83 |
For presence-only data that could not be hosted on Dryad due to licensing issues, please contact the corresponding author (Joshua P. Twining, jpt93@cornell.edu), visit the associated GitHub repository (https://github.com/jptwining/PO-PA-Integrated-SDM), or see the New York Mammal Survey (https://www.nynhp.org/projects/statewide-mammal-survey/).
The model folder (./models)
The models are coded up via the nimble package version 0.13.1 (de Valpine et al. 2022).
1. nimble_ppp_integrated_model_bear_10km_yearindexed_randomeffects_centrered.R
This is the nimble IPPP model that is fit to the black bear data files above. The code is commented throughout to describe each part of the model.
2. nimble_ppp_integrated_model_coyote_10km_yearindexed_randomeffects_centered
This is the nimble IPPP model that is fit to the coyote data files above. The code is commented throughout to describe each part of the model.
3. nimble_ppp_integrated_model__bobcat_10km_indexingoveryear_randomeffects_centered
This is the nimble IPPP model that is fit to the bobcat data files above. The code is commented throughout to describe each part of the model.
The scripts folder (./scripts)
1. bear_PO_PA_model_10km_nimble_formattingandrunscript_indexoveryear
This is the formatting and run script for the bear data and model above. This code is commented throughout.
2. coyote_PO_PA_model_10km_nimble_formattingandrunscript_indexoveryear
This is the formatting and run script for the coyote data and model above. This code is commented throughout.
3. bobcat_PO_PA_model_10km_nimble_formattingandrunscript_indexoveryear
This is the formatting and run script for the bobcat data and model above. This code is commented throughout.
Methods
Landscape-scale structured camera trap surveys
Multiple camera trap surveys spanning nine years (2013-2021) documented the occurrence of mammals across much of the state. We conducted winter surveys of the south-central part of the state, including the High Allegheny Plateau, the Western Allegheny Plateau, and the Great Lakes ecoregions (Omernik & Griffith, 2014), in 2013 (294 sites), 2014 (608 sites), and 2015 (599 sites). We conducted a second set of large-scale winter camera surveys in the same region in 2019 (584 sites), 2020 (603 sites), and 2021 (601 sites). Winter surveys of the northern part of the state (Northern Appalachians ecoregion) were conducted annually in 2016 (189 sites), 2017 (179 sites), and 2018 (191 sites). All nine winter camera trap surveys from 2013-2021 were leveraged to generate detection/non-detection data for bobcats and coyotes. We additionally conducted summer camera trap surveys for black bears in the High Allegheny Plateau, Western Allegheny Plateau, and Lower New England ecoregions in 2017 (238 sites) and 2018 (242 sites). The summer surveys were used to generate detection/non-detection data for black bears only. See Table 1 for full details on spatial scale, bait used, and sampling durations.
Sites in the winter surveys were sampled using camera traps deployed opposite baited stations attached to trees. A site was defined as a 15 km² grid cell in the winter surveys and a 25 km² grid cell in the summer surveys, with the detector (camera trap) deployed as close to the center of its respective grid cell as possible. Weekly detection records were created for each focal species over the sampling period, resulting in 3 occasions for each site-by-year combination.
In the summer surveys, sites were sampled using a camera trap deployed opposite a hair snare, which consisted of 2 strands of barbed wire set at 30 cm and 60 cm off the ground encircling 3-6 trees.
Sites were checked every 2 weeks, for a total of 10 weeks, to replace SD cards, replenish batteries, and apply new scent and bait attractants. Only one detection was permitted per 2-week period, minimizing violations of independence assumptions.
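Collapsing raw camera detections into one binary record per occasion can be sketched as follows, in Python rather than the repository's R. The function name and dates are hypothetical, and measuring occasions from the deployment date is an assumption. With 14-day occasions over 10 weeks the history has five entries, which matches the o1 to o5 columns of the black bear file; 7-day occasions with three entries correspond to the weekly records described for the winter surveys.

```python
from datetime import date


def detection_history(deploy_date, detection_dates, occasion_days=14, n_occasions=5):
    """Return a 0/1 list with one entry per occasion.

    Multiple detections within the same occasion collapse to a single 1;
    detections before deployment or after the final occasion are ignored.
    """
    history = [0] * n_occasions
    for detected_on in detection_dates:
        index = (detected_on - deploy_date).days // occasion_days
        if 0 <= index < n_occasions:
            history[index] = 1
    return history


# Made-up dates for illustration (not taken from the dataset).
summer = detection_history(
    date(2017, 6, 5),
    [date(2017, 6, 6), date(2017, 6, 10), date(2017, 7, 4), date(2017, 9, 1)],
)
print(summer)

winter = detection_history(
    date(2019, 1, 7), [date(2019, 1, 20)], occasion_days=7, n_occasions=3
)
print(winter)
```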
Presence-only background data
Additional records for the focal species were collated from online public repositories, including the Global Biodiversity Information Facility (https://www.gbif.org/), iNaturalist (Research-Grade; https://www.inaturalist.org/), DataBasin (https://databasin.org/), eMammal (https://emammal.si.edu/), and Movebank (https://www.movebank.org), and by requesting data from governmental data stores, namely NYSDEC Nuisance Wildlife Complaints. We used the search terms “bobcat”, “Lynx rufus”, “coyote”, “Canis latrans”, “black bear”, and “Ursus americanus”, and gathered all records from the sampling years (2013-2021) for which specific coordinate locations were provided. For black bears, we leveraged an additional source of presence-only data, iSeeMammals, an ongoing citizen science project established for monitoring black bears in New York State in 2017 (Sun et al., 2021). All presence-only records were plotted and manually checked for validity. Records from iNaturalist were restricted to those labelled “research-grade”. Only records from the sampling years for each species were retained (2013-2021 for bobcat and coyote; 2017-2018 for black bear), with a maximum of one record per 10 km² pixel per week to ensure independence of observations (see spatial and observation covariates). Due to licensing issues, the presence-only data from all sources apart from iSeeMammals could not be hosted by Dryad; please contact the corresponding author (Josh Twining, jpt93@cornell.edu) for data queries or see the associated GitHub repository (https://github.com/jptwining/PO-PA-Integrated-SDM).
Funding
New York State Department of Environmental Conservation, Award: W-173-G