Integrating presence-only and detection/non-detection data to estimate distributions and expected abundance of difficult-to-monitor species on a landscape-scale

Twining, Joshua1; Fuller, Angela2; Sun, Catherine3; Calderón-Acevedo, Camilo4; Schlesinger, Matthew4; Berger, Melanie4; Kramer, David5; Frair, Jacqueline4

Published Mar 08, 2024 on Dryad. https://doi.org/10.5061/dryad.ghx3ffbwf

Data files

Mar 08, 2024 version files (2.45 MB total)

File	Size
allNY_10kmgrid_alllandscapecovs_final_dryad.csv	1.94 MB
allNY_2013-2021_bobcatPArecords_pixelID_10kmgrid.csv	148.07 KB
allNY_2013-2021_coyotePArecords_pixelID_10kmgrid.csv	148.12 KB
allNY_2016_2018_bearPOrecord_pixelID_10kmgrid_noweeklydups_final.csv	56.93 KB
allNY_2016-2018_bearPAdata_pixelID_10kmgrid_final.csv	29.57 KB
juliandays_allNY_2013-2021.csv	79.86 KB
NY_blackbears_ordinaldates.csv	32.71 KB
README.md	13.24 KB

Abstract

Estimating species distribution and abundance is foundational to effective management and conservation. Using an integrated species distribution model that combines presence-only data from various sources with detection/non-detection data from structured surveys, we estimated the distribution and expected abundance of difficult-to-monitor mammals of management concern across New York State, namely coyotes, bobcats, and black bears. Three distinct landscape-scale camera trap surveys provided detection/non-detection data over nine years (2013-2021), and we augmented those data with incidental records of our focal species from public repositories. We used an inhomogeneous Poisson point process to construct an integrated model that fit both data types simultaneously. We demonstrate a simple use of the spatial point density of all species records in the accessed public databases to inform the thinning process, which accounts for unknown spatial sampling effort in the presence-only data; this density is often referred to as the “magic covariate”. Using this approach, we examine habitat associations and provide spatially explicit estimates of expected abundance across the entirety of New York State for all three focal species.

As expected, coyotes were the most widely distributed and abundant species, with a strong positive association with agricultural land uses. Bobcats exhibited low expected abundance throughout the state and showed positive associations with deciduous forest and forest edge, and a negative association with road density. Finally, we observed considerable spatial variation in abundance of black bears with expected numbers increasing in association with various forest cover and composition covariates and decreasing with crop cover. We present insights into habitat associations and provide management implications for each of the species of interest.

Our integrated modelling method allows managers to combine citizen sightings with detection/non-detection surveys to estimate robust indices of abundance for both high- and low-density species, and for widespread as well as patchily distributed species. Through comparison with previous studies, we highlight how broad-scale programs, such as the statewide efforts to estimate species distributions undertaken here, can benefit substantively from integrated models that leverage additional data (here, incidental records) from a larger region of space, and thus capture more landscape heterogeneity than is plausible within formalized surveys.


Summary

These are the data, MCMC samplers, and processing and run scripts for an inhomogeneous Poisson point process model that integrates detection/non-detection data and presence-only data to estimate the expected abundance of species. The model integrates these two data types by assuming they share the same underlying data-generating process (an inhomogeneous Poisson point process). The models are adapted from Koshkina et al. (2017) [Methods in Ecology and Evolution] but diverge from the original specification to accommodate t primary sampling periods (survey years). The model fits temporally varying, mean-centered random year effects on all three parameters in the model (detection probability, p; mean expected abundance, λ; and thinning, b).
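One way to write down the structure described above is sketched below. The notation is an assumption introduced here for illustration: pixel covariates x_i, thinning covariates w_i, coefficients β, α, and γ_0, year effects ε_t, pixel area A, and surveyed-site area a are not named in this README, and observation-level covariates on p are omitted. The nimble code in ./models remains the authoritative specification and may differ in detail.

```latex
\begin{align}
\lambda_{i,t} &= \exp\!\left(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i + \varepsilon^{\lambda}_t\right)
  && \text{intensity in pixel $i$, year $t$} \\
b_{i,t} &= \operatorname{logit}^{-1}\!\left(\alpha_0 + \boldsymbol{\alpha}^{\top}\mathbf{w}_i + \varepsilon^{b}_t\right)
  && \text{thinning of presence-only records} \\
p_{t} &= \operatorname{logit}^{-1}\!\left(\gamma_0 + \varepsilon^{p}_t\right)
  && \text{detection probability} \\
n^{\mathrm{PO}}_{i,t} &\sim \operatorname{Poisson}\!\left(A\,\lambda_{i,t}\,b_{i,t}\right)
  && \text{presence-only counts} \\
z_{i,t} &\sim \operatorname{Bernoulli}\!\left(1 - e^{-a\,\lambda_{i,t}}\right), \quad
y_{i,j,t} \sim \operatorname{Bernoulli}\!\left(z_{i,t}\,p_{t}\right)
  && \text{detection/non-detection on occasion $j$}
\end{align}
```

Under this sketch both data types inform the same intensity λ, and the expected abundance in a pixel is A λ. Each ε_t is a year effect centered on zero, one per parameter.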

The working directory

Below you will find descriptions of each folder in this repository and files contained within them.

The data folder (./data)

This folder has seven files. NAs in all datasets below refer to missing data (NA - not available).

1. allNY_10kmgrid_alllandscapecovs_final.csv

This file contains all of the summarized spatial covariate data used in the analysis. Each row is a different 10 km² pixel in New York State, each column is a covariate, and each cell is a value.

The covariates used in the analysis.

Covariate	Description	Source
plot_id	Site identifier	User generated (sequence 1-14008)
x	The x coordinate of the site in UTM NAD83	Universal Transverse Mercator
y	The y coordinate of the site in UTM NAD83	Universal Transverse Mercator
Deciduous	Proportion of a 10 km² grid cell made up of deciduous forest	NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus)
Coniferous	Proportion of a 10 km² grid cell made up of coniferous forest	NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus)
Mixed	Proportion of a 10 km² grid cell made up of mixed forest	NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus)
Pasture	Proportion of a 10 km² grid cell made up of pasture	NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus)
Cultivated.Crops	Proportion of a 10 km² grid cell made up of cultivated crops	NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus)
road_density	Mean number of km of road per km² in each grid cell	Calculated from primary and secondary roads raster provided by the NYSDEC
elevation	Mean elevation (m) of the 10 km² grid cell	Calculated from Digital Elevation Models of New York State provided by the NYSDEC
forest_edge	Edge density of combined class of all forest	NLCD, 2019 (https://www.mrlc.gov/data/nlcd-2019-land-cover-conus)
PO_join_count_truncated	Density of presence-only points (all mammals from online repositories from 2013-2021)	Calculated from all online repositories, hosted on GitHub (/data/Allcompiled_POdata_NYMS_noduplicates)
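Because the land-cover covariates are proportions of a grid cell, a quick plausibility check after loading the file can catch import problems. The sketch below uses only the Python standard library and an invented two-row excerpt; the values are hypothetical, and the check that the five classes sum to at most 1 assumes they are mutually exclusive land-cover classes.

```python
import csv
import io

# Hypothetical two-row excerpt; the values are invented for illustration only.
SAMPLE = """plot_id,Deciduous,Coniferous,Mixed,Pasture,Cultivated.Crops
1,0.42,0.10,0.08,0.15,0.05
2,0.30,NA,0.12,0.20,0.10
"""

PROPORTION_COLUMNS = ["Deciduous", "Coniferous", "Mixed", "Pasture", "Cultivated.Crops"]


def parse_value(text):
    """Return a float, or None when the cell holds the missing-data marker NA."""
    return None if text.strip() == "NA" else float(text)


def check_proportions(rows):
    """Return {plot_id: [problems]} for rows with implausible land-cover proportions."""
    problems = {}
    for row in rows:
        values = {col: parse_value(row[col]) for col in PROPORTION_COLUMNS}
        issues = [f"{col} outside [0, 1]" for col, v in values.items()
                  if v is not None and not 0.0 <= v <= 1.0]
        # Assumption: the classes do not overlap, so their shares cannot exceed 1.
        total = sum(v for v in values.values() if v is not None)
        if total > 1.0 + 1e-9:
            issues.append("land-cover proportions sum to more than 1")
        if issues:
            problems[row["plot_id"]] = issues
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = check_proportions(rows)  # empty dict when every row passes
```

To run the same check on the real file, replace `io.StringIO(SAMPLE)` with an opened file handle for the covariate CSV.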

2. allNY_2016-2018_bearPAdata_pixelID_10kmgrid_final.csv

This file contains all the detection/non-detection data for black bears collected during summer surveys from 2016-2018 in the southern tier of New York State. Each row is a site, each column is an occasion, each cell is a detection/non-detection record.

The columns in this csv are:

Column	Description
FID_allNY_10km	The 10 km² grid cell ID the site was in
Station	The site ID
o1	Detection/non-detection data (0/1) for first occasion
o2	Detection/non-detection data (0/1) for second occasion
o3	Detection/non-detection data (0/1) for third occasion
o4	Detection/non-detection data (0/1) for fourth occasion
o5	Detection/non-detection data (0/1) for fifth occasion
year	The year the sampling was conducted in
x	The x coordinate in UTM NAD83
y	The y coordinate in UTM NAD83
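As a rough illustration of how the occasion columns can be read, the sketch below parses an invented excerpt in the documented layout and summarizes it. The rows are hypothetical, and treating NA as "occasion not sampled" is an assumption based on the README's description of NA as missing data. The naive occupancy it reports ignores imperfect detection, which the integrated model accounts for, so it is only a data check.

```python
import csv
import io

# Hypothetical excerpt in the documented column layout; values are invented.
SAMPLE = """FID_allNY_10km,Station,o1,o2,o3,o4,o5,year,x,y
101,A01,0,1,0,NA,0,2017,500000,4700000
102,A02,0,0,0,0,0,2017,510000,4700000
103,A03,NA,NA,1,1,0,2018,520000,4700000
"""

OCCASIONS = ["o1", "o2", "o3", "o4", "o5"]


def detection_history(row):
    """Convert o1..o5 to a list of 0/1 integers, with None for NA."""
    return [None if row[c] == "NA" else int(row[c]) for c in OCCASIONS]


def summarize(rows):
    """Return (share of sites with at least one detection, occasions sampled per site)."""
    detected = 0
    effort = {}
    for row in rows:
        sampled = [y for y in detection_history(row) if y is not None]
        effort[row["Station"]] = len(sampled)
        if any(sampled):
            detected += 1
    return detected / len(rows), effort


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
naive_occupancy, effort = summarize(rows)
```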

3. allNY_2016_2018_bearPOrecord_pixelID_10kmgrid_noweeklydups_final.csv

This file contains all the presence-only data used in this analysis for black bears (maximum of 1 record per 10 km² pixel per week) for the years 2016-2018, drawn from both public online repositories and iSeeMammals, a citizen science bear monitoring project run by Cornell University.

The columns in this csv are:

Column name	Description
FID_allNY_10km	The 10 km² grid cell ID the site was in
latitude	the latitude of the detection (WGS 84)
longitude	the longitude of the detection (WGS 84)
Month	The month of the year of the detection
Day	The day of the year of the detection
Year	The year the sampling was conducted in
Date	The date of the detection in MM/DD/YYYY format
Time	The time of the detection in HH:MM:SS format
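The limit of one record per pixel per week can be sketched as a simple de-duplication on the grid cell ID and the week of the Date column. This is a minimal sketch, not the authors' procedure: the README does not say how week boundaries were defined, so ISO calendar weeks are an assumption here, and the records are invented.

```python
from datetime import datetime

# Hypothetical records as (FID_allNY_10km, Date) pairs; values are invented.
RECORDS = [
    (101, "06/05/2017"),
    (101, "06/07/2017"),  # same pixel, same ISO week as the record above
    (101, "06/14/2017"),  # same pixel, following week
    (102, "06/05/2017"),  # different pixel, same week
]


def thin_weekly(records):
    """Keep the first record for each (pixel, ISO year, ISO week) combination."""
    seen = set()
    kept = []
    for pixel, date_text in records:
        day = datetime.strptime(date_text, "%m/%d/%Y").date()
        iso_year, iso_week, _ = day.isocalendar()
        key = (pixel, iso_year, iso_week)
        if key not in seen:
            seen.add(key)
            kept.append((pixel, date_text))
    return kept


thinned = thin_weekly(RECORDS)
```

Keeping the first record encountered is arbitrary; sorting by date beforehand would make the retained record the earliest in each week.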

4. NY_blackbears_ordinaldates.csv

This file contains the sampling dates (in ordinal format) for each sampling occasion from the summer surveys from 2016-2018. Each row is a site, each column is an occasion.

The columns in this csv are:

Column name	Description
Station	Site identifier
year	The year of sampling
utm_x	The x coordinate of the site in UTM NAD 83
utm_y	The y coordinate of the site in UTM NAD 83
date1	The ordinal date for the first occasion
date2	The ordinal date for the second occasion
date3	The ordinal date for the third occasion
date4	The ordinal date for the fourth occasion
date5	The ordinal date for the fifth occasion
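If calendar dates are needed, the ordinal values can be converted back using the year column. The sketch below assumes that "ordinal date" means day of year counted from 1 January (day 1), which this README does not state explicitly; the example row is invented.

```python
from datetime import date, timedelta


def ordinal_to_date(year, day_of_year):
    """Convert a day-of-year number (1 = 1 January) to a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def occasion_dates(year, ordinals):
    """Map occasion ordinals to dates, passing None through for missing (NA) values."""
    return [None if d is None else ordinal_to_date(year, d) for d in ordinals]


# Hypothetical row: year 2017 with date1..date5, one occasion missing.
example = occasion_dates(2017, [156, 170, 184, None, 212])
```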

5. allNY_2013-2021_coyotePArecords_pixelID_10kmgrid.csv

This file contains all the detection/non-detection data for coyotes collected during winter surveys from 2013-2021 in the southern tier of New York State. Each row is a site, each column is an occasion, each cell is a detection/non-detection record.

The columns in this .csv are:

Column	Description
FID_allNY_10km	The 10 km² grid cell ID the site was in
Station	The site ID
o1	Detection/non-detection data (0/1) for first occasion
o2	Detection/non-detection data (0/1) for second occasion
o3	Detection/non-detection data (0/1) for third occasion
o4	Detection/non-detection data (0/1) for fourth occasion
o5	Detection/non-detection data (0/1) for fifth occasion
year	The year the sampling was conducted in
x	The x coordinate in UTM NAD83
y	The y coordinate in UTM NAD83

6. juliandays_allNY_2013-2021.csv

This file contains the sampling dates (in ordinal format) for each sampling occasion for the winter surveys from 2013-2021. Each row is a site, each column is an occasion.

The columns in this .csv are:

Column name	Description
Station	Site identifier
year	The year of sampling
julianset	The ordinal date for the first occasion
juliancheck1	The ordinal date for the second occasion
juliancheck2	The ordinal date for the third occasion

7. allNY_2013-2021_bobcatPArecords_pixelID_10kmgrid.csv

This file contains all the detection/non-detection data for bobcats collected during winter surveys from 2013-2021 in the southern tier of New York State. Each row is a site, each column is an occasion, each cell is a detection/non-detection record.

The columns in this .csv are:

Column name	Description
FID_allNY_10km	The 10 km² grid cell ID the site was in
sitename	The site ID
o1	Detection/non-detection data (0/1) for first occasion
o2	Detection/non-detection data (0/1) for second occasion
o3	Detection/non-detection data (0/1) for third occasion
year	The year the sampling was conducted in
x	The x coordinate in UTM NAD83
y	The y coordinate in UTM NAD83

For presence-only data that could not be hosted on Dryad due to licensing issues, please contact the corresponding author (Joshua P. Twining, jpt93@cornell.edu), visit the associated GitHub repository (https://github.com/jptwining/PO-PA-Integrated-SDM), or see the New York Mammal Survey (https://www.nynhp.org/projects/statewide-mammal-survey/).

The model folder (./models)

The models are written with the nimble package, version 0.13.1 (de Valpine et al. 2022).

1. nimble_ppp_integrated_model_bear_10km_yearindexed_randomeffects_centrered.R

This is the nimble IPPP model that is fit to the black bear data files above. The code is commented to describe each part of the model.

2. nimble_ppp_integrated_model_coyote_10km_yearindexed_randomeffects_centered

This is the nimble IPPP model that is fit to the coyote data files above. The code is commented to describe each part of the model.

3. nimble_ppp_integrated_model__bobcat_10km_indexingoveryear_randomeffects_centered

This is the nimble IPPP model that is fit to the bobcat data files above. The code is commented to describe each part of the model.

The scripts folder (./scripts)

1. bear_PO_PA_model_10km_nimble_formattingandrunscript_indexoveryear

This is the formatting and run script for the bear data and model above. This code is commented throughout.

2. coyote_PO_PA_model_10km_nimble_formattingandrunscript_indexoveryear

This is the formatting and run script for the coyote data and model above. This code is commented throughout.

3. bobcat_PO_PA_model_10km_nimble_formattingandrunscript_indexoveryear

This is the formatting and run script for the bobcat data and model above. This code is commented throughout.

Landscape-scale structured camera trap surveys

Multiple camera trap surveys spanning nine years (2013-2021) documented the occurrence of mammals across much of the state. We conducted winter surveys of the south-central part of the state, including the High Allegheny Plateau, Western Allegheny Plateau, and Great Lakes ecoregions (Omernik & Griffith, 2014), in 2013 (294 sites), 2014 (608 sites), and 2015 (599 sites). We conducted a second set of large-scale winter camera surveys in the same region in 2019 (584 sites), 2020 (603 sites), and 2021 (601 sites). Winter surveys of the northern part of the state (Northern Appalachians ecoregion) were conducted annually in 2016 (189 sites), 2017 (179 sites), and 2018 (191 sites). All nine winter camera trap surveys from 2013-2021 were used to generate detection/non-detection data for bobcats and coyotes. We additionally conducted summer camera trap surveys for black bears in the High Allegheny Plateau, Western Allegheny Plateau, and Lower New England ecoregions in 2017 (238 sites) and 2018 (242 sites). The summer surveys were used to generate detection/non-detection data for black bears only. See Table 1 for full details on spatial scale, bait used, and sampling durations.

Sites in the winter surveys were sampled using camera traps deployed opposite baited stations attached to trees. A site was defined as a 15 km² grid cell in the winter surveys and a 25 km² grid cell in the summer surveys, with the detector (camera trap) deployed as close to the center of its grid cell as possible. Weekly detection records were created for each focal species over the sampling period, resulting in 3 occasions for each site-by-year combination. In the summer surveys, sites were sampled using a camera trap deployed opposite a hair snare, which consisted of 2 strands of barbed wire set 30 cm and 60 cm off the ground and encircling 3-6 trees. Sites were checked every 2 weeks, for a total of 10 weeks, to replace SD cards, replenish batteries, and apply new scent and bait attractants. Only one detection was permitted per 2-week period, minimizing violations of independence assumptions.
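The collapsing of raw camera detections into occasions described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' processing code: the function name, the representation of detections as days since deployment, and the example values are all hypothetical.

```python
def build_history(detection_days, occasion_length, n_occasions):
    """Collapse detection days (days since deployment, starting at 0) into a 0/1 history.

    Each occasion records at most one detection, however many times the
    species was photographed within it.
    """
    history = [0] * n_occasions
    for day in detection_days:
        occasion = day // occasion_length
        if 0 <= occasion < n_occasions:  # ignore days outside the sampling window
            history[occasion] = 1
    return history


# Winter design as described: weekly occasions, 3 per site-by-year.
winter = build_history([2, 3, 15], occasion_length=7, n_occasions=3)

# Summer design as described: 2-week occasions over 10 weeks, so 5 per site-by-year.
summer = build_history([0, 13, 14, 69], occasion_length=14, n_occasions=5)
```

Here the two winter detections on days 2 and 3 fall in the same week and count once, which mirrors the one-detection-per-occasion rule.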

Presence-only background data

Additional records for the focal species were collated from online public repositories, including the Global Biodiversity Information Facility (https://www.gbif.org/), iNaturalist (Research-Grade; https://www.inaturalist.org/), DataBasin (https://databasin.org/), eMammal (https://emammal.si.edu/), and Movebank (https://www.movebank.org), and by requesting data from governmental data stores, namely NYSDEC Nuisance Wildlife Complaints. We used the search terms “bobcat”, “Lynx rufus”, “coyote”, “Canis latrans”, “black bear”, and “Ursus americanus”, and gathered all records from the sampling years (2013-2021) where specific coordinate locations were provided. For black bears, we leveraged an additional source of presence-only data, iSeeMammals, an ongoing citizen science project established in 2017 for monitoring black bears in New York State (Sun et al., 2021). All presence-only records were plotted and manually checked for validity. Records from iNaturalist were restricted to those labelled “research-grade”. Only records from the sampling years for each species were retained (2013-2021 for bobcat and coyote; 2017-2018 for black bear), with a maximum of one record per 10 km² pixel per week to ensure independence of observations (see spatial and observation covariates). Due to licensing issues, the presence-only data from all sources apart from iSeeMammals could not be hosted by Dryad; please contact the corresponding author (Josh Twining, jpt93@cornell.edu) for data queries or see the associated GitHub repository (https://github.com/jptwining/PO-PA-Integrated-SDM).
