Bayesian species distribution models integrate presence-only and presence-absence data to predict deer distribution and relative abundance
Morera-Pujol, Virginia et al. (2022), Bayesian species distribution models integrate presence-only and presence-absence data to predict deer distribution and relative abundance, Dryad, Dataset, https://doi.org/10.5061/dryad.5mkkwh795
Using geospatial data on wildlife presence to predict a species' distribution across a geographic area is among the most common tools in management and conservation. However, collecting high-quality presence-absence data through structured surveys is expensive, and managers usually have access to larger amounts of lower-quality presence-only data collected by citizen scientists, through opportunistic observations, and from culling returns for game species. Integrated Species Distribution Models (ISDMs) have been developed to make the most of the available data by combining higher-quality, but usually scarcer and more spatially restricted, presence-absence data with lower-quality, unstructured, but usually more extensive presence-only datasets. Joint-likelihood ISDMs can be fitted in a Bayesian framework using INLA (Integrated Nested Laplace Approximation), which allows the addition of a spatially structured random effect to account for spatial autocorrelation in the data. Here, we apply this innovative approach to fit ISDMs to empirical presence-absence and presence-only data for the three prevalent deer species in Ireland: red, fallow and sika deer. We collated all deer data available for the past 15 years and fitted models predicting distribution and relative abundance at a 25 km² resolution across the island. Model predictions were accompanied by spatial estimates of uncertainty, allowing us to assess the quality of each model and the effect of data scarcity on the certainty of predictions. Furthermore, we checked the performance of the three species-specific models using two datasets: independent deer hunting returns and deer densities based on faecal pellet counts.
Our work clearly demonstrates the applicability of spatially explicit ISDMs to empirical data in a Bayesian context, providing a blueprint for managers to exploit unexplored and seemingly unusable data that, when modelled with the proper tools, can inform management and conservation policies.
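For readers unfamiliar with joint-likelihood ISDMs, the sketch below shows one common formulation. It is assumed here for illustration, and the exact specification fitted to these data may differ. Both data types share a linear predictor η(s) built from covariates x_k(s) and a spatial random field ξ(s). Presence-only locations s_1, …, s_n over the study domain D are treated as an inhomogeneous Poisson point process. Presence-absence sites s_1, …, s_m are treated as Bernoulli trials with a complementary log-log link. The dataset-specific intercepts α_PO and α_PA and the site area A_j are notation introduced here, not taken from the source.

```latex
\begin{aligned}
\eta(\mathbf{s}) &= \sum_{k=1}^{K} \beta_k\, x_k(\mathbf{s}) + \xi(\mathbf{s}),
  \qquad \xi \sim \text{Gaussian random field} \\
\ell_{\mathrm{PO}} &= \sum_{i=1}^{n} \bigl[\alpha_{\mathrm{PO}} + \eta(\mathbf{s}_i)\bigr]
  - \int_{D} \exp\bigl(\alpha_{\mathrm{PO}} + \eta(\mathbf{s})\bigr)\, d\mathbf{s} \\
p_j &= 1 - \exp\bigl(-A_j \exp\bigl(\alpha_{\mathrm{PA}} + \eta(\mathbf{s}_j)\bigr)\bigr)
  \;\Longleftrightarrow\;
  \log\bigl(-\log(1 - p_j)\bigr) = \alpha_{\mathrm{PA}} + \eta(\mathbf{s}_j) + \log A_j \\
\ell_{\mathrm{PA}} &= \sum_{j=1}^{m} \bigl[y_j \log p_j + (1 - y_j)\log(1 - p_j)\bigr] \\
\ell &= \ell_{\mathrm{PO}} + \ell_{\mathrm{PA}}
\end{aligned}
```

The coefficients β_k and the field ξ appear in both terms. As a result, the extensive PO data inform the shared covariate effects and spatial pattern, while the PA data add information on true absences. In R-INLA, such a field is typically represented through the SPDE approach.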
Presence-absence (PA) data
PA data for each species were obtained from Coillte. They came from surveys performed in a fraction of the 6,000 properties it manages (Table 1), in which property managers (who visit the forests they manage on a regular basis) were asked whether deer were present and, if so, which species. Properties range in size from less than 1 ha to around 2,900 ha. To assign each PA value to a specific location, we calculated the centroid of each property using the function st_centroid() from the R package sf (Pebesma 2018). The survey was carried out mainly in 2010 and 2013, with further data collected between 2014 and 2016. Some properties were surveyed only once in the period 2010–2016. For those surveyed more than once, the location was considered an “absence” if deer had never been detected in the property in any survey, and a “presence” otherwise. In addition to these surveys, Coillte commissioned density surveys based on faecal pellet sampling in a subset of its properties between 2007 and 2020. Any non-zero density in these data was considered a “presence”, and all zeros were considered “absences”. When a property had been sampled repeatedly, these data were likewise summarised across years, and the property was counted as a presence if deer had been detected in any sampling year.
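A minimal Python sketch of the two processing steps above. The first is the area-weighted centroid of a simple polygon in projected coordinates, which is what st_centroid() returns in that case; it is computed here with the shoelace formula rather than with sf. The second is the rule "presence if deer were ever detected". The input layout (vertex lists and (property_id, detected) pairs) is hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def density_to_detection(density):
    """Pellet-count density: any non-zero value counts as a detection."""
    return density > 0


def aggregate_presence(records):
    """Collapse repeated surveys: presence if detected in ANY survey.

    records: iterable of (property_id, detected) pairs (hypothetical layout).
    """
    status = defaultdict(bool)  # unseen properties start as False (absence)
    for property_id, detected in records:
        status[property_id] = status[property_id] or detected
    return dict(status)


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(polygon_centroid(square))  # (1.0, 1.0)
records = [("A", False), ("A", True), ("B", False), ("B", False),
           ("C", density_to_detection(0.0)), ("C", density_to_detection(3.2))]
print(aggregate_presence(records))  # {'A': True, 'B': False, 'C': True}
```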
PA data for Northern Ireland (NI) were obtained from a survey carried out by the British Deer Society (BDS) in 2016. The survey divided British territory into 100 km² grid cells, and deer presence or absence was assessed based on public contributions, which were then reviewed and collated by BDS experts. Because 100 km² grid cells are large, we did not assign the PA value of each cell to its centroid, as we had done for the Coillte properties. Instead, we simulated random positions within each cell and assigned the presence or absence value of the cell to each of them. We ran a sensitivity analysis to determine how many positions were needed to capture the environmental variability within each cell, and this number was set at 5 random positions per grid cell. After processing, we obtained a total of 920 PA data points across NI.
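The cell-to-points step can be sketched in Python as follows. Each 100 km² cell (10 km × 10 km) is replaced by 5 uniformly distributed random positions that inherit the cell's PA value. The input layout ((xmin, ymin, pa_value) in metres, in a projected CRS) and the seed argument are assumptions for illustration. This is not the authors' code.

```python
import random

CELL_SIZE_M = 10_000.0  # a 100 km² grid cell is 10 km x 10 km


def simulate_cell_positions(cells, n_per_cell=5, seed=None):
    """Replace each PA grid cell by n_per_cell random points carrying its value.

    cells: iterable of (xmin, ymin, pa_value), cell origins in metres
    (hypothetical input layout).
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    points = []
    for xmin, ymin, pa_value in cells:
        for _ in range(n_per_cell):
            # rng.random() is in [0, 1), so every point stays inside the cell
            x = xmin + rng.random() * CELL_SIZE_M
            y = ymin + rng.random() * CELL_SIZE_M
            points.append((x, y, pa_value))
    return points


cells = [(300_000.0, 500_000.0, 1), (310_000.0, 500_000.0, 0)]
pts = simulate_cell_positions(cells, seed=2016)
print(len(pts))  # 10
```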
2.2.2 Presence-only (PO) data
PO data were collected from various sources, mainly (but not only) citizen science initiatives. The National Biodiversity Data Centre (NBDC) is an Irish initiative that collates biodiversity data from different sources, from published studies to citizen contributions. From this repository, we obtained all contributions on the three species, a total of 1,430 records. To these, we added 164 records of deer in Ireland downloaded from iNaturalist, another citizen-contributed database that collects the same type of data. From the resulting dataset, we took four steps:
(1) removed all observations with a spatial resolution coarser than 1 km²;
(2) visually inspected the data and comments and removed all observations that were obviously incorrect (i.e. located at sea, or with a comment specifying a different species);
(3) filtered out all fallow deer reported in Dublin’s enclosed city park (Phoenix Park), since that population was introduced, is artificially maintained, and is disconnected from the other populations in Ireland; and
(4) removed duplicate observations by retaining only one observation per user, location and day.
The Centre for Environmental Data and Recording (CEDaR) is a data repository for Northern Ireland (NI) that operates in the same way as the NBDC. CEDaR provided 872 records of deer in NI from various surveys, scientific studies and citizen science initiatives, from which we removed all records with a spatial resolution coarser than 1 km². The location and species of 469 deer culled in NI between 2019 and 2021 were obtained from the Agri-Food and Biosciences Institute (AFBI) of Northern Ireland. For observations without exact coordinates, we derived coordinates from the location name or postcode, where provided.
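Filtering steps (1), (3) and (4) above are mechanical, so they can be sketched in Python. Step (2) was a manual visual inspection and is not automated here. Several details are assumptions made only for illustration: the record fields, the use of the reported grid-square side as "resolution", and the bounding box standing in for the real Phoenix Park boundary. Duplicates are also keyed per species, on the assumption that filtering is done within each species' dataset.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PORecord:
    species: str
    user: str
    x: float            # easting in metres (projected CRS assumed)
    y: float            # northing in metres
    day: date
    precision_m: float  # side of the grid square the record was reported at


def in_bbox(record, bbox):
    """True if the record falls inside (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= record.x <= xmax and ymin <= record.y <= ymax


def filter_po(records, max_precision_m=1000.0, excluded_bbox=None,
              excluded_species="fallow"):
    kept, seen = [], set()
    for r in records:
        # (1) drop records coarser than a 1 km x 1 km square
        if r.precision_m > max_precision_m:
            continue
        # (3) drop the isolated park population; the bbox is a hypothetical
        #     stand-in for the real park boundary polygon
        if (excluded_bbox and r.species == excluded_species
                and in_bbox(r, excluded_bbox)):
            continue
        # (4) keep one record per user, location and day (per species)
        key = (r.species, r.user, r.x, r.y, r.day)
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept
```

Exact coordinate equality is used as the "same location" test. Snapping coordinates to a grid before building the key would be a looser alternative.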
As part of a nationally funded initiative to improve deer monitoring in Ireland (SMARTDEER), we developed a bespoke online tool to facilitate the reporting of deer observations by the general public and all relevant stakeholders (e.g. hunters, farmers and foresters). Observations were reported in 2021 and 2022 by clicking on a map to indicate the 1 km² squares in which deer had been observed. For each user and session, we calculated the total area covered by the selected squares and simulated a number of positions proportional to that area. We then distributed these positions within the polygon, so that the number of exact positions reflected the size of the area in which the user had reported deer. In total, the SMARTDEER tool allowed us to collect 4,078 presences across Ireland and NI.
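A rough Python sketch of the per-session simulation described above. The squares clicked in one session are identified by hypothetical (col, row) indices on a 1 km grid. The number of points is proportional to their total area, and each point lands uniformly in a randomly chosen square. Because all squares have equal area, this gives a uniform distribution over the union. The proportionality constant points_per_km2 is an assumed parameter; the source does not state the value used.

```python
import random

SQUARE_SIZE_M = 1000.0  # each clickable square is 1 km x 1 km


def simulate_session_points(squares, points_per_km2=1.0, rng=None):
    """Simulate presence points for one user session.

    squares: (col, row) indices of the 1 km grid squares clicked in the
    session (hypothetical layout); points_per_km2 is an assumed constant.
    """
    rng = rng or random.Random()
    unique = sorted(set(squares))  # a square clicked twice counts once
    if not unique:
        return []
    area_km2 = len(unique) * (SQUARE_SIZE_M / 1000.0) ** 2
    n_points = max(1, round(points_per_km2 * area_km2))  # proportional to area
    points = []
    for _ in range(n_points):
        col, row = rng.choice(unique)  # equal-area squares -> uniform overall
        x = (col + rng.random()) * SQUARE_SIZE_M
        y = (row + rng.random()) * SQUARE_SIZE_M
        points.append((x, y))
    return points


pts = simulate_session_points([(10, 20), (11, 20), (11, 20)],
                              points_per_km2=2.0, rng=random.Random(1))
print(len(pts))  # 4
```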
2.3.2 Covariate selection
Raster environmental covariates used in the models were obtained from the Copernicus Land Monitoring Service (© European Union, Copernicus Land Monitoring Service 2018, European Environment Agency EEA). The vector layers (roads, paths) were obtained from OpenStreetMap (OpenStreetMap contributors, 2017. Planet dump [Data file from January 2022]. https://planet.openstreetmap.org). Vector layers were transformed into distance layers (distance to roads, distance to paths) using the distance() function from the raster package, and into density layers (density of roads, paths) using the rasterize() function from the same package (Hijmans 2021). All raster layers were resampled to the coarsest resolution among the covariates used, resulting in a 1 km² resolution. A full description of the covariate selection process (including screening for collinearity) can be found in the supplementary material. The covariates eventually used in the model were:
- elevation (m);
- slope (degrees);
- tree cover (%);
- small woody feature density (%);
- distance to forest edge (m; positive distances indicate a location outside a forest, negative distances a location within a forest);
- human footprint index (Venter et al. 2016, 2018).
All covariates were scaled by subtracting the mean and dividing by the standard deviation before entering the model, using the function scale() from the raster package.
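The standardisation step above amounts to z = (v − mean) / sd for each cell of a layer. Below is a minimal Python sketch on a flattened layer, with missing cells represented as None, which is an assumption about the input. It uses the sample SD (n − 1 denominator), as base R's scale() does; whether the raster package's scale() uses exactly the same convention is an assumption here. The mean and SD are returned so that the same transformation can be reapplied to other grids.

```python
from statistics import mean, stdev


def scale_layer(values):
    """Standardise a flattened covariate layer: (v - mean) / sd.

    Missing cells (None) are ignored when computing the statistics and stay
    missing. Uses the sample SD (n - 1 denominator); returns the scaled
    values together with the mean and SD that were applied.
    """
    valid = [v for v in values if v is not None]
    mu, sd = mean(valid), stdev(valid)
    scaled = [None if v is None else (v - mu) / sd for v in values]
    return scaled, mu, sd


print(scale_layer([120.0, 80.0, None, 40.0]))  # ([1.0, 0.0, None, -1.0], 80.0, 40.0)
```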
Presence-absence and presence-only data are provided as a .csv file that can be opened in most spreadsheet software, whether proprietary or open source.
Covariate data are provided as a raster in .grd format, in which the .grd file contains the header and the accompanying .gri file the cell values. After downloading both files, the data can be loaded in GIS software such as ArcGIS (proprietary) or QGIS (open source), or in R (open source).
Department of Agriculture, Food and the Marine, Ireland, Award: 2019R417