Supplementary information for: A continuous-score occupancy modeling framework for incorporating uncertain machine learning output in autonomous biodiversity surveys
Rhinehart, Tessa; Turek, Daniel; Kitzes, Justin (2022), Supplementary information for: A continuous-score occupancy modeling framework for incorporating uncertain machine learning output in autonomous biodiversity surveys, Dryad, Dataset, https://doi.org/10.5061/dryad.ns1rn8ptd
Ecologists often study biodiversity by evaluating species occupancy and the relationship between occupancy and other covariates. Occupancy models are now widely used to account for false absences in field surveys and to reduce bias in estimates of covariate relationships. Existing occupancy models take as inputs binary detection/non-detection observations of species at each visit to each site. However, autonomous sensing devices and machine learning models are increasingly used to survey biodiversity. Instead of binary detection/non-detection data, they generate a new type of observation record (i.e., continuous-score data) that reflects the model’s confidence that a species is present in each autonomously sensed file. These data are not directly compatible with traditional binary occupancy modeling methods.
Here, we develop a new occupancy model that represents visit-level continuous scores as a Gaussian mixture, combining a distribution of scores for files that do contain the species of interest and a distribution of scores for files that do not. The model takes as input continuous scores for each autonomously sensed and classified file, along with an optional small number of binary, manually verified detection and non-detection annotations.
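The mixture idea described above can be illustrated with a minimal Python sketch. This is not the code distributed in this dataset. It assumes one plausible parameterization: a site is occupied with probability psi; at an occupied site each file contains the species with probability theta; scores follow one Gaussian when the species is in the file and another when it is not. All function and parameter names (site_likelihood, posterior_occupancy, psi, theta, mu0, sd0, mu1, sd1) are hypothetical, and the optional manually verified annotations are ignored here.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def site_likelihood(scores, psi, theta, mu0, sd0, mu1, sd1):
    """Marginal likelihood of one site's classifier scores.

    Assumed (hypothetical) parameterization:
      psi      -- probability that the site is occupied
      theta    -- probability that a file from an occupied site contains the species
      mu0, sd0 -- score distribution for files without the species
      mu1, sd1 -- score distribution for files with the species
    """
    lik_occupied = 1.0
    lik_unoccupied = 1.0
    for s in scores:
        neg = normal_pdf(s, mu0, sd0)
        pos = normal_pdf(s, mu1, sd1)
        # Occupied site: each score is drawn from a two-component mixture.
        lik_occupied *= theta * pos + (1.0 - theta) * neg
        # Unoccupied site: every score comes from the "species absent" component.
        lik_unoccupied *= neg
    return psi * lik_occupied + (1.0 - psi) * lik_unoccupied


def posterior_occupancy(scores, psi, theta, mu0, sd0, mu1, sd1):
    """Posterior probability that the site is occupied, given its scores."""
    lik_occupied = 1.0
    for s in scores:
        lik_occupied *= (theta * normal_pdf(s, mu1, sd1)
                         + (1.0 - theta) * normal_pdf(s, mu0, sd0))
    total = site_likelihood(scores, psi, theta, mu0, sd0, mu1, sd1)
    return psi * lik_occupied / total
```

Under these assumptions, scores that sit near the "species present" mean pull the posterior occupancy probability toward 1 without any thresholding of the scores into binary detections. For many files per site, a practical implementation would work with log-likelihoods to avoid numerical underflow.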
We present a simulation study showing that, over a range of empirically realistic parameters, our model outperforms traditional occupancy models that are based on binary annotation alone. We also apply this new model to an empirical case study using data generated from five machine learning classifiers applied to autonomous acoustic recordings gathered in the eastern United States.
Because our occupancy model generalizes allowable input data beyond binary observations, it is particularly well-suited to the increasing volume of machine learning classified data in ecology and conservation.
This dataset contains the supplementary information for the manuscript "A continuous-score occupancy modeling framework for incorporating uncertain machine learning output in autonomous biodiversity surveys," including the Supporting Information document, scripts used to create and fit occupancy models for three experiments, weights of the trained machine learning models used in the manuscript, and scripts used to create and apply these machine learning models.