Supplementary information for: A continuous-score occupancy modeling framework for incorporating uncertain machine learning output in autonomous biodiversity surveys

Rhinehart, Tessa¹; Turek, Daniel²; Kitzes, Justin¹

Published May 18, 2022; Updated May 19, 2022 on Dryad. https://doi.org/10.5061/dryad.ns1rn8ptd

Data files

May 18, 2022 version files: 224.24 MB

Abstract

Note: for occupancy model implementations and other R and Python scripts, please see the "Software" DOI hosted by Zenodo: https://doi.org/10.5281/zenodo.6353948

Ecologists often study biodiversity by evaluating species occupancy and the relationship between occupancy and other covariates. Occupancy models are now widely used to account for false absences in field surveys and to reduce bias in estimates of covariate relationships. Existing occupancy models take as inputs binary detection/non-detection observations of species at each visit to each site. However, autonomous sensing devices and machine learning models are increasingly used to survey biodiversity. These tools generate a new type of observation record, continuous-score data, which reflects the model’s confidence that a species is present in each autonomously sensed file rather than a binary detection or non-detection. Such data are not directly compatible with traditional binary occupancy modeling methods.

Here, we develop a new occupancy model that models continuous scores on a visit level as a Gaussian mixture, combining a distribution of scores for files that do contain the species of interest and a distribution of scores for files that do not. The model takes as input continuous scores for each autonomously sensed and classified file, along with an optional small number of binary, manually verified detection and non-detection annotations.
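As a rough illustration of the mixture idea described above, the following Python sketch computes the probability that a single site is occupied given its classifier scores. It is not the published implementation (see the Zenodo "Software" DOI for that). The function names, the parameter names (`psi` for occupancy probability, `theta` for the chance that a file from an occupied site contains the species, `mu0`/`sigma0` and `mu1`/`sigma1` for the two score distributions), and the assumption that files are independent are all hypothetical choices made for this example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_occupancy(scores, psi, theta, mu0, sigma0, mu1, sigma1):
    """Posterior probability that a site is occupied, given its file scores.

    Assumed (hypothetical) structure for this sketch:
      - unoccupied site: every score comes from the "species absent"
        distribution N(mu0, sigma0)
      - occupied site: each score comes from a two-component Gaussian
        mixture, theta * N(mu1, sigma1) + (1 - theta) * N(mu0, sigma0)
    """
    like_occupied = 1.0
    like_unoccupied = 1.0
    for s in scores:
        absent = normal_pdf(s, mu0, sigma0)
        present = normal_pdf(s, mu1, sigma1)
        like_occupied *= theta * present + (1.0 - theta) * absent
        like_unoccupied *= absent
    # Bayes' rule with prior occupancy probability psi
    numerator = psi * like_occupied
    return numerator / (numerator + (1.0 - psi) * like_unoccupied)


# Illustrative parameter values only; these are not estimates from the dataset.
params = dict(psi=0.5, theta=0.3, mu0=0.0, sigma0=1.0, mu1=2.0, sigma1=1.0)
high = posterior_occupancy([2.1, 1.8, -0.2], **params)
low = posterior_occupancy([-0.5, 0.1, -1.0], **params)
```

With these made-up parameters, a site whose files include a few high scores ends up with a posterior above the prior `psi`, while a site with uniformly low scores ends up below it. A full analysis would estimate the parameters from data rather than fix them, and would work on the log scale to avoid numerical underflow when a site has many files.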

We present a simulation study showing that, over a range of empirically realistic parameters, our model outperforms traditional occupancy models based on binary annotations alone. We also apply this new model to an empirical case study using data generated from five machine learning classifiers applied to autonomous acoustic recordings gathered in the eastern United States.

Because our occupancy model generalizes allowable input data beyond binary observations, it is particularly well-suited to the increasing volume of machine learning classified data in ecology and conservation.
