Validation prediction: a flexible protocol to increase efficiency of automated acoustic processing for wildlife research


Knight, Elly; Sólymos, Peter; Scott, Chris; Bayne, Erin (2020), Validation prediction: a flexible protocol to increase efficiency of automated acoustic processing for wildlife research, Dryad, Dataset,


Automated recognition is increasingly used to extract species detections from audio recordings; however, the time required to manually review each detection can be prohibitive. We developed a flexible protocol called ‘validation prediction’ that uses machine learning to predict whether recognizer detections are true or false positives and can be applied to any recognizer type, ecological application, or analytical approach. Validation prediction uses a predictable relationship between recognizer score and the energy of an acoustic signal but can also incorporate any other ecological or spectral predictors (e.g., time of day, dominant frequency) that will help separate true from false positive recognizer detections. First, we documented the relationship between recognizer score and the energy of an acoustic signal for two different recognizer algorithm types (hidden Markov models and convolutional neural networks). Next, we demonstrated our protocol using a case study of two species, the common nighthawk (Chordeiles minor) and ovenbird (Seiurus aurocapilla). We reduced the number of detections that required validation by 75.7% and 42.9%, respectively, while retaining at least 98% of the true positive detections. Validation prediction substantially improves the efficiency of using automated recognition on acoustic datasets. Our method can be of use to wildlife monitoring and research programs and will facilitate using automated recognition to mine bioacoustic datasets.


We used three recognizers to scan acoustic recordings for a case study of validation prediction, a method that increases the efficiency of automated acoustic processing by using machine learning to predict which recognizer results are true positives. The recognizers were built using two approaches for two species. The first was built in Song Scope software (Version 4.1.3, Wildlife Acoustics, Concord, MA) for the common nighthawk (Chordeiles minor). The second was a convolutional neural network (CNN), also for the common nighthawk. The third was built in Song Scope for the ovenbird (Seiurus aurocapilla).

The acoustic recordings were collected using SM2+ autonomous recording units (ARUs; Wildlife Acoustics, Concord, MA) in the boreal forest of northern Alberta, Canada (Charchuk and Bayne 2018). At each of 90 study plots, we collected one three-minute recording every hour for three days between June 1 and July 15, 2015. We scanned the full dataset of recordings with our common nighthawk and ovenbird Song Scope recognizers using score thresholds of 60 and 50, respectively. We randomly selected and scanned 10% of the training recordings with our CNN recognizer. We did not scan the full dataset with the CNN recognizer because we used it only to test the generalizability of the relationship between score and RSL, not in the subsequent case study application of our protocol. The first author manually validated all detections from all three recognizers, classifying each as a true or false positive of the target species.
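
The validated detections can be used to explore the trade-off reported in the abstract between review effort saved and true positives retained. The following is a minimal sketch in Python, not the example script from Appendix S3. It assumes that some fitted model has already assigned each detection a predicted probability of being a true positive; the function names, the probabilities, and the labels are hypothetical and are not taken from the dataset files.

```python
import math


def cutoff_retaining(probs, is_true_positive, min_retained=0.98):
    """Return the highest probability cutoff that keeps at least
    `min_retained` of the validated true positives at or above it."""
    # Predicted probabilities of the true positives, highest first.
    tp_probs = sorted(
        (p for p, t in zip(probs, is_true_positive) if t), reverse=True
    )
    if not tp_probs:
        raise ValueError("no true positives in the validated sample")
    # Number of true positives that must stay at or above the cutoff.
    n_keep = max(1, math.ceil(min_retained * len(tp_probs)))
    return tp_probs[n_keep - 1]


def workload_reduction(probs, cutoff):
    """Fraction of all detections that fall below the cutoff and
    would therefore be skipped during manual validation."""
    skipped = sum(1 for p in probs if p < cutoff)
    return skipped / len(probs)


if __name__ == "__main__":
    # Hypothetical predicted probabilities and manual validation labels.
    probs = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.6, 0.3]
    labels = [True, True, True, True, False, False, False, False]

    cutoff = cutoff_retaining(probs, labels, min_retained=0.75)
    print("cutoff:", cutoff)
    print("share of detections skipped:", workload_reduction(probs, cutoff))
```

In practice the cutoff would be chosen on one validated subset and checked on another, because a cutoff tuned and evaluated on the same detections will tend to look better than it is.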

Usage Notes

The common nighthawk Song Scope dataset can be used to run the example script provided with the manuscript (Appendix S3). The other two datasets can be used to replicate the results of the rest of the manuscript.

Dataset files are named as follows: AcousticDataset_RecognizerName_RecognizerSettings_"results"_"validated".csv
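
The naming pattern above can be split into its parts programmatically. The sketch below rests on two assumptions that the description does not confirm: the quoted words "results" and "validated" appear literally in each file name, and the dataset and recognizer names contain no underscores. The function name and the example file name are hypothetical.

```python
import re

# Assumed reading of the pattern: three variable fields followed by the
# literal suffix "_results_validated.csv". The first two fields are assumed
# to contain no underscores; the settings field may contain them.
FILENAME_RE = re.compile(
    r"^(?P<dataset>[^_]+)_(?P<recognizer>[^_]+)_(?P<settings>.+)"
    r"_results_validated\.csv$"
)


def parse_dataset_filename(filename):
    """Return a dict with the dataset, recognizer and settings fields,
    or None if the file name does not follow the assumed pattern."""
    match = FILENAME_RE.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    name = "ExampleDataset_ExampleRecognizer_Score60_results_validated.csv"
    print(parse_dataset_filename(name))
```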


Alberta Biodiversity Monitoring Institute

Alberta Conservation Association, Award: 030-00-90-273

Canadian Oilsands Innovation Alliance

Natural Sciences and Engineering Research Council of Canada, Award: 44660-12