Automated classification of avian vocal activity using acoustic indices in regional and heterogeneous datasets

Cite this dataset

Yip, Daniel; Mahon, Lisa; MacPhail, Alex; Bayne, Erin (2020). Automated classification of avian vocal activity using acoustic indices in regional and heterogeneous datasets [Dataset]. Dryad.


Acoustic indices combined with clustering and classification approaches are increasingly used to automate identification of vocalizing taxa or acoustic events of interest. Most studies using this approach standardize data collection and study design parameters at the project or study level, but ecological research is increasingly investigating patterns at regional or continental scales. Large-scale studies often require collaboration between research groups and integration of data from multiple sources to fulfill objectives, which can lead to variation in recording equipment and data collection protocols.

Our objectives were to determine how analytical approaches, and the variation in data collection and processing typical of regional acoustic monitoring programs, influence accuracy when identifying vocal activity in migratory breeding birds. We used three regional datasets from Northern Alberta, Northern British Columbia, and Southern and Central Yukon, Canada, to investigate the effect of analytical framework, sample size, local species richness, and data collection variables on classification accuracy.

We found supervised classification approaches to be the most effective: boosted regression trees identified vocal activity with 92.0% accuracy and easily accommodated variation in data collection and processing parameters. We also provide recommendations for effectively processing large and heterogeneous datasets, including using a sufficient sample size, accommodating nuisance variables, and selecting suitable model training data.

The results presented in this study can help researchers make informed decisions about data collection, data processing, study design, and analysis; maximize performance and accuracy during analysis; and efficiently process large, heterogeneous datasets to answer questions at scales that were previously difficult to investigate.


This dataset was generated using the audio analysis software by the QUT Ecoacoustics Research Group, which computed a suite of acoustic indices for audio recordings of songbird surveys from various locations across the western Canadian boreal forest.
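For readers unfamiliar with acoustic indices, the following minimal sketch shows what one such index can look like. It computes a normalized temporal entropy of an amplitude envelope, a commonly used summary of how evenly sound energy is spread over time. This is an illustration only: the function name `temporal_entropy` and the synthetic envelopes are assumptions made for this example, and the code is not the QUT software's implementation, nor necessarily one of the indices included in this dataset.

```python
import math


def temporal_entropy(envelope):
    """Normalized Shannon entropy of a non-negative amplitude envelope.

    Values near 1 mean energy is spread evenly over time; values near 0
    mean energy is concentrated in a few frames.
    Hypothetical helper for illustration, not part of any analysis package.
    """
    n = len(envelope)
    total = sum(envelope)
    if n < 2 or total <= 0 or any(a < 0 for a in envelope):
        raise ValueError("need >= 2 non-negative samples with positive total")
    # Treat the envelope as a probability mass function over time frames.
    probs = [a / total for a in envelope]
    # Shannon entropy; frames with zero energy contribute nothing.
    h = -sum(p * math.log(p) for p in probs if p > 0)
    # Divide by the maximum possible entropy, log(n), to scale into [0, 1].
    return h / math.log(n)


# Synthetic envelopes (assumed data, not taken from the recordings).
flat = [1.0] * 100           # constant energy across all frames
burst = [0.0] * 99 + [1.0]   # all energy in a single frame

flat_score = temporal_entropy(flat)
burst_score = temporal_entropy(burst)
```

In this sketch the flat envelope scores at the top of the range and the single burst at the bottom. A classifier such as the boosted regression trees described above would take many such index values per recording as its input features.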


Environment and Climate Change Canada