Dryad

Data and code to run birdnet-discovery, a pipeline for signal discovery and training dataset creation using BirdNET embeddings, including example data from acoustic ARUs in Northern Alaska

Data files

Mar 24, 2025 version files: 9.20 GB

Abstract

In recent years, deep learning has become a popular solution for processing large ecological monitoring datasets. This rise in use has resulted in global classification models for a variety of data types and taxa, such as BirdNET, which classifies vocalizations of more than 6,000 avian species from acoustic data. These global models can serve as pre-trained base models for transfer learning, allowing researchers to more easily develop classifiers specialized to their datasets. However, the development of such models hinges on the availability of comprehensive, high-quality training data, which can be difficult to acquire, produce, and use. We present a novel pipeline for creating training data from a large, unlabeled dataset with minimal human oversight. We used this pipeline, with BirdNET as our base model, to develop a transfer-learning-based model, ArcticSoundsNET, using acoustic monitoring data from 205 sites across Alaska’s Arctic Coastal Plain. We compared the performance of ArcticSoundsNET with that of BirdNET to evaluate the effectiveness of our pipeline and the success of the new model. We found that the ability of ArcticSoundsNET to detect and classify avian vocalizations in our data exceeded that of BirdNET by several orders of magnitude (AUC = 0.299 for ArcticSoundsNET, AUC < 0.001 for BirdNET). Importantly, our method for developing a training dataset is widely applicable to ecologists who do not have large amounts of labeled data, facilitating the creation of task-specific classification models. Developing such models is an essential step in using large acoustic datasets to support ecological conservation of critical species and habitats.