Dryad

Animal acoustic identification, denoising, and source separation using generative adversarial networks

Data files

Aug 18, 2025 version files 425.99 MB


Abstract

Soundscapes contain rich ecological information, offering insights into both biodiversity and ecosystem dynamics. However, the sheer volume of data produced by passive acoustic monitoring presents significant challenges for scalable analysis and ecological interpretation. While convolutional neural networks (CNNs) have advanced species classification in bioacoustics, they often struggle to localize acoustic targets in time-frequency space and to quantify soundscape characteristics.

In this study, we propose a novel spectrogram-to-spectrogram translation framework based on generative adversarial networks (GANs) to isolate and quantify acoustic sources within soundscape recordings. Our method is trained on paired spectrogram images: original full-spectrogram representations and target spectrogram representations containing only the vocalizations of specific sound labels. This design enables the model to learn source-specific mappings and to perform both species- and community-level separation of acoustic components in soundscape recordings.
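The paired-spectrogram training data described above can be sketched as follows. This is a minimal illustrative example, not the authors' pipeline: it assumes spectrograms are stored as 2-D magnitude arrays (frequency bins × time frames) and that a boolean annotation mask marks the target source's time-frequency pixels; the threshold-based "annotation" here is purely for the toy data.

```python
import numpy as np

def make_training_pair(full_spec, source_mask):
    """Build one paired example for spectrogram-to-spectrogram translation.

    full_spec   : 2-D array (freq_bins x time_frames), full soundscape spectrogram
    source_mask : boolean array of the same shape marking the target source's
                  time-frequency pixels (e.g., derived from manual annotation)
    Returns (input, target): the unchanged input spectrogram and a target
    spectrogram where everything outside the source is set to the noise floor,
    so the GAN learns to keep only that source's vocalizations.
    """
    noise_floor = full_spec.min()
    target = np.where(source_mask, full_spec, noise_floor)
    return full_spec, target

# Toy example: a 4x6 dB spectrogram with a simulated call in rows 1-2, cols 2-4
spec = np.full((4, 6), -80.0)   # background at -80 dB
spec[1:3, 2:5] = -20.0          # simulated vocalization energy
mask = spec > -40.0             # toy annotation: pixels above a dB threshold
x, y = make_training_pair(spec, mask)
```

With real recordings, `full_spec` would come from a short-time Fourier transform of the audio, and one such pair would be produced per labeled source per clip.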

We developed and evaluated two GAN-based models: a species-level GAN targeting eight avian species, and a community-level GAN distinguishing among avian, insect, and anthropogenic sound sources. The models were trained and tested on soundscape recordings collected from the Yaoluoping National Nature Reserve, eastern China. The species-level model achieved a mean F1 score of 0.76 for pixel-wise detection, while the community-level model reached 0.79 across categories. Beyond precise temporal-spectral localization, our approach captures each source's acoustic occupancy and frequency distribution patterns, offering deeper ecological insight. For classification, our model achieved a mean F1 score of 0.97, comparable to the baseline CNN classifiers ResNet50 (0.95) and VGG16 (0.98) across multiple species. Our GAN approach for extracting sound sources also significantly outperformed conventional methods in denoising and source separation, as indicated by lower image-level mean squared error.
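The two evaluation metrics reported above, pixel-wise F1 for detection and image-level mean squared error for separation quality, can be sketched as below. This is a hedged reconstruction of standard definitions, assuming detections and references are boolean time-frequency masks and spectrograms are real-valued arrays; it is not the authors' exact evaluation code.

```python
import numpy as np

def pixelwise_f1(pred_mask, true_mask):
    """Pixel-wise F1 between a predicted and a reference source mask."""
    tp = np.logical_and(pred_mask, true_mask).sum()
    fp = np.logical_and(pred_mask, ~true_mask).sum()
    fn = np.logical_and(~pred_mask, true_mask).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def image_mse(pred_spec, true_spec):
    """Image-level mean squared error between two spectrograms."""
    return float(np.mean((np.asarray(pred_spec) - np.asarray(true_spec)) ** 2))

# Toy check: one true positive, one false positive, one false negative
pred = np.array([[True, True, False], [False, False, False]])
true = np.array([[True, False, False], [False, True, False]])
f1 = pixelwise_f1(pred, true)       # precision = recall = 0.5, so F1 = 0.5
```

A lower `image_mse` between the separated spectrogram and the clean reference indicates better denoising and source separation, which is the comparison made against the conventional baselines.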

These results demonstrate the utility of GANs in advancing ecoacoustic analyses and biodiversity monitoring. By enabling robust source separation and fine-resolution signal mapping, the proposed approach contributes a scalable and transferable tool for soundscape quantification.