Unsupervised acoustic classification of individual gibbon females and the implications for passive acoustic monitoring

Cite this dataset

Clink, Dena; Klinck, Holger (2020). Unsupervised acoustic classification of individual gibbon females and the implications for passive acoustic monitoring [Dataset]. Dryad.


1.    Passive acoustic monitoring (PAM) has the potential to greatly improve our ability to monitor cryptic yet vocal animals. Advances in automated signal detection have increased the scope of PAM, but distinguishing between individuals, which is necessary for density estimation, remains a major challenge. When individual identity is known, supervised classification techniques can be used to distinguish between individuals. Supervised methods require labeled training data, whereas unsupervised techniques do not. If the acoustic signals of individuals are sufficiently different, the number of clusters might represent the number of individuals sampled. The majority of applications of unsupervised techniques to animal vocalizations have focused on quantifying species-specific call repertoires. However, with increased interest in PAM applications, unsupervised methods that can distinguish between individuals are needed.
2.    Here, we use an existing dataset of Bornean gibbon female calls with known identity from five sites in Malaysian Borneo to test the ability of three unsupervised clustering algorithms (affinity propagation, K-medoids, and Gaussian mixture model-based clustering) to distinguish between individuals. Calls from different gibbon females are readily distinguishable using supervised techniques. For internal validation of unsupervised cluster solutions, we calculated silhouette coefficients. For external validation, we compared clustering results with female identity labels using a standard metric: normalized mutual information. We also calculated classification accuracy by assigning unsupervised cluster solutions to females based on which cluster had the highest number of calls from a particular female.
3.    We found that affinity propagation clustering consistently outperformed the other algorithms for all metrics used. In particular, classification accuracy of affinity propagation clustering was more consistent as the number of females increased, and when we randomly sampled females across sites. 
4.    We conclude that unsupervised techniques may be useful for providing additional information regarding individual identity for PAM applications. We stress that although we use gibbons as a case study, these methods will be applicable to any individually distinct vocal animal.
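
The external validation described in point 2 can be illustrated with a minimal sketch. This is not the authors' R code: it is a standard-library Python example, and the function names, the toy labels, and the choice of arithmetic-mean normalization for normalized mutual information are assumptions made for illustration. The accuracy function follows one reading of the description above, in which each female is matched to the cluster holding most of her calls.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(true_ids, clusters):
    """NMI between identity labels and cluster labels.

    Assumption: mutual information is normalized by the arithmetic mean
    of the two entropies; other normalizations exist.
    """
    n = len(true_ids)
    joint = Counter(zip(true_ids, clusters))
    id_counts = Counter(true_ids)
    cluster_counts = Counter(clusters)
    mi = 0.0
    for (female, cluster), n_fc in joint.items():
        expected = id_counts[female] * cluster_counts[cluster]
        mi += (n_fc / n) * math.log(n_fc * n / expected)
    denom = (entropy(true_ids) + entropy(clusters)) / 2
    # Both labelings constant: treat as perfect agreement by convention.
    return mi / denom if denom > 0 else 1.0


def classification_accuracy(true_ids, clusters):
    """Share of calls that fall in the cluster holding most of their female's calls."""
    joint = Counter(zip(true_ids, clusters))
    best = {}
    for (female, _cluster), n_fc in joint.items():
        best[female] = max(best.get(female, 0), n_fc)
    return sum(best.values()) / len(true_ids)


# Toy example (hypothetical labels): two females, four calls each.
true_ids = ["F1"] * 4 + ["F2"] * 4
clusters = [0, 0, 0, 1, 1, 1, 1, 1]
print(classification_accuracy(true_ids, clusters))  # 0.875
print(normalized_mutual_information(true_ids, clusters))
```

Under this reading, two females can be matched to the same cluster and both still count as correct, so the accuracy is best interpreted alongside normalized mutual information rather than on its own.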


The dataset comprises 933 calls (range: 2 to 46 calls per female) from 66 individual Bornean gibbon (Hylobates funereus) females recorded at five sites in Malaysian Borneo: Maliau Basin Conservation Area, Deramakot Forest Reserve, Imbak Canyon Conservation Area, Danum Valley Conservation Area, and Kalabakan Forest Reserve. Calls were recorded using a Marantz PMD 660 flash recorder (Marantz, Kawasaki, Kanagawa Prefecture, Japan) equipped with a Røde NTG-2 directional condenser microphone (Røde Microphones, Sydney, Australia). For each call, we calculated Mel-frequency cepstral coefficients (MFCCs) over a standardized number of time windows (8), so the duration of each window varied with the total duration of the call. For each of the 8 windows, we calculated 12 Mel-filters (or bandpass filters; Davis & Mermelstein, 1980) between 500 and 1500 Hz, which corresponds to the frequency range of Bornean gibbon female great calls. The first MFCC for each time window corresponds to the amplitude or loudness of the signal; this varies with the recording distance to the calling animal and is therefore not appropriate for discriminative tasks, so we used only the remaining 11 MFCCs for each time window. MFCCs describe the spectral envelope at particular points in time but do not capture temporal variation in the signal. Therefore, we also calculated delta-cepstral coefficients, which measure the change from one frame to the next and provide information about the temporal dynamics of the signal. As we estimated 11 MFCCs for each time window, we also had 11 delta coefficients for each time window. Together with call duration, this resulted in a final feature vector of 177 parameters describing each call (8 windows × 11 MFCCs + 8 windows × 11 delta coefficients + 1 duration).
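
The size of the feature vector follows from the counts given above, and a short Python sketch makes the arithmetic explicit. The feature names and their ordering here are hypothetical and are not taken from the column headers of the accompanying CSV file; only the counts (8 windows, 11 retained MFCCs, 11 delta coefficients, 1 duration) come from the description.

```python
N_WINDOWS = 8        # standardized number of time windows per call
N_COEF_KEPT = 11     # 12 coefficients computed, the first (amplitude) dropped


def feature_names(n_windows=N_WINDOWS, n_coef=N_COEF_KEPT):
    """Build illustrative names for every parameter describing one call."""
    names = []
    # MFCCs 2..12 for each window (coefficient 1 is excluded).
    for w in range(1, n_windows + 1):
        names += [f"mfcc_w{w}_c{c}" for c in range(2, n_coef + 2)]
    # One delta coefficient per retained MFCC per window.
    for w in range(1, n_windows + 1):
        names += [f"delta_w{w}_c{c}" for c in range(2, n_coef + 2)]
    # Call duration is appended as a single extra parameter.
    names.append("duration")
    return names


names = feature_names()
print(len(names))  # 177
```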

Usage notes

R code and data are included as follows:
1. Clink and Klinck Part 1 Supervised Classification. R code for supervised classification.
2. Clink and Klinck Part 2 Unsupervised Bootstrapping. R code for recreating analyses and figures for bootstrapping over 100 iterations.
3. Clink and Klinck Part 3 Unsupervised Clustering by site. R code for recreating analyses and figures for site-level comparisons.
4. ClinkandKlinck2020GibbonFemaleData.csv. Female identity and MFCC feature data needed to recreate analyses.


Fulbright U.S. Student Program
