Data from: Vocalizations in the plains zebra (Equus quagga)

Cite this dataset

Xie, Bing; Daunay, Virgile; Petersen, Troels; Briefer, Elodie (2024). Data from: Vocalizations in the plains zebra (Equus quagga) [Dataset]. Dryad.


Acoustic signals are vital in animal communication, and quantifying these signals is fundamental for understanding animal behaviour and ecology. Vocalizations can be classified into acoustically and functionally or contextually distinct categories, but establishing these categories can be challenging. Newly developed methods, such as machine learning, can provide solutions for classification tasks. The plains zebra is known for its loud and specific vocalizations, yet limited knowledge exists on the structure and information content of its vocalizations. In this study, we employed both feature-based and spectrogram-based algorithms, incorporating supervised and unsupervised machine learning methods, to enhance robustness in categorizing zebra vocalization types. Additionally, we implemented a permuted discriminant function analysis (pDFA) to examine the individual identity information contained in the identified vocalization types. The findings revealed at least four distinct vocalization types: the “snort”, the “soft snort”, the “squeal”, and the “quagga quagga”, with individual differences observed mostly in snorts and, to a lesser extent, in squeals. Analyses based on acoustic features outperformed those based on spectrograms, but each excelled in characterizing different vocalization types. We thus recommend the combined use of these two approaches. This study offers valuable insights into plains zebra vocalization, with implications for future comprehensive explorations in animal communication.

README: Vocalizations in the plains zebra (Equus quagga)

Data and Scripts

  • 1_Praat_Script_Zebra_Vocalisations.praat: This script is used to extract vocal features using the software Praat.
  • This archive contains data and scripts for analyzing the vocal repertoire. It includes two folders:
    • Feature_based_analyses:
      • The dataset "feature_based_input.csv" is the input for both scripts in this folder.
      • "feature_based_supervised_classification_xgboost.ipynb" is used for supervised analysis.
      • "feature_based_unsupervised_clustering.ipynb" is used for unsupervised analysis.
    • Spectrogram_based_analyses:
      • The "spectrogram_based_classification" folder contains the input data "calltype_spec.npz" and "calltype_y.csv", as well as the notebook script "spectrogram_based_classification_cnn.ipynb" for supervised machine learning analysis.
      • The "spectrogram_based_clustering" folder contains subfolders "audio", "data", "functions", "notebooks", and "parameters" used in unsupervised machine learning analysis, following Thomas et al. (doi:10.1111/1365-2656.13754).
  • This archive contains data and scripts for the vocal individuality analyses. It includes:
    • Data files "finaluse_input_snort_5_filter30.csv" and "finaluse_input_squeal_5_filter500.csv", which are inputs for the vocalization types "snort" and "squeal", respectively.
    • The script "vocal_individuality.Rmd".
    • The workspace "vocal_individuality.Rdata".
    • The source function "pdfa_functions.r".
    • All results from the analysis.


  • Praat is required to run 1_Praat_Script_Zebra_Vocalisations.praat.
  • Python is used to run all analyses in the Jupyter notebooks (.ipynb files).
  • R is required to run analyses in vocal_individuality.Rmd.


Data collection and sampling

We collected data in three locations, in Denmark and South Africa: 1) 10 months between December 2020 and July 2021 and between September and December 2021, at Pilanesberg National Park (hereafter “PNP”), South Africa, covering both dry season (i.e. from May to September) and wet season (i.e. from October to April) (1); 2) 16 days between May and June 2019, and 33 days between February and May 2022, at Knuthenborg Safari Park (hereafter “KSP”), Denmark, covering both periods before the park’s opening for tourists (i.e. from November to March) and after (i.e. from April to October); 3) 4 days in August 2019 at Givskud Zoo (hereafter “GKZ”), Denmark.

For all places and periods, three types of data were collected as follows: 1) Pictures were taken for each individual from both sides using a camera (Nikon COOLPIX P950); 2) Contexts of vocal production were recorded either through notes (in the first period of KSP and in GKZ) or videos (in the second period of KSP and in PNP) filmed by a video camera recorder (Sony HDRPJ410 HD); 3) Audio recordings were collected using a directional microphone (Sennheiser MKH-70 P48, with a frequency response of 50 - 20000 Hz (+/- 2.5 dB)) linked to an audio recorder (Marantz PMD661 MKIII).

Six zebras housed in GKZ were recorded while being manually separated from one another by the zookeeper into three enclosures (the stable, the small enclosure and the savannah) for management purposes, which triggered vocalisations. These vocalisations, along with other types of data, were recorded at distances of 5 - 30 m.

In KSP, 15 - 18 zebras (population changed due to newborns, deaths, or removal of adult males) were living with other herbivores in a 0.14 km2 savannah. There, we approached the zebras by driving down the road until approximately 7 - 40 m, at which point spontaneous vocalisations and other information were collected. This distance allowed us to collect good quality recordings without eliciting any obvious reactions from the zebras to our presence.

Finally, PNP is a 580 km2 national park, with approximately 800 - 2000 zebras (2). In this park, we drove on the road and parked at distances of 10 - 80 m when encountering zebras, where all data, including spontaneous vocalisations, were recorded.

Data processing

Individual zebras were manually identified based on the pictures collected from KSP and GKZ (15-18 and 6 zebras, respectively). In PNP, the animals present in the pictures were individually identified using WildMe, a web-based machine learning platform facilitating individual recognition. All zebra pictures were uploaded to the platform for a full comparison through the algorithm. The resulting matching candidates were then determined by manually reviewing the output.

Audio files (sampling rate: 44100 Hz) were saved at 16-bit amplitude resolution in WAV format. We annotated zebra vocalisations, along with context and individuals emitting the vocalisations, using Audacity software (version 3.3.3) (3). Vocalisations were first subjectively labelled as five vocalisation types based on both audio and spectrogram examinations (i.e. visual inspection) (Table 1 and Figure 1). Among these types, the “squeal-snort” was excluded from further analysis, as the focus of this study was on individual vocalisation types instead of combinations.

Acoustic analysis

We extracted vocalisations of good quality, defined as vocalisations with clear spectrograms, low background noise, and no overlap with other sounds, and saved them as distinct audio files. For the individual distinctiveness analysis, we excluded individuals with fewer than 5 vocalisations of each type, to avoid strong imbalance, resulting in 359 snorts from 28 individuals and 138 squeals from 14 individuals (Table S3 and S4) (4, 5). The individuality content of quagga quagga and soft snorts could not be explored, due to insufficient individual data. For vocal repertoire analysis, we excluded vocalisations longer than 1.25 s to improve spectrogram-based analysis, following Thomas et al. (6). In total, we gathered 678 vocalisations for the spectrogram-based vocal repertoire analysis, including 117 quagga quagga, 204 snorts, 161 squeals and 196 soft snorts (Table S2). Among these vocalisations, six squeals were excluded in the acoustic feature-based vocal repertoire analysis, due to missing data for one of the features (amplitude modulation extent).
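The two exclusion rules above (drop individuals with fewer than 5 calls of a type; drop calls longer than 1.25 s for the spectrogram-based repertoire) can be sketched in Python. The record layout, identifiers and helper names below are illustrative stand-ins, not the actual column names in the archive's CSV files:

```python
from collections import Counter

# Hypothetical records: (individual_id, call_type, duration_s).
calls = [
    ("z01", "snort", 0.4), ("z01", "snort", 0.5), ("z01", "snort", 0.3),
    ("z01", "snort", 0.6), ("z01", "snort", 0.4),
    ("z02", "snort", 0.5), ("z02", "snort", 0.4),  # only 2 snorts -> excluded
    ("z03", "squeal", 1.8),                        # > 1.25 s -> excluded
]

def filter_individuality(calls, call_type, min_calls=5):
    """Keep only individuals with at least `min_calls` calls of `call_type`."""
    subset = [c for c in calls if c[1] == call_type]
    counts = Counter(ind for ind, _, _ in subset)
    return [c for c in subset if counts[c[0]] >= min_calls]

def filter_repertoire(calls, max_dur=1.25):
    """Drop calls longer than `max_dur` s for spectrogram-based analysis."""
    return [c for c in calls if c[2] <= max_dur]

snorts = filter_individuality(calls, "snort")
```

Applied to the toy records, only z01's five snorts survive the individuality filter, and the 1.8 s squeal is dropped from the repertoire set.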

All calls were first high-pass filtered above 30 Hz for snorts and soft snorts, above 500 Hz for squeals and above 600 Hz for quagga quagga (i.e. above the average minimum fundamental frequency of these vocalisations; Table S1). We then extracted 12 acoustic features from vocalisations for the individual distinctiveness analysis (Table 2), using a custom script (7-10) in Praat software (11). Eight of these features were also extracted for the vocal repertoire analysis (i.e. all features except those related to the fundamental frequency, which were not available for soft snorts that are not tonal). Additionally, to explore the vocal repertoire, mel-spectrograms were generated from audio files using the short-time Fourier transform (STFT), following Thomas et al. (6). Spectrograms were padded with zeros according to the length of the longest audio file to ensure uniform length for all audio files, and time-shift adjustments were implemented to align the starting points of vocalisations (6).
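The type-specific high-pass filtering can be sketched with SciPy. The methods specify only the cut-off frequencies (and the filtering itself was done in Praat), so the Butterworth design and the filter order (4) below are assumptions for illustration:

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 44100  # sampling rate (Hz), as stated in the methods

# Cut-off frequency per call type, from the methods.
CUTOFFS = {"snort": 30, "soft_snort": 30, "squeal": 500, "quagga_quagga": 600}

def highpass(signal, call_type, fs=FS, order=4):
    """High-pass filter a call above its type-specific cut-off frequency."""
    sos = butter(order, CUTOFFS[call_type], btype="highpass", fs=fs, output="sos")
    return sosfilt(sos, signal)

# Example: a 5 Hz rumble mixed with a 1000 Hz tone; the rumble is removed.
t = np.arange(FS) / FS
mixed = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 1000 * t)
cleaned = highpass(mixed, "snort")
```

After filtering with the 30 Hz snort cut-off, the 5 Hz component is strongly attenuated while the 1000 Hz component passes through essentially unchanged.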

Statistical analyses

a. Vocal repertoire

We applied both supervised and unsupervised machine learning to both acoustic features and spectrograms using Python (version 3.9.7) (12).

Supervised method. To define the vocal repertoire via an acoustic feature-based approach, we deployed feature importance analysis by SHapley Additive exPlanation (SHAP) (13), using the shap library (version 0.40.0) (14). Six features with SHAP value > 1 were selected (Figure S1). We split the selected features with vocalisation type labels into a training dataset (70%) and a testing dataset (30%) using the Scikit-learn library (function: train_test_split, version 0.24.2) (15). Subsequently, we employed a supervised approach, the eXtreme Gradient Boosting (XGBoost) classifier in xgboost library (version 1.6.0) (16) to train the model. Three hyperparameters were tuned on the training dataset to reach maximum accuracy using optuna library (direction = minimize, n_trials = 200, version 2.10.0) (17), incorporating cross validation (five folds), which resulted in the best model (Table S5).

To define the vocal repertoire via a spectrogram-based approach, we split the dataset into a training set (49%), a validation set (21%), and a test dataset (30%), using the Scikit-learn library (function: train_test_split, version 0.24.2) (15). We implemented a Convolutional Neural Network (CNN) architecture using the tensorflow library (version 2.8.0) (18). The architecture was constructed (Table S6) and seven hyperparameters were tuned to reach maximum accuracy on the training and validation dataset using the optuna library (direction = minimize, n_trials = 50, version 2.10.0) (17), which resulted in the best model (Table S6).
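The 49% / 21% / 30% proportions follow from two successive 70/30 splits (0.7 × 0.7 = 0.49 and 0.7 × 0.3 = 0.21), which can be reproduced with scikit-learn's train_test_split; the data below are placeholders:

```python
from sklearn.model_selection import train_test_split

X = list(range(100))     # stand-in for 100 spectrograms
y = [i % 4 for i in X]   # stand-in call-type labels

# First split: 70% (training + validation) vs 30% test.
X_trval, X_test, y_trval, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Second split: 70% of the remainder for training, 30% for validation,
# giving 49% / 21% / 30% of the full dataset.
X_train, X_val, y_train, y_val = train_test_split(
    X_trval, y_trval, test_size=0.3, random_state=0)
```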

We evaluated model performance for both the feature-based and spectrogram-based classification models through predictions on each test dataset, including the test accuracy across all call types (number of correct predictions / total number of predictions), and three metrics for each call type: precision (true positives / (true positives + false positives)), recall (true positives / (true positives + false negatives)) and the harmonic mean of precision and recall, the f1-score (2 × (precision × recall) / (precision + recall)) (19). We also plotted the confusion matrix between true classes and predicted classes.
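These metrics can be computed directly from paired lists of true and predicted labels; a small self-contained sketch with toy call-type labels:

```python
def precision_recall_f1(true, pred, label):
    """Per-class precision, recall and f1-score from paired label lists."""
    tp = sum(t == label and p == label for t, p in zip(true, pred))
    fp = sum(t != label and p == label for t, p in zip(true, pred))
    fn = sum(t == label and p != label for t, p in zip(true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(true, pred):
    """Number of correct predictions / total number of predictions."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

# Toy example with two call types: one snort is misclassified as a squeal.
true = ["snort", "snort", "squeal", "squeal", "snort"]
pred = ["snort", "squeal", "squeal", "squeal", "snort"]
```

On this toy example, "snort" gets precision 1.0 (no false positives), recall 2/3 (one missed snort) and f1-score 0.8.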

Unsupervised method. For both acoustic feature-based and spectrogram-based analyses, we applied Uniform Manifold Approximation and Projection (UMAP) in the umap library (function: umap.UMAP, with n_neighbors = 200 and local_connectivity = 150 for the acoustic feature-based analysis, and metric = calc_timeshift_pad and min_dist = 0 for the spectrogram-based analysis, version 0.1.1) (20), to reduce the variables into a 2-dimensional latent space. We also implemented the k-means clustering algorithm for both analyses from the Scikit-learn library (function: KMeans, version 0.24.2) (15), to identify distinct clusters using the elbow method (21). The acoustic feature-based analysis used the same feature importance selection result as the feature-based supervised method (six features), while the spectrograms were analysed using scripts provided by Thomas et al. (6). We drew the 2-dimensional latent space and clusters using the matplotlib library (version 3.4.3) (22). We also plotted the confusion matrix between true classes and predicted clusters using the seaborn library (version 0.11.2) (23). Finally, we plotted the pairwise distances within vocalisation types against those between vocalisation types using the script provided by Thomas et al. (6).
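The k-means/elbow step can be sketched as follows, using synthetic 2-D points in place of the UMAP embedding (the umap library itself is omitted so the sketch stays self-contained; cluster locations and counts are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D "latent space" with four well-separated clusters, standing in
# for the UMAP embedding of the calls.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                    for c in [(0, 0), (5, 0), (0, 5), (5, 5)]])

# Elbow method: within-cluster sum of squares (inertia) for increasing k;
# the "elbow" where the curve flattens suggests the number of clusters.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
            for k in range(1, 8)]
```

On this toy data the inertia drops sharply up to k = 4 and flattens after, so the elbow correctly recovers the four clusters.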

b. Vocal Individuality

We assessed the individual distinctiveness of vocalisation types using RStudio (version 2022.02.1, with R version 4.2.2) (24, 25).

We performed a Kaiser–Meyer–Olkin test on the 12 acoustic features to measure the suitability of those features for factor analysis, using the psych package (KMO function, version 2.4.2) (26). Variables with MSA (Measure of Sampling Adequacy) equal to or greater than 0.5 (Table S7) (27) were selected, and subsequently input into a principal component analysis (PCA) using the stats package (prcomp function, version 4.2.2), to reduce correlation and multicollinearity (28). PC loadings with eigenvalues > 1 (Table S9) were then first input into a discriminant function analysis (DFA) with individual identity as the grouping factor, using the MASS package (lda function, version 7.3-58.2) (29), to visualise the feature (PC) loadings responsible for individuality. They were then additionally input into a permuted discriminant function analysis (pDFA), to assess individual distinctiveness using functions developed by R. Mundry (30), which are based on the MASS package (29). We ran a first nested pDFA with sex as a restriction factor, and a second nested pDFA with location as a restriction factor (30). Both pDFAs included individual identity as the test factor.
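The permutation logic behind a pDFA, comparing observed discriminability against chance under shuffled identity labels, can be illustrated with scikit-learn's LDA on toy data. The authors used R. Mundry's R functions; a full nested pDFA additionally handles restriction factors and cross-validated classification, which this conceptual sketch omits:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Toy data: 3 "individuals", 20 calls each, 4 acoustic features (e.g. PCs),
# with an individual-specific offset added to each feature.
n_ind, n_calls, n_feat = 3, 20, 4
X = rng.normal(size=(n_ind * n_calls, n_feat))
X += np.repeat(rng.normal(scale=2.0, size=(n_ind, n_feat)), n_calls, axis=0)
y = np.repeat(np.arange(n_ind), n_calls)

def lda_accuracy(X, y):
    """Fraction of calls assigned to the correct individual by LDA."""
    return LinearDiscriminantAnalysis().fit(X, y).score(X, y)

observed = lda_accuracy(X, y)
# Permutation test: how often does a shuffled labelling match or beat the
# observed classification accuracy?
perms = [lda_accuracy(X, rng.permutation(y)) for _ in range(200)]
p_value = (1 + sum(a >= observed for a in perms)) / (1 + len(perms))
```

A small p-value indicates that calls are more individually distinctive than expected if identity labels were arbitrary.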


1. Shannon G, Page BR, Duffy KJ, Slotow R. The ranging behaviour of a large sexually dimorphic herbivore in response to seasonal and annual environmental variation. Austral Ecology. 2010;35(7):731-42.

2. Van Dyk G, Slotow R. The effects of fences and lions on the ecology of African wild dogs reintroduced to Pilanesberg National Park, South Africa. African Zoology. 2003;38(1):79-94.

3. Audacity® software is copyright © 1999-2021 Audacity Team. It is free software distributed under the terms of the GNU General Public License. The name Audacity® is a registered trademark.

4. Bertucci F, Attia J, Beauchaud M, Mathevon N. Sounds produced by the cichlid fish Metriaclima zebra allow reliable estimation of size and provide information on individual identity. Journal of Fish Biology. 2012;80(4):752-66.

5. Linn SN, Schmidt S, Scheumann M. Individual distinctiveness across call types of the southern white rhinoceros (Ceratotherium simum simum). Journal of Mammalogy. 2021;102(2):440-56.

6. Thomas M, Jensen FH, Averly B, Demartsev V, Manser MB, Sainburg T, et al. A practical guide for generating unsupervised, spectrogram-based latent space representations of animal vocalizations. Journal of Animal Ecology. 2022;91(8):1567-81.

7. Reby D, McComb K. Anatomical constraints generate honesty: acoustic cues to age and weight in the roars of red deer stags. Animal Behaviour. 2003;65(3):519-30.

8. Briefer EF, Vizier E, Gygax L, Hillmann E. Expression of emotional valence in pig closed-mouth grunts: Involvement of both source- and filter-related parameters. The Journal of the Acoustical Society of America. 2019;145(5):2895-908.

9. Garcia M, Gingras B, Bowling DL, Herbst CT, Boeckle M, Locatelli Y, et al. Structural classification of wild boar (Sus scrofa) vocalizations. Ethology. 2016;122(4):329-42.

10. Xie B, Daunay V, Petersen TC, Briefer EF. Data, scripts and supplemental information. 2024.

11. Boersma P. Praat, a system for doing phonetics by computer. Glot International. 2002;5(9/10):341-5.

12. Van Rossum G, Drake FL. Introduction to Python 3: Python documentation manual. CreateSpace; 2009.

13. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. arXiv preprint. 2017;30:arXiv:1705.07874.

14. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence. 2020;2(1):56-67.

15. Pedregosa F. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825.

16. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016.

17. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019.

18. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint. 2016:arXiv:1603.04467.

19. Goutte C, Gaussier E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. European Conference on Information Retrieval; 2005: Springer.

20. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint. 2018:arXiv:1802.03426.

21. Syakur MA, Khotimah BK, Rochman EMS, Satoto BD. Integration k-means clustering method and elbow method for identification of the best customer profile cluster. IOP Conference Series: Materials Science and Engineering; 2018: IOP Publishing.

22. Hunter JD. Matplotlib: A 2D graphics environment. Computing in Science & Engineering. 2007;9(3):90-5.

23. Waskom ML. Seaborn: statistical data visualization. Journal of Open Source Software. 2021;6(60):3021.

24. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2022.

25. RStudio Team. RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA; 2022.

26. Revelle W. Package 'psych'. The Comprehensive R Archive Network. 2015. p. 238-9.

27. Kaiser HF. An index of factorial simplicity. Psychometrika. 1974;39(1):31-6.

28. Jolliffe IT. Principal component analysis. Springer; 2002.

29. Venables WN, Ripley BD. Modern Applied Statistics with S. Springer; 2002.

30. Mundry R, Sommer C. Discriminant function analysis with nonindependent data: consequences and an alternative. Animal Behaviour. 2007;74(4):965-76.


Carlsberg Foundation, Award: CF19-0604

Carlsberg Foundation, Award: CF20-0538

Chinese Scholarship Council, Award: 201906040228