Data from: Female African elephant rumbles differ between populations and sympatric social groups

Pardo, Michael 1 ; Lolchuragi, David2; Poole, Joyce3; Granli, Petter3; Moss, Cynthia4; Douglas-Hamilton, Iain2; Wittemyer, George1

Published Sep 21, 2023 on Dryad. https://doi.org/10.5061/dryad.rv15dv4dt

Data files

Sep 21, 2023 version files 120.95 MB

20230911_MasterList_dur90_aenv_pmelspec_certID1_Samb_Ambo_n1918_dates_fixed.rds

117.65 MB
20230919_dfall_seltabs_and_acoustic_features_for_upload.csv

2.73 MB
20230919_seltabs_for_upload_group_signature_paper.csv

544.22 KB
20230919_variable_definitions_group_signature_paper.xlsx

12.72 KB
README.md

8.62 KB

Sep 21, 2023 version files 120.95 MB

20230911_MasterList_dur90_aenv_pmelspec_certID1_Samb_Ambo_n1918_dates_fixed.rds

117.65 MB
20230919_dfall_seltabs_and_acoustic_features_for_upload.csv

2.73 MB
20230919_seltabs_for_upload_group_signature_paper.csv

544.22 KB
20230919_variable_definitions_group_signature_paper.xlsx

12.72 KB
README.md

8.41 KB

Abstract

Vocalizations often vary in structure within a species, from the individual to the population level. Vocal differences among social groups and populations can provide insight into biological processes such as vocal learning and evolutionary divergence, with important conservation implications. As vocal learners of conservation concern, intraspecific vocal variation is of particular interest in elephants. We recorded calls from individuals in multiple wild elephant social groups in two distinct Kenyan populations. We used machine learning to investigate vocal differentiation among individual callers, core groups, bond groups (collections of core groups), and populations. We found clear evidence for vocal distinctiveness at the individual and population level, and evidence for much subtler vocal differences among social groups. Social group membership was a better predictor of call similarity than genetic relatedness, suggesting that subtle vocal differences among social groups may be learned. Vocal divergence among populations and social groups has conservation implications for the effects of social disruption and translocation of elephants.

Data from: African elephant rumbles differ between populations and sympatric social groups: possible consequences of vocal learning?

We investigated whether elephant rumbles contain acoustic signatures of individual caller identity, family group identity, bond group (collection of bonded family groups), or population. We recorded rumbles from wild African elephants in Samburu & Buffalo Springs National Reserves, northern Kenya and Amboseli National Park, southern Kenya, noting the identity of the caller and the behavioral context. We measured a suite of acoustic features on each call and used random forest models to try to assign calls to population, bond group, family group, and individual caller based on the acoustic features. We found that calls could be assigned to population and individual caller, but not family or bond group, with significantly better accuracy than the majority classifier that always guessed the most common category. However, when we calculated a proximity score (measure of similarity) between each pair of calls in the dataset, we found that calls from the same family or bond group were significantly more similar than calls from different family or bond groups, indicating that calls likely contain signatures of social group identity likely exist in the calls as well as population differences. We discuss possible drivers of this differentiation and conclude that vocal learning is the most likely cause, although other possibilities cannot be definitively ruled out.

Description of the data and file structure

“20230919_seltabs_for_upload_group_signature_paper.csv”:

This spreadsheet contains all the relevant information associated with each call in the dataset except for the acoustic feature variables. Each row represents a single call, and columns indicate the identity of the caller, the behavioral context, the certainty with which caller ID and behavioral context were known, the caller’s family group and population, and other relevant information, including the file name of the sound file in which the call occurs, the start and end times of the call in that sound file, the file name of the Raven Pro selection table used to annotate the sound file, and the number of the selection box in that table corresponding to the call in question. The sound files and Raven selection tables themselves are not included in this upload. The full set of variables in this spreadsheet are defined in the spreadsheet named “20230919_variable_definitions_group_signature_paper.csv”. The derived acoustic features used in the analysis are not included in this spreadsheet. Those measurements can be calculated using the code provided (“20230911_group_signature_code_for_upload.R”) and the raw acoustic measurements in the .RDS file named “20230911_MasterList_dur90_aenv_pmelspec_certID1_Samb_Ambo_n1918_dates_fixed.rds”. Values of “NA” in this spreadsheet indicate that the information in question was unknown for a given observation. Alternatively the user can simply use the spreadsheet “20230919_dfall_seltabs_and_acoustic_features_for_upload.csv” which contains the behavioral/identity information and the derived acoustic features together.

“20230919_variable_definitions_group_signature_paper.csv”:

This spreadsheet defines each of the variables in the files “20230919_seltabs_for_upload_group_signature_paper.csv” and “20230919_dfall_seltabs_and_acoustic_features_for_upload.csv”.

“20230911_MasterList_dur90_aenv_pmelspec_certID1_Samb_Ambo_n1918_dates_fixed.rds”:

This RDS file is an R list object which contains the raw acoustic contours measured on each call. The code provided (“20230911_group_signature_code_for_upload.R”) uses these raw acoustic contours to calculate a suite of derived acoustic features which are then used for subsequent statistical analyses. The list is nested with two layers. Each outer slot represents a single call, and is named with a unique identifier for that call, which is comprised of the date (YYYYMMDD) and time (HHMM) of the original recording as well as the Raven selection box number of the call in question. The inner slots represent individual acoustic contours, each of which is a vector of values. The acoustic contours are as follows: dur90 = time duration needed to capture 90% of the energy in the call (not used in analysis) aenv = Hilbert amplitude envelope aenvtime = time points (sec) for aenv (not used in analysis) pmsband(1-26): energy in each of the 26 bands of a mel spectrogram, calculated from 0-500 Hz. Each energy value within a given band corresponds to one of the time points in aenvtime pmsD(1-26): differences between successive values of the corresponding pmsband pmsDD(1-26): differences between successive values of the corresponding pmsD

“20230919_dfall_seltabs_and_acoustic_features_for_upload.csv”:

This file is a spreadsheet containing the observational data (caller ID, behavioral context, etc.) together with the derived acoustic features. This spreadsheet is generated in section 4 of the code using the data files “20230919_seltabs_for_upload_group_signature_paper.csv” and “20230911_MasterList_dur90_aenv_pmelspec_certID1_Samb_Ambo_n1918_dates_fixed.rds”. However, instead of recreating it de-novo the reader may skip sections 2-4 of the code and instead load this file in section 5 of the code. The variables in this spreadsheet are all described in the file “20230919_variable_definitions_group_signature_paper.csv”. Values of “NA” in this spreadsheet indicate that the information in question was unknown for a given observation.

Code/Software

“20230911_group_signature_code_for_upload.R”:

This R script contains the code for all the analyses in this study. Each section of the code is labeled to describe what it does. The packages required for the code to run are all listed in the first section. The sections in this script are as follows:

Load the required packages for the whole script.
Code used to segment the calls from the original raw sound files and save them as clips for further analysis. The original sound files themselves are not included in this upload so this section cannot be run, but it is included to illustrate how we segmented the recordings
Code used to measure acoustic contours on the segmented sound clips. The sound clips are not included in this upload so this section can also not be run, but it is included to illustrate how we performed the acoustic contour measurements
Code to derive acoustic features from the acoustic contours and append them to the observational data. This section requires two files that are included in this upload:
“20230919_seltabs_for_upload_group_signature_paper.csv” and
“20230911_MasterList_dur90_aenv_pmelspec_certID1_Samb_Ambo_n1918_dates_fixed.rds”.
Rather than running the previous code chunks, the user may start here and simply read in the file “20230919_dfall_seltabs_and_acoustic_features_for_upload.csv” which contains all the necessary data for the remaining analyses. This code chunk also adds a column for bond group ID (created from the existing column for family group ID), converts relevant variables to factors, and creates a subset of the data with just the calls from the Samburu population.
Random forest model to predict family group from acoustic features
Random forest model to predict bond group from acoustic features
Random forest model to predict population from acoustic features
Random forest model to predict caller ID from acoustic features
Calculate random forest proximity scores from family group model for each pair of calls in the dataset and run gamma regression to determine if call pairs from same family group are more similar than call pairs from different family groups
Calculate random forest proximity scores from bond group model for each pair of calls in the dataset and run gamma regression to determine if call pairs from same bond group are more similar than call pairs from different bond groups
Calculate random forest proximity scores from population model for each pair of calls in the dataset and run gamma regression to determine if call pairs from same population are more similar than call pairs from different populations
Make figures for the paper
Calculate sample sizes for different analyses in the paper

Data collection

We recorded rumbles from wild adult female elephants in Amboseli National Park, Kenya (“Amboseli”) from 1986–1990 and 1997–2006 and in Samburu and Buffalo Springs National Reserves, Kenya (“Samburu”) from in Nov 2019–Mar 2020 and Jun 2021–Apr 2022. These two populations are 390 km apart with no current gene flow between them due to intervening urban development. Both populations have been continuously monitored for decades, and all individuals can be individually identified by external ear morphology. We focused on adult females (10+ years of age) to ensure that any vocal differences between populations or social groups were not an artifact of age or sex. Our final dataset included calls from 21 adult females in Amboseli (mean ± SD age = 26.2 ± 12.9 years) and 81 adult females in Samburu (mean ± SD age = 25.6 ± 11.2 years). The field recording methods for this dataset have been previously published.

We recorded the identity of the caller and the behavioral context of each call. The caller was identified using behavioral and contextual cues, such as an open mouth, flapping ears, or being the only individual who was not a young calf in the immediate vicinity (calf calls are easily distinguished from adult/subadult calls due to their higher frequency and shorter duration). We only included in the analysis calls for which we were able to identify the caller with certainty. Behavioral context was originally scored using slightly different ethograms in Amboseli and Samburu. To facilitate comparison between these two datasets, we concatenated behavioral context into 9 categories shared across both populations (Supplemental Table S1).

Definition of social groups

We determined the group membership of the elephants in Samburu following a previously published protocol. In brief, we calculated simple ratio association indices between all adult females in the population using observational data collected between January 2019–April 2022, excluding individuals that were seen less than 20 times during this period. We performed a Ward’s hierarchical cluster analysis on association indices, plotted the cumulative number of bifurcations as a function of bifurcation distance, and identified the most significant knot by visual inspection of the plot. We cut the dendrogram at the height corresponding to this knot and designated all clusters below this height as separate core groups. To identify bond groups, we repeated the procedure using only the matriarch (oldest female) of each core group. We did not include Amboseli data in the analysis of core groups and bond groups because we did not have comparable association data for Amboseli, and because 92% of our Amboseli recordings came from individuals belonging to the same (subjectively defined) core group. Data analysis was conducted using R v. 4.2.2.

Call measurement

We measured 94 acoustic features on each call describing the distribution of energy across time and frequency in the mel spectrogram (Supplemental Methods, Supplemental Table S2). A mel spectrogram is similar to a traditional spectrogram (raster plot with time on the x-axis, frequency on the y-axis, and amplitude indicated by pixel darkness) but with frequency transformed to the logarithmic mel scale. While the mel scale was designed to approximate human hearing sensitivity, most other mammals, including elephants, perceive frequency on a similar logarithmic scale.

While the measurements we took from the mel spectrogram capture more of the variation in the calls than traditional measurements such as fundamental frequency and formants, they also capture background noise and thus may be more susceptible to influence from the recording equipment. To ensure that differences between Amboseli and Samburu could not be attributed to the different recording gear used in each population, we also traced the second harmonic (f₁) contour of each call, which is highly robust to recording equipment differences. We calculated nine summary statistics of the f₁ contour (Supplemental Table S2).

Individual, social unit, and population assignment accuracy

To test the hypothesis that Amboseli and Samburu elephants exhibit population-level acoustic differences, we ran a random forest (500 trees, 6 variables/node, 60% of observations/tree, minimum node size = 1, no maximum tree depth) to predict population as a function of the mel spectrogram acoustic features. A random forest is a machine learning model comprised of many decision trees, each using a randomly selected fraction of the data. This analysis included 938 calls from 81 individuals in Samburu, and 414 calls from 21 individuals in Amboseli. As random forests are biased towards more numerous classes, we balanced the dataset by randomly subsampling the Samburu observations so there were an equal number of observations in each class. To ensure that the model could only predict population using cues that generalized across callers, we randomly selected 20% of the callers with at least five calls each from each population and allocated all calls from these callers to the test set, with the remaining calls allocated to the training set. We calculated the proportion of observations in the test set that were classified correctly (classification accuracy) and ran a one-tailed exact binomial test comparing the classification accuracy to the proportion that would have been classified accurately if the model always guessed the most common group in the training set (“majority” or “zero rate” classifier). The majority classifier is a standard baseline model used in machine learning. When there are the same number of observations in each class, the majority classifier is mathematically equivalent to the weighted expectation (guessing each class with a probability equal to the proportion of the data comprised by that class). However, when the data are imbalanced, the majority classifier always outperforms the weighted expectation. We repeated the above process 10,000 times and calculated the median P-value across all runs. The number of calls allocated to the test set varied across runs because different callers produced different numbers of calls. The mean ± SD proportion of the calls allocated to the test set across 10,000 runs was 0.16 ± 0.05. Note that although the full dataset was balanced by subsampling an equal number of calls from each population, the data were not perfectly balanced within the training and test sets, because the number of calls from each population allocated to the test set depended on the number of calls per caller. Moreover, the majority population in the training set was not always the majority population in the test set. Thus, the majority classifier accuracy could be greater or less than 50%.

To determine if vocal differences between the two populations could be an artifact of the different recording equipment used in each population, we ran the same model using the f₁ contour measurements instead of the mel spectrogram measurements. We also ran a logistic regression model with population as the response variable and the f₁ contour measurements and caller age as the regressors, to determine which acoustic features had a significant relationship to population.

To test the hypotheses that elephants exhibit vocal signatures of group identity at the bond group or core group level, we ran two additional random forest models (same hyperparameters) to predict bond group and core group, respectively, as a function of the mel spectrogram acoustic features, using only data from Samburu. To ensure that the bond group model could only use acoustic features that generalized across the entire bond group, rather than features specific to core groups, to predict bond group, we restricted the dataset to bond groups that contained at least two core groups with at least five calls each in our dataset (6 bond groups, 931 calls). For each iteration of the model, we randomly selected one core group from each bond group and allocated all calls from those core groups to the test set. To ensure that the core group model could only use acoustic features that generalized across the entire core group, rather features specific to individual callers, to predict core group, we restricted the dataset to core groups that contained at least two individuals with at least five calls each in our dataset (7 core groups, 794 calls). For each iteration of the model, we randomly selected 20% of the callers with at least five calls each from each core group and allocated all calls from those individuals to the test set. The data was severely imbalanced across bond groups and core groups, but we were unable to balance it by subsampling the more numerous classes because doing so would have reduced the sample size excessively. We ran 10,000 iterations for each model, calculating the classification accuracy and p-value for each run as before. The mean ± SD proportion of the calls allocated to the test set was 0.32 ± 0.17 for the bond group model and 0.23 ± 0.08 for the core group model.

To test the hypothesis that elephant rumbles are individually specific, we ran a fifth random forest (same hyperparameters) to predict individual caller identity as a function of the mel spectrogram acoustic features. As calls produced by the same caller on the same date might exhibit similar features due to temporary circumstances such as caller’s internal state, behavioral context, and ambient conditions, we randomly selected one date for each caller and held out all calls from these caller-dates as the test set. We used callers from both populations for this analysis, but only included callers that produced at least three calls on at least two different dates each (427 calls from 15 callers in Samburu, 218 calls from 9 callers in Amboseli). We calculated the classification accuracy and p-value for each of 10,000 iterations as before. The mean ± SD proportion of calls allocated to the test set was 0.20 ± 0.02.

Assessment of call similarity

Elephant rumbles vary with the behavioral context and the individual identity and age of the caller. To determine if there were acoustic differences among populations, bond groups, or core groups that could not be explained by behavioral context, caller identity, or caller age, we calculated random forest proximity scores between each possible pair of calls. The random forest proximity score for a given pair of calls was the proportion of trees for which both calls were classified in the same terminal node, adjusted for the size of the node, and represented a metric of call similarity in terms of the acoustic features most relevant to predicting the response variable. We calculated proximity scores from four different random forests (population with mel spectrogram features, population with f₁ contour features, bond group, and core group), using the same hyperparameters and subsets of the data as before except that we increased the number of trees to 8000 and did not balance the data by subsampling or hold out any observations as a test set (population models: all observations, n=1352; bond group model: all bond groups in Samburu containing at least 2 core groups with at least 5 calls each, n=931; core group model: all core groups in Samburu containing at least 2 individuals with at least 5 calls each, n=794).

For each set of proximity scores, we ran a generalized linear mixed model with a gamma error distribution and a log link function. In each model, pairwise call proximity score was the response variable and pair ID (unique identifier for a given pair of callers) was a random effect. Fixed effects were “pair class” (whether the two calls in a given pair came from the same population, bond group, or core group, depending on the model) and the scaled and centered absolute value of the age difference between the two callers. For the population models, the “pair class” variable was binary: a pair of calls could either be from the same population or not. For the bond group and core group models, the “pair class” variable had three possible states: same core group, different core groups within the same bond group, or different bond groups within the Samburu population. To ensure that differences between populations or social groups were not an artifact of individual identity or behavioral context, we only included pairs of calls with the same behavioral context and different callers. We also only included calls for which we were certain of the behavioral context (see Supplemental Methods). This resulted in a sample size of 149,640 call pairs for the two population models, 97,586 call pairs for the bond group model, and 72,437 call pairs for the core group model. As proximity scores could be 0, we added 0.00001 to all proximity scores so all the values would be positive. For the bond group and core group models, we used the R package “emmeans” to examine the pairwise contrasts for each level of the “pair class” variable.

To assess whether vocal similarity between individuals was better explained by social affiliation or genetic relatedness, we ran two additional gamma regressions on call pairs, modeling call proximity score as a function of “binary pair class” (see below), caller age difference, and caller genetic relatedness. One gamma model used proximity scores extracted from the random forest trained to predict core groups, and the other used proximity scores extracted from the random forest trained to predict bond groups. These two sets of proximity scores represented the pairwise similarity between calls in terms of the features most relevant to predicting core group membership or bond group membership, respectively. For the model using proximity scores extracted from the bond group random forest, binary pair class indicated whether the two calls in a pair were from the same bond group. For the model using proximity scores extracted from the core group random forest, binary pair class indicated whether the two calls in a pair were from the same core group. These gamma models could only be run on the subset of callers for which we had genetic relatedness data (13 individuals for the bond group model, 8 for the core group model). Genetic relatedness was calculated in a previously published study using 20 microsatellite loci extracted from tissue or dung samples collected before 2006. Due to social disruption caused by poaching, many core groups and bond groups in the Samburu population now include unrelated individuals, which made it possible to independently assess the effects of social affiliation and relatedness on call similarity.

All statistical analyses were performed in R v. 4.2.2, and 0.05 was used as the significance threshold for all tests.