
An analysis of avian vocal performance at the note and song levels

Cite this dataset

Logue, David et al. (2020). An analysis of avian vocal performance at the note and song levels [Dataset]. Dryad.


Sexual displays that require extreme feats of physiological performance have the potential to reliably indicate the signaller’s skill or motivation. We tested for evidence of performance constraints in Adelaide’s warblers (Setophaga adelaidae) songs. At the note level, we identified three trade-offs with well-defined limits. At the song level, we identified two trade-offs, but their limits were less well-defined than the note-level limits. Trade-offs at both levels suggest that song structure is constrained by limits to the speed of both frequency modulation (while vocalizing and between notes) and respiration. Performance metrics derived from the observed limits to performance varied moderately among individuals and strongly among song types. Note-level performance metrics were positively skewed, as predicted by the hypothesis that performance is constrained. We conclude that physiological limits on frequency modulation and respiration constrain song structure in male Adelaide’s warblers. Further work is needed to determine whether receivers respond to natural levels of variation in performance, and whether performance correlates with singer quality or motivation.


Study species

Adelaide’s warbler is a socially monogamous insectivore endemic to Puerto Rico and Vieques. Mated pairs maintain all-purpose territories throughout the year (Toms 2010). Male songs are frequency modulated trills (Fig. 2). Individual males sing repertoires averaging ≈ 23 song types, many of which are shared with other males (Staicer 1991; Kaluthota et al. 2019). Each male’s repertoire comprises two categories, A and B, which are characterized by distinct times of delivery, song rates, and song switching frequencies (Staicer 1991; Kaluthota et al. 2019). The individual notes comprising songs are structurally simple, with almost all of the energy concentrated in the fundamental frequency. There is considerable among-note variation in duration, frequency, and frequency modulation (Fig. 2).


This research was approved by the Institutional Animal Care and Use Committee at the University of Puerto Rico, Mayagüez (17 September, 2010). It adheres to the ASAB/ABS Guidelines for the use of animals in research. Birds were captured under D.M.L.’s federal bird banding permit (no. 23696). The U.S. Fish and Wildlife Service granted permission to work at the Cabo Rojo Wildlife Refuge (permit 2012-01).

Recording and scoring

We recorded nine mated male Adelaide’s warblers at the Cabo Rojo National Wildlife Refuge (US Fish and Wildlife Service) during the breeding season between March and June, 2012. All subjects wore unique combinations of coloured leg bands, allowing recordists to unambiguously identify individuals. Each male was recorded on four days, from 45 minutes before sunrise until 2 hours and 45 minutes after sunrise. Consecutive recordings of a given male were separated by at least four days, except on two occasions when recordings were made on consecutive days because of logistical constraints. Recordings were made with Marantz PMD 661 digital recorders and Sennheiser ME67 shotgun microphones (file format = wav, sampling rate = 44.1 kHz, bit depth = 16 bits). This is the same set of recordings used in a previous study of short-term variation in song performance (Schraft et al. 2017), a methods paper on song sequences (Hedley et al. 2018), and an analysis of singing modes (Kaluthota et al. 2019).

We visualized recordings as sound spectrograms in Syrinx PC v.2.6 (J. Burt, Seattle, WA; settings: Blackman window, transform size = 1024 points). Each song recording from a focal male was assessed for recording quality. One person (D.M.L.) assigned song recordings to song types. A previous study that used the same recordings estimated the inter-rater reliability of song type scoring to be 100% within an individual bird, and 87% among individuals (Kaluthota et al. 2019). We only used high-quality recordings (high signal-to-noise ratio, minimal overlap with other sounds) for song measurements. High quality song recordings were analysed in Luscinia v2.14 (max. freq. = 10 kHz, frame length = 5 ms, time step = 1 ms, dynamic range = 35 dB, dynamic equalization = 100 ms, de-reverberation = 100%, de-reverberation range = 100 ms, high pass threshold = 1.0 kHz, noise removal = 10 dB; Lachlan 2007). We loosely outlined the trace of each note with a stylus on a touchscreen monitor, and Luscinia’s algorithms identified the signal and rejected background noise. The following note-level acoustic variables were output to a spreadsheet: start time, end time, maximum peak frequency, and minimum peak frequency. Luscinia offers several frequency metrics. We chose peak frequency because visual inspection of spectrograms showed that it tracked the fundamental frequency better than the fundamental frequency metric.

Analysis: trade-offs

The note-level analysis omitted the one or two low-amplitude notes that began some songs and the final note of all songs. We omitted final notes because it was impossible to define the duration of the silent gap following the last note. For the note-level analyses, we analysed the frequency bandwidths and durations (based on note start and end times) of both notes and silent gaps (Fig. 3). Following Cardoso (2013), we measured frequency bandwidth (BW) as the ratio of the maximum frequency to the minimum frequency. Gap duration and gap BW are taken from the silent gap after the focal note. Gap BW is based on the end of the focal note and the beginning of the subsequent note. We tested three comparisons at the note level that might reveal trade-offs indicative of performance constraints: note duration vs. gap duration (respiratory), note BW vs. note duration (voiced FM), and gap BW vs. gap duration (unvoiced FM).
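The note-level measurements above can be sketched as follows. This is a minimal Python illustration with hypothetical note timings and frequencies (the published analysis was conducted in R); note that BW is a frequency ratio, not a difference in Hz.

```python
# Hypothetical notes: start/end times (s) and min/max peak frequencies (Hz).
notes = [
    {"start": 0.00, "end": 0.05, "f_min": 4000.0, "f_max": 6000.0},
    {"start": 0.08, "end": 0.14, "f_min": 3500.0, "f_max": 7000.0},
    {"start": 0.18, "end": 0.25, "f_min": 4200.0, "f_max": 6300.0},
]

def note_bw(note):
    # Frequency bandwidth as a ratio (Cardoso 2013), not max minus min.
    return note["f_max"] / note["f_min"]

def note_duration(note):
    return note["end"] - note["start"]

def gap_duration(focal, nxt):
    # Silent gap after the focal note, up to the onset of the next note.
    return nxt["start"] - focal["end"]

durations = [round(note_duration(n), 3) for n in notes]
gaps = [round(gap_duration(notes[i], notes[i + 1]), 3) for i in range(len(notes) - 1)]
bws = [round(note_bw(n), 3) for n in notes]
print(durations)  # [0.05, 0.06, 0.07]
print(gaps)       # [0.03, 0.04]
print(bws)        # [1.5, 2.0, 1.5]
```

Because the final note has no following gap, the last note of each song would be dropped before testing the duration and gap trade-offs.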

We chose four parameters for the song-level analyses: trill rate, average frequency bandwidth, percent of sound, and duration. Trill rate (TR) was calculated as the number of notes in the song minus one, divided by the time from the beginning of the first note to the beginning of the last note. We excluded the final note from this calculation because TRs based on the full song necessarily omit the ‘gap’ after the last note, biasing estimates upward for songs with fewer notes. Adelaide’s warbler songs are frequency modulated trills (Fig. 2), so the total frequency bandwidth of a song is only weakly related to the amount of FM in the song. We therefore calculated a song’s BW as the average BW of the notes in the song. Percent of sound (PoS) is the percent of the song that is voiced. It was calculated as the sum of note durations, divided by the total duration of the song, multiplied by 100. Lastly, we measured song duration because many kinds of performance increase in difficulty with increasing duration (Byers et al. 2010). We tested four comparisons that might reveal trade-offs at the song level: TR vs. mean BW (FM, respiratory), duration vs. TR (respiratory endurance), duration vs. PoS (respiratory endurance), and TR vs. PoS (respiratory).
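The four song-level parameters can be computed as below. Again this is a hedged Python sketch with hypothetical note timings, not the original R code; the key detail is that trill rate is measured onset-to-onset, excluding the interval after the final note.

```python
# Hypothetical song: per-note (start_s, end_s, f_min_hz, f_max_hz).
notes = [
    (0.00, 0.05, 4000.0, 6000.0),
    (0.08, 0.14, 3500.0, 7000.0),
    (0.18, 0.25, 4200.0, 6300.0),
    (0.30, 0.36, 4000.0, 6000.0),
]

n = len(notes)
# Trill rate: (notes - 1) / time from first-note onset to last-note onset.
# This drops the undefined "gap" after the final note, avoiding the upward
# bias that whole-song rates give to songs with few notes.
trill_rate = (n - 1) / (notes[-1][0] - notes[0][0])

# Song BW: average of per-note ratio bandwidths (total song bandwidth is
# only weakly related to FM in a frequency-modulated trill).
mean_bw = sum(f_max / f_min for _, _, f_min, f_max in notes) / n

# Percent of sound: voiced time as a percentage of total song duration.
song_duration = notes[-1][1] - notes[0][0]
voiced = sum(end - start for start, end, _, _ in notes)
percent_of_sound = 100.0 * voiced / song_duration

print(round(trill_rate, 2))        # 10.0 notes/s
print(round(mean_bw, 3))           # 1.625
print(round(percent_of_sound, 1))  # 66.7
```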

Wilson et al. (2014) recommend quantile regression to test for acoustic trade-offs. Quantile regression produces a linear function that estimates a defined quantile of Y over a range of X (Cade and Noon 2003). Standard quantile regression, however, fails to account for the non-independence of multiple data points from a given subject, raising concerns about pseudoreplication. Unlike standard quantile regression, mixed quantile regression includes one or more random effects (Geraci 2014). We ran mixed quantile regression with individual identity as a random effect to account for the non-independence of acoustic units from the same individual. To the best of our knowledge, this is the first study to use mixed quantile regression to test for acoustic trade-offs. Initially, we ran all models with both random intercepts and random slopes (Barr et al. 2013). The random slope estimates proved unstable for the note-level models, so we ran random-intercept models for the note-level analyses and verified the absence of Simpson’s paradox (see Results). For those analyses, we report the sum of the overall intercept estimate and the average individual intercept adjustment, because that value represents a population-level estimate of the intercept (Table ESM-1). The song-level models include both random intercepts and random slopes.
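To make the estimator concrete: quantile regression fits a line by minimizing the pinball (check) loss rather than squared error. The sketch below fits a plain (non-mixed) quantile regression to synthetic heteroscedastic data in Python; the study itself used the lqmm package in R, which additionally includes the random effects discussed above.

```python
# Plain quantile regression via pinball-loss minimization (illustration only;
# lqmm in R adds the random-effect structure). Data are synthetic.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + rng.normal(0, 1 + 0.3 * x)  # noise grows with x

def pinball_loss(params, tau):
    intercept, slope = params
    resid = y - (intercept + slope * x)
    # Check loss: tau * resid when resid >= 0, (tau - 1) * resid otherwise.
    # Minimizing it makes the fitted line track the tau-th quantile of y | x.
    return np.sum(np.where(resid >= 0, tau * resid, (tau - 1) * resid))

# tau = 0.10 estimates a lower boundary (note level); tau = 0.90 would
# estimate an upper boundary (song level).
fit = minimize(pinball_loss, x0=[0.0, 1.0], args=(0.10,), method="Nelder-Mead")
intercept, slope = fit.x
print(f"10th-quantile line: y = {intercept:.2f} + {slope:.2f} x")
```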

We predicted positively sloping lower boundaries for all note-level analyses, because higher values of acoustic variable X were predicted to constrain the minimum values of variable Y (as in Geberzahn and Aubin 2014a). The term “trade-off” often implies a negative relationship, so readers may question whether it is appropriate to describe a positive relationship as evidence of a trade-off per se. We applied the term to both positive and negative relationships because a negative correlation can be functionally identical to a positive one (e.g., a positive relationship between note duration and gap duration is the same as a negative relationship between note duration and gap shortness), and both can result from constrained resource allocation.

We predicted negative upper boundaries for all song-level analyses, because higher values of variable X were predicted to constrain the maximum values of variable Y (as in Podos 1997). For the note-level dataset, we tested lower boundaries with 10th quantile regressions (tau = 0.10), following advice in Wilson et al. (2014). Similarly, we ran 90th quantile regressions (tau = 0.90) to estimate upper boundaries in the song-level dataset (Wilson et al. 2014).

Population-level performance limits could arise from a pooled analysis of individuals that do not themselves exhibit trade-offs. For example, some individuals might sing high-trill-rate, low-bandwidth songs while others sing low-trill-rate, high-bandwidth songs, resulting in a sloping limit to the population’s distribution when individuals’ data are pooled. Alternatively, different individuals may be subject to similar trade-offs, which combine to produce a population-wide trade-off. We therefore examined data from individual birds for evidence of trade-offs.

In addition to the hypothesis tests described above, we offer several visual representations of our data. We graphed the whole distributions with semi-transparent points and quantile regression lines. To show individual variation near the boundaries, we also present zoomed-in views of the boundary regions with separate colours for each individual and polygons that mark the limit of each individual’s distribution. Polygons were generated with the geom_encircle function in the ggalt package (Rudis et al. 2017), with settings s_shape = 1 (no added curvature) and expand = 0 (polygon edges intersect extreme points). Finally, we present separate distributions for each individual in the electronic supplementary material.

Analysis: Performance metrics

Deviation scores were calculated as the orthogonal distance from the quantile regression lines, such that higher (more positive) values indicate lower putative performance (Podos 2001). We estimated intra-class correlations (ICCs) to test how repeatable individuals were with respect to note-level deviation scores averaged over songs and song-level deviation scores. We calculated ICCs for individuals using both the entire sample and the average scores for each song type within individuals (see Introduction). We also tested whether note-level deviation scores averaged over songs and song-level deviation scores were significantly repeatable among song types. We conducted likelihood ratio tests to test whether ICCs were statistically distinguishable from 0 (Stoffel et al. 2017).
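The orthogonal-distance deviation score can be written in closed form. The sketch below is a Python illustration with hypothetical numbers; the sign convention (which side of the boundary counts as positive) is an assumption here and would be chosen per analysis so that positive scores mean lower performance.

```python
# Signed orthogonal distance from a point (x, y) to the fitted line
# y = intercept + slope * x, i.e. the line slope*x - y + intercept = 0.
import math

def deviation_score(x, y, intercept, slope):
    # Perpendicular distance, signed: positive when the point lies below
    # the line (assumed here to be the low-performance side).
    return (intercept + slope * x - y) / math.hypot(slope, 1.0)

# A point exactly on the line y = 1 + 2x deviates by 0.
print(deviation_score(2.0, 5.0, 1.0, 2.0))  # 0.0
# A point one vertical unit below the line deviates by 1/sqrt(5).
print(deviation_score(2.0, 4.0, 1.0, 2.0))
```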

We generated Pearson’s correlation matrices of deviation scores for notes and songs. The song-level correlation matrix included song-level deviations, note-level deviations averaged over songs, and FEX. We calculated the skewness of the deviation scores to test competing predictions of the constrained performance and constrained learning hypotheses.
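The skewness test can be illustrated briefly. A moment-based sample skewness (sketched below in Python with hypothetical deviation scores) is positive when most values cluster near the performance limit with a long tail of high-deviation (low-performance) values, the pattern predicted by the constrained-performance hypothesis.

```python
# Moment-based sample skewness: the mean of cubed z-scores.
import statistics

def skewness(values):
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical deviation scores: most small, a few large -> right-skewed.
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 1.5, 0.1, 0.2, 2.0, 0.15]
print(round(skewness(scores), 2))  # positive: long right tail
```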

All statistics were conducted in RStudio (RStudio Team 2015). Mixed quantile regressions relied on the lqmm package (Geraci 2014). Intra-class correlations were assessed with the rptR package (Stoffel et al. 2017). Data visualizations relied on the package ggplot2 (Wickham and Chang 2008). Data and R code are available at DRYAD (doi:10.5061/dryad.vt4b8gtn0).

Usage notes

The .rmd files are R Markdown files. "PerfConst17.3.rmd" is the note-level analysis and "SongPerfConst7.1.rmd" is the song-level analysis. The current versions of the .rmd files correct minor errors in previous versions, including versions that were previously posted to Dryad. These errors were corrected before publication, so the paper is based on the corrected version of the R code. Feel free to contact me about these data at


Natural Sciences and Engineering Research Council, Award: RGPIN-2015-06553