Domestic dogs (Canis familiaris) recognise meaningful content in monotonous streams of read speech


Root-Gutteridge, Holly; Ratcliffe, Victoria; Korzeniowska, Anna; Reby, David (2021), Domestic dogs (Canis familiaris) recognise meaningful content in monotonous streams of read speech, Dryad, Dataset,


Domestic dogs (Canis familiaris) can recognize basic phonemic information from human speech and respond to commands. Commands are typically presented in isolation with exaggerated prosody known as dog-directed speech (DDS) register. Here, we investigate whether dogs can spontaneously identify meaningful phonemic content in a stream of putatively irrelevant speech spoken in monotonous prosody, without congruent prosodic cues. To test this ability, dogs were played recordings of their owners reading a meaningless text in which we inserted a short meaningful phrase or a short meaningless phrase, either read with unchanged reading prosody or with an exaggerated DDS prosody. We measured the occurrence, duration and latency of dogs’ gaze at their owners. We found that, while dogs were more likely to detect and respond to inserts that contained meaningful phrases spoken with DDS prosody, they were still able to detect these meaningful inserts spoken in a neutral reading prosody. Dogs also detected meaningless control inserts spoken with exaggerated prosody, but their gaze responses were significantly shorter than when hearing meaningful content spoken in DDS. This suggests that, while DDS prosody facilitates the detection of meaningful content in human speech by capturing dogs’ attention, dogs are capable of spontaneously recognizing meaningful phonemic content within a stream of speech without it.


Sixty owners were recorded reading aloud one of three short (15-20 second) passages from the standard psychology text “the rainbow passage” [33], with the test phrases produced after 7-12 seconds as part of the text. The non-meaningful (control) phrases were “[Alfie / Bertie], pass me a coffee!” and the meaningful phrase was “[Dog’s name], come on then!”, chosen because these words had the highest frequency of use by English-speaking owners during interactions with their dogs and were therefore likely to be meaningful to all dogs [22]. The duration of the target phrases was between 1 and 2 seconds, depending on the speaker’s natural talking speed. Voice recordings were made on a Zoom H4N-Pro handheld recorder (Zoom) in a sound-proof booth on campus at the University of Sussex. Owners were asked to produce the target phrases in a) their normal reading voice prosody (NRP) and b) dog-directed speech prosody (DDS). DDS was expected to show increased pitch and pitch range compared to NRP, and therefore to be more interesting to the dogs [23]. Two recordings were made for study A: DDS-meaningful and DDS-control; three recordings for study B: NRP-meaningful, NRP-control, and DDS-meaningful; and four recordings for study C: NRP-meaningful, NRP-control, DDS-meaningful, and DDS-control.

All the voice recordings were clipped and aligned using the audio software Audacity [34], and their amplitude was normalized to -9 dB. The mean and coefficient of variation of fundamental frequency were measured in Praat [35].
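The two processing steps above can be illustrated with a minimal Python sketch. It is not the authors' workflow (they used Audacity and Praat); it assumes that the -9 dB target refers to peak amplitude relative to full scale, and that unvoiced frames in a pitch track are coded as 0. The function names `peak_normalize` and `f0_summary` are hypothetical.

```python
import statistics


def peak_normalize(samples, target_db=-9.0):
    """Scale samples so the largest absolute value sits at target_db.

    Assumption: the target is a peak level in dB relative to full scale (1.0).
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent clip: nothing to scale
    target_peak = 10 ** (target_db / 20)  # convert dB to linear amplitude
    gain = target_peak / peak
    return [s * gain for s in samples]


def f0_summary(f0_hz):
    """Return (mean F0, coefficient of variation) over voiced frames.

    Assumption: unvoiced frames are coded as 0 and are excluded.
    The coefficient of variation is the sample SD divided by the mean.
    """
    voiced = [f for f in f0_hz if f > 0]
    mean_f0 = statistics.mean(voiced)
    cv = statistics.stdev(voiced) / mean_f0
    return mean_f0, cv


normalized = peak_normalize([0.1, -0.5, 0.25])
mean_f0, cv = f0_summary([200.0, 0.0, 220.0, 180.0])
```

The coefficient of variation is unitless, which makes F0 variability comparable across speakers with different mean pitch.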


Fifty-six privately owned dogs were recruited through Facebook adverts, flyers, and personal contacts, and tested in a designated testing room on campus at the University of Sussex. Of these, 51 dogs were retained (25 females and 26 males from 39 breeds, aged between 9 months and 12 years; mean = 4.1 years, SD = 2.9). Trials were discarded if the dog was distracted by non-stimulus sounds or events, e.g. background noise (n = 1), if the dog was barking continuously (n = 1), or if the dog moved out of camera shot (n = 3).


During all trials, the owners wore noise-cancelling headphones (TaoTronics) and listened to music while seated in a chair at 90 degrees to the dog (Figure 1). A single Behringer Europort MPA40BT-PRO speaker was mounted on a tripod behind the owner’s head and set to conversational volume (approx. 65 dB measured at the dog’s position). The experimenter stood out of the dog’s sight line and played the stimuli from an Apple MacBook Pro. The dogs were held on a loose lead by the handler and allowed some freedom of movement. The dogs were positioned either to the left or the right of the speaker, and this position was cross-balanced across dogs within studies, with half to the left and half to the right. The dogs’ reactions were filmed on a Sony FDR-AX100 camcorder (Sony) on a tripod positioned approximately 1.5-2 m from the dogs’ starting position. The interval between trials depended on the dog’s disposition. If the dog was calm, the interval was less than 2 minutes; if the dog was restless or distracted, a short break of a few minutes was provided, and the dog was sometimes taken out of the room and returned. Whether or not the dog gazed at its owner was used as the broadest metric of attention, while duration of gaze was used as the index of attention.
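As an illustration of how the gaze measures (occurrence, latency and duration) could be derived from coded video, here is a minimal Python sketch. The coding format (a list of gaze start and end times in seconds), the response window, and the function name `gaze_measures` are all assumptions made for this example and are not taken from the dataset.

```python
def gaze_measures(gaze_intervals, phrase_onset, window=10.0):
    """Return (occurred, latency_s, duration_s) for one trial.

    gaze_intervals: (start, end) times in seconds during which the dog
        looked at the owner (hypothetical coding format).
    phrase_onset: time in seconds at which the target phrase starts.
    window: hypothetical response window after phrase onset, in seconds.
    """
    window_end = phrase_onset + window
    clipped = []
    for start, end in sorted(gaze_intervals):
        # Keep only the part of each gaze that falls inside the window.
        s = max(start, phrase_onset)
        e = min(end, window_end)
        if e > s:
            clipped.append((s, e))
    if not clipped:
        return False, None, 0.0  # no gaze at the owner in the window
    latency = clipped[0][0] - phrase_onset  # time to first gaze
    duration = sum(e - s for s, e in clipped)  # total gaze time
    return True, latency, duration


occurred, latency, duration = gaze_measures(
    [(2.0, 3.0), (9.5, 11.0), (25.0, 26.0)], phrase_onset=9.0
)
```

In this sketch a dog that is already looking at the owner when the phrase starts gets a latency of 0; a real coding protocol might instead exclude such trials.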

Study A: The effect of meaning on dogs’ responses to content presented in dog-directed speech prosody

Study A investigated whether dogs responded differently to inserts containing meaningful phrases vs. meaningless control phrases, in both cases spoken with dog-directed speech (DDS) prosody. Twenty-two dogs were tested, and 40 trials from 20 dogs were retained. Each dog was presented with a recording of their female owner reading the text twice, once with the meaningful phrase inserted and once with the control phrase inserted. The order of presentation of the meaningful and control phrase recordings was cross-balanced across dogs (Table 1).

Study B: The effects of prosody and meaning on dogs’ responses to content presented in neutral reading prosody

In Study B, dogs’ ability to detect meaningful content presented in neutral reading prosody (NRP) was tested. Thirty-five dogs were tested. Of these, 22 dogs heard speech from just one owner and 13 heard speech from both their male and female owners. Each dog was initially presented with two playback trials, one with an NRP-control phrase and one with an NRP-meaningful phrase embedded, with order of presentation cross-balanced across subjects. To test their responsiveness to speech, the dogs then heard a third trial presenting a DDS-meaningful phrase. NRP trials were always played first to avoid cueing the dogs to the presentation of meaningful speech. Dogs who heard both their owners were given a brief break between the two sets of playbacks to reduce habituation.

Study C: The effects of gender on dogs’ responses to content and prosody

In Study C, to explore the potential effects of speaker gender, 13 of the 35 dogs from study B were tested on their response to both their male and female owners’ speech, with 12 retained. These dogs heard the 3 different speech presentations from study B (NRP-control, NRP-meaningful, and DDS-meaningful) and an additional trial, DDS-control, from each of their male and female owners (i.e. 8 trials per dog). Trials from one dog were removed because he moved out of camera shot; 96 trials were therefore retained. NRP trials were always played first for each owner, with control and meaningful phrase presentation cross-balanced within prosodic conditions. One dog had previously been tested in Study A, with a gap of several months between tests. All trials for a given dog were performed on the same day, and the time between trials varied from a few minutes to more than 20 minutes depending on the behaviour of the dog.
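The Study C design described above can be sketched as a simple schedule generator. This is an illustration only: the alternation scheme used to cross-balance content order, the owner order, and the name `study_c_schedule` are assumptions, since the exact assignment procedure is not stated here.

```python
# Hypothetical cross-balancing: alternate content order across dogs,
# and alternate which owner is heard first every two dogs.
CONTENT_ORDERS = [("control", "meaningful"), ("meaningful", "control")]
OWNER_ORDERS = [("female", "male"), ("male", "female")]


def study_c_schedule(dog_ids):
    """Build an 8-trial playback order for each dog."""
    schedule = {}
    for i, dog in enumerate(dog_ids):
        content_order = CONTENT_ORDERS[i % 2]
        owner_order = OWNER_ORDERS[(i // 2) % 2]
        trials = []
        for owner in owner_order:
            # NRP trials are always played before DDS trials for each owner.
            for prosody in ("NRP", "DDS"):
                for content in content_order:
                    trials.append((owner, prosody, content))
        schedule[dog] = trials
    return schedule


schedule = study_c_schedule([f"dog{n:02d}" for n in range(1, 13)])
total_trials = sum(len(trials) for trials in schedule.values())
```

With the 12 retained dogs, 2 owners, 2 prosodies and 2 content types, this yields the 96 retained trials reported above.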

Prior to analysis, the videos of the trials were edited in iMovie (Apple Inc.) so that each file presented a single trial with a sound effect replacing the target phrase.

Usage Notes

There are no missing values.


Biotechnology and Biological Sciences Research Council, Award: BB/P00170X/1