Dryad

Domestic dogs (Canis familiaris) recognise meaningful content in monotonous streams of read speech

Cite this dataset

Root-Gutteridge, Holly; Korzeniowska, Anna; Ratcliffe, Victoria; Reby, David (2023). Domestic dogs (Canis familiaris) recognise meaningful content in monotonous streams of read speech [Dataset]. Dryad. https://doi.org/10.5061/dryad.stqjq2c1s

Abstract

Domestic dogs (Canis familiaris) can recognize basic phonemic information from human speech and respond to commands. Commands are typically presented in isolation with exaggerated prosody known as dog-directed speech (DDS) register. Here, we investigate whether dogs can spontaneously identify meaningful phonemic content in a stream of putatively irrelevant speech spoken in monotonous prosody, without congruent prosodic cues.

To test this ability, dogs were played recordings of their owners reading a meaningless text in which we inserted a short meaningful or meaningless phrase, either read with unchanged reading prosody or with an exaggerated DDS prosody. We measured the occurrence and duration of dogs’ gaze at their owners.

We found that, while dogs were more likely to detect and respond to inserts that contained meaningful phrases spoken with DDS prosody, they were still able to detect these meaningful inserts spoken in a neutral reading prosody. Dogs detected and responded to meaningless control phrases in DDS as frequently as to meaningful content in neutral reading prosody, but less often than to meaningful content in DDS.

This suggests that, while DDS prosody facilitates the detection of meaningful content in human speech by capturing dogs’ attention, dogs are nevertheless capable of spontaneously recognizing meaningful phonemic content within an unexaggerated stream of speech.

Methods

Stimuli

Seventy owners were recorded reading aloud one of three short (15-20 second) passages from the standard psychology text “The Rainbow Passage”, with the test phrases produced after 7-12 seconds as part of the text. The non-meaningful (control) phrases were “[Alfie / Bertie], pass me a coffee!” and the meaningful phrase was “[Dog’s name], come on then!”. The meaningful phrase was chosen because its words had the highest frequency of use by English-speaking owners during interactions with their dogs and were therefore likely to be meaningful to all dogs. The duration of the target phrases was between 0.7s and 2.5s (mean = 1.38s, std. dev. = 0.24s), depending on the speaker’s natural talking speed and the dog’s name (e.g., “Badger” takes longer to say than “Max”). In total, three different extracts of the same length were used and the phrases were included within the sentences, e.g., “There is, according to legend, a boiling pot of gold at one end. People look, but no one ever finds it. When a man looks for something beyond his reach, his friends say, [Bertie, pass me a coffee] / [Dog’s name, come on then], he is looking for the pot of gold at the end of the rainbow. Throughout the centuries people have explained the rainbow in various ways.” (See ESM for extracts 2 and 3.) The time it took the owners to reach the inserted phrase depended on the speed of their natural speech (mean = 8.71s, std. dev. = 1.17s) but was consistent across readings by the same individual.

The choice of extract was randomised, but if the dog had a name too similar to Alfie or Bertie, the other name was chosen as the control (e.g., the participant dogs Betty and Beans heard Alfie, not Bertie, in their control phrase). For each dog, the same extract was used for all conditions. Voice recordings were made on a Zoom H4N-Pro handheld recorder (Zoom) in a sound-proof booth on campus at University of Sussex. Owners were asked to produce the target phrases in a) their normal reading voice prosody (NRP) and b) dog-directed speech prosody (DDS). We expected that the DDS speech would show increased pitch and range compared to NRP and that this would be more interesting to the dogs. Thus, two recordings were made for study A: DDS-pilot (DDS-meaningful and DDS-control); three recordings were made for study B neutral prosody (NRP-meaningful, NRP-control, and DDS-meaningful); and four recordings were made for studies C and D (NRP-meaningful, NRP-control, DDS-meaningful, and DDS-control).

All the voice recordings were clipped and aligned using the sound software Audacity [54], and the amplitude was normalized to -9 dB. The mean and coefficient of variation of fundamental frequency (foCV = (fo standard deviation / fo mean) * 100) were measured in Praat [55]. foCV provides a standardised measure of fo variability independent of fo height that takes perception into account (i.e., a modulation of 10 Hz around 100 Hz is perceptually equivalent to a modulation of 100 Hz around 1,000 Hz).
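As an illustration of the foCV formula, the short Python sketch below computes the coefficient of variation for two made-up fo tracks. The function name `fo_cv` and the example values are hypothetical, and the use of the sample standard deviation is an assumption: the source does not state which standard deviation estimator was used in Praat.

```python
import statistics


def fo_cv(fo_values_hz):
    """Return foCV in percent: (fo standard deviation / fo mean) * 100.

    Assumption: the sample standard deviation is used here; the original
    measurements were taken in Praat and may use a different estimator.
    """
    mean_fo = statistics.fmean(fo_values_hz)
    sd_fo = statistics.stdev(fo_values_hz)
    return sd_fo / mean_fo * 100


# Hypothetical fo tracks (Hz): the same relative modulation at two fo heights.
low_voice = [90.0, 100.0, 110.0]       # about 10 Hz around 100 Hz
high_voice = [900.0, 1000.0, 1100.0]   # about 100 Hz around 1,000 Hz

print(fo_cv(low_voice), fo_cv(high_voice))
```

Both hypothetical tracks give the same foCV, which is the sense in which the measure is independent of fo height.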

Participants

Sixty-nine privately owned dogs were recruited through Facebook adverts, flyers, and personal contacts, and tested in a designated testing room on campus at University of Sussex. A total of 70 owners (19 male, 51 female) participated, with a maximum of 3 dogs per owner. Trials were discarded if the dog was distracted by non-stimulus sounds or events, e.g., background noise (n = 1), if the dog was barking continuously (n = 1), or if the dog moved out of camera shot (n = 5). We retained data from 64 dogs (28 females and 36 males from 50 breeds and cross-breeds, aged between 9 months and 12 years old; mean = 4.0 years, SD = 2.9) in our analyses (see ESM Table 1 for details following Volsche et al.’s suggested format).

Protocol

Dogs were introduced to the room and given up to 20 minutes to freely explore and habituate to the space. Once they were considered to be relaxed, the trials began. No dogs appeared to be stressed either before or during the trials.

During all trials, the owners wore noise-cancelling headphones (TaoTronics) and listened to music while seated in a chair at 90 degrees to the dog. A single Behringer Europort MPA40BT-PRO speaker was mounted on a tripod behind the owner’s head and set to conversational volume (approx. 65 dB measured at the dog’s position). The experimenter stood out of the dog’s sight line and played the stimuli from an Apple MacBook Pro. The dogs were held on a loose lead by the handler and allowed some freedom of movement. While the handler was consistently one of two researchers, their familiarity to the dog could vary from “completely unfamiliar” to “person the dog met on more than one occasion but do not have a close relationship to” if the dog had participated in a previous study or belonged to a friend of the researchers.

The dogs were positioned either to the left or the right of the speaker, and this position was cross-balanced across dogs within studies, with half to the left and half to the right. The dogs’ reactions were filmed on a Sony FDR-AX100 camcorder (Sony) on a tripod positioned approximately 1.5-2m from the dogs’ starting position. Trial interval depended on the dogs’ disposition. If the dog was calm, trial interval was less than 2 minutes, but if the dog was restless or distracted, a short break of a few minutes was provided, and the dog was sometimes taken out of the room and returned.
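The left/right balancing described above can be sketched as follows. This is only one possible way to split dogs evenly between sides; the source does not describe how the assignment was actually made, and the function name `assign_sides`, the seed, and the dog identifiers are hypothetical.

```python
import random


def assign_sides(dog_ids, seed=0):
    """Assign half of the dogs to the left of the speaker and half to the right.

    Assumption: dogs are shuffled at random and then split in two; with an
    odd number of dogs, the extra dog is placed on the right.
    """
    rng = random.Random(seed)
    shuffled = list(dog_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        dog: ("left" if index < half else "right")
        for index, dog in enumerate(shuffled)
    }


# Hypothetical identifiers for a study with 20 dogs.
sides = assign_sides([f"dog{i:02d}" for i in range(20)])
left_count = sum(1 for side in sides.values() if side == "left")
right_count = sum(1 for side in sides.values() if side == "right")
print(left_count, right_count)
```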

Study A: DDS-pilot: The effect of meaning on dogs’ responses to content presented in dog-directed speech (DDS) prosody

Study A: DDS-pilot was designed to test whether dogs responded differently to inserts containing meaningful phrases vs. meaningless control phrases, in both cases spoken with dog-directed speech (DDS) prosody. If the dogs did not respond to the DDS presentation of speech, we considered it unlikely that they would respond to NRP speech, and a new protocol would be required. Twenty-two dogs were tested, and 40 trials from 20 dogs were retained, with 2 dogs removed because they moved out of camera view during the stimulus. All owners included in this study were female. Each dog was presented with a recording of their female owner reading the text twice, once inserting the meaningful phrase and once inserting the control phrase. The order of presentation of meaningful and control phrase recordings was cross-balanced across dogs.

Study B neutral prosody: The effects of prosody and meaning on dogs’ responses to content presented in neutral reading prosody (NRP)

In study B neutral prosody, we explored the dogs’ response to neutral reading prosody speech (NRP) by testing their ability to detect meaningful content presented in this register. Thirty-four dogs were tested. Of these, 21 dogs heard speech from just one owner (17 female owners and 4 male owners) and 13 heard speech from both their male and female owners as part of study C prosody.

Each dog was initially presented with two playback trials with NRP-control and NRP-meaningful phrases embedded, with order of presentation cross-balanced across subjects. To test their responsiveness to speech, the dogs then heard a third trial presenting the DDS-meaningful phrase. The NRP trials were always played first to avoid cueing the dogs to the presentation of meaningful speech. Dogs who heard both their owners were given a brief break between the two sets of playbacks to reduce habituation.

Study C prosody: Impacts of prosody and content on response

To better explore the effects of prosody and content, study B neutral prosody was repeated with dogs hearing all four speech conditions in pseudo-randomised presentation across four trials. Twenty-five dogs were included in the analysis (13 females, 12 males). The same protocol was used as in study B neutral prosody, with 4, not 3, presentations of speech to each dog. A total of 148 trials from 37 dogs were retained (12 dogs heard 8 trials, with 4 trials from their male owner and 4 trials from their female owner).

Study D gender: The effects of gender on dogs’ responses to content and prosody

During initial data collection for study B neutral prosody, it was noted that some of the dogs appeared to be more responsive to the male owner’s NRP speech than their female owner’s NRP speech. Therefore, we decided to explore the potential effects of speaker gender on their responses. Thus, we tested whether dogs hearing both their male and female owners would respond differently to them across all four conditions of meaning and prosody, with an expectation that NRP from male owners could elicit more or stronger responses than female NRP due to the smaller differences between male NRP and DDS.

Each of the 13 dogs heard a total of 8 trials, 4 from each owner. To avoid the possible effect of learning on response to the target phrases, as the same text passage was used throughout, the NRP trials were always played first for each owner, with control and meaningful phrase presentation cross-balanced within DDS conditions. Both owners were present in the room, but the non-participant (e.g., the male while the female was “talking” to the dog) was kept out of view to prevent any “Clever Hans” effect influencing the results.

One dog was removed from the dataset because he moved out of camera shot while reacting to his owners’ voices. One dog (Emma, terrier) had been previously tested in study A: DDS-pilot with a gap of several months between tests, but all other dogs experienced this as a novel presentation and it was expected that Emma would not retain her memories of study A: DDS-pilot or be primed by them. Thus, 96 trials were retained from 12 dogs in total, with each dog hearing a total of 8 trials, including all four speech presentations from both their male and female owners.

All eight trials were performed on the same day, and between-trial intervals varied from a few minutes to more than 20 minutes depending on the behaviour of the dog, e.g., engagement in other activities like sniffing or investigating the area. We counterbalanced the presentation of male and female owners’ speech, but each dog heard all four trials from each owner as a block which was not divided (e.g., male owner trials x 4 then female owner trials x 4, but not male owner x 2 then female x 2 etc.). The dogs heard the same order of presentation for both male and female owners (e.g., either order 1 or order 2) to avoid order effects on their responsiveness.
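The block structure of study D gender can be sketched in Python as below. The function name `study_d_schedule` is hypothetical, and the meaning of "order 1" and "order 2" is an assumption made for illustration (control first vs. meaningful first); the source does not define them. The sketch also applies the same phrase order within both the NRP and DDS trials, which is a simplification.

```python
def study_d_schedule(first_owner, phrase_order):
    """Build one dog's 8-trial sequence as (owner, condition) pairs.

    From the protocol: each owner's 4 trials form an undivided block,
    NRP trials come before DDS trials, and the same order of presentation
    is used for both owners.
    Assumption: order 1 = control first, order 2 = meaningful first.
    """
    owners = ["male", "female"] if first_owner == "male" else ["female", "male"]
    phrases = (
        ["control", "meaningful"] if phrase_order == 1
        else ["meaningful", "control"]
    )
    schedule = []
    for owner in owners:                  # one undivided block per owner
        for prosody in ("NRP", "DDS"):    # NRP always played first
            for phrase in phrases:        # same phrase order for both owners
                schedule.append((owner, f"{prosody}-{phrase}"))
    return schedule


example = study_d_schedule(first_owner="female", phrase_order=2)
for trial_number, (owner, condition) in enumerate(example, start=1):
    print(trial_number, owner, condition)

# 12 retained dogs x 8 trials each gives the 96 retained trials.
total_trials = 12 * len(example)
```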

Usage notes

There are no missing values.

Funding

Biotechnology and Biological Sciences Research Council, Award: BB/P00170X/1