
Data from: Audio-visual crossmodal correspondences in domestic dogs (Canis familiaris)

Cite this dataset

Korzeniowska, Anna; Root-Gutteridge, Holly; Simner, Julia; Reby, David (2019). Data from: Audio-visual crossmodal correspondences in domestic dogs (Canis familiaris) [Dataset]. Dryad.


Crossmodal correspondences are intuitively held relationships between non-redundant features of a stimulus, such as auditory pitch and visual illumination. While a number of correspondences have been identified in humans to date (e.g. high pitch is intuitively felt to be luminant, angular, and elevated in space), their evolutionary and developmental origins remain unclear. Here we investigated the existence of audio-visual crossmodal correspondences in domestic dogs, and specifically, the known human correspondence in which high auditory pitch is associated with elevated spatial position. In an audio-visual attention task, we found that dogs engaged more with audio-visual stimuli that were congruent with human intuitions (high auditory pitch paired with a spatially elevated visual stimulus) compared to incongruent (low pitch paired with elevated visual stimulus). This result suggests that crossmodal correspondences are not a uniquely human or primate phenomenon, and that they cannot easily be dismissed as merely lexical conventions (i.e., matching ‘high’ pitch with ‘high’ elevation).


Data was collected by video recording the dogs' reactions during the presentation of audio-visual stimuli. Each row in the data set represents one trial (a single 8-second audio-visual animation presented to a dog). The presentations were projected onto a wall using an overhead projector (Eiki Brilliant Projector LC-XB28) and a MacBook Pro laptop. The sound was played through two Behringer Europort MPA40BT speakers placed adjacent to the wall onto which the animations were projected, one on each side. An audio-visual animation of moving insects was projected between trials to attract the dogs’ attention to the screen. Dogs’ behaviour was recorded using a SONY Handycam (XAVC 5 AVCHD Progressive) camera placed on a tripod in front and to the left of the dog. A second camera (SONY Handycam AVCHD Progressive), placed in front and to the right of the dog, sent a live feed to a monitor placed behind the dog and owner. Data was coded using Gamebreaker 10 by two independent raters, blind to the condition.

A within-subject design was used, with each dog seeing both the congruent and the incongruent version of the audio-visual stimulus once. We compared the congruent and incongruent conditions on three dependent measures: duration-of-looking (the time in seconds each dog spent with its gaze focused on the stimulus), time-spent-tracing (the time in seconds each dog spent moving its head up and down to follow the stimulus), and the percentage of looking time that the dog spent tracing, i.e., (time-spent-tracing / duration-of-looking) × 100. A linear mixed model was run using SPSS v.25 (SPSS Inc., Chicago, IL, USA), and differences in means were considered significant at an alpha level of 0.05.
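The third dependent measure can be recomputed from the other two. The following is a minimal Python sketch of that calculation, not the authors' own processing script. The field names (`dog`, `condition`, `looking_s`, `tracing_s`) and the example values are hypothetical and do not come from the dataset. Returning `None` when a dog never looked at the stimulus is an assumption made here to avoid division by zero.

```python
from typing import Optional


def percent_tracing(tracing_s: float, looking_s: float) -> Optional[float]:
    """Return (time-spent-tracing / duration-of-looking) x 100.

    Returns None when the looking duration is zero, because the
    percentage is undefined in that case (an assumption of this sketch).
    """
    if looking_s <= 0:
        return None
    return 100.0 * tracing_s / looking_s


# Hypothetical trial rows: one congruent and one incongruent trial for one dog.
# The values are illustrative only and are not taken from the dataset.
trials = [
    {"dog": "A", "condition": "congruent", "looking_s": 6.0, "tracing_s": 3.0},
    {"dog": "A", "condition": "incongruent", "looking_s": 4.0, "tracing_s": 1.0},
]

# Attach the derived measure to each trial row.
for row in trials:
    row["percent_tracing"] = percent_tracing(row["tracing_s"], row["looking_s"])
```

Each row keeps its condition label, so the derived percentage can be compared within a dog across the congruent and incongruent trials.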



Biotechnology and Biological Sciences Research Council, Award: BB/P00170X/1

European Research Council, Award: (FP/2007-2013)/ERC Grant Agreement no. [617678]


United Kingdom