
Spontaneous mimicry of live facial expressions: A biological mechanism for emotional contagion


Hirsch, Joy (2022), Spontaneous mimicry of live facial expressions: A biological mechanism for emotional contagion, Dryad, Dataset,


Observation of live facial expressions typically elicits similar expressions (facial mimicry) accompanied by shared emotional experiences (emotional contagion). The model of embodied emotion proposes that emotional contagion and facial mimicry are functionally linked, although the neural underpinnings are unknown. To address this knowledge gap, we employed two-person (n = 20 dyads) functional near-infrared spectroscopy during live emotive face-processing while also measuring eye-tracking, facial classifications, and ratings of emotion. One partner, “Movie Watcher”, was instructed to emote natural facial expressions while viewing evocative short movie clips. The other partner, “Face Watcher”, viewed the Movie Watcher’s face. Dyadic roles were alternated between partners. Task and rest blocks were implemented by timed epochs of clear and opaque glass that separated partners. Correlations of dyadic facial expressions (r = 0.41) and dyadic affect ratings (r = 0.66) were consistent with findings of both emotional contagion and facial mimicry. Neural correlates of emotional contagion based on covariates of partner ratings included right angular and supramarginal gyri. Neural correlates of mimicry associated with partner facial action units included the core face recognition system. Thus, the proposed linkages between facial mimicry and emotional contagion represent separate components of face processing.


Participants. Adults 18 years of age and older who were healthy and had no known neurological disorders (by self-report) were eligible to participate. The study sample included 40 participants (26 women, 12 men, and 2 identified as another gender; mean age: 26.3 ± 10.5 years; 36 right-handed and 4 left-handed. See Table S1). All participants provided written informed consent in accordance with guidelines approved by the Yale University Human Investigation Committee (HIC #1501015178) and were reimbursed for participation. Dyad members were unacquainted prior to the experiments and assigned in order of recruitment. Laboratory practices are mindful of goals to assure diversity, equity, and inclusion, and accruals were monitored by regular evaluations based on expected distributions in the surrounding area. Each participant provided demographic and handedness information before the experiment.

Setup. Dyads were seated 140 cm across a table from each other and were fitted with an extended head-coverage fNIRS cap. Separating the two participants was a custom-made controllable “smart glass” window that could change between transparent and opaque states by application of a programmatically manipulated electrical current. Attached to the top and middle of the smart glass were two small 7-inch LCD monitors with a resolution of 1024x600. The monitors were placed in front of and above the heads of each participant, so the screens were clearly visible but did not obstruct their partner’s face. Monitors displayed video clips (Movie Watcher only; see Paradigm below) and cued both participants to rate emotional experiences using a dial.

Paradigm. During the interaction, participants took turns in two aspects of a dyadic interactive task. One partner within a dyad, the Movie Watcher, watched short (3-5 s) video clips on the LCD screen (presented using a custom Python script) while the other partner, the Face Watcher, observed the face of the Movie Watcher. Partners alternated roles as Movie Watcher and Face Watcher. Movies were presented in three-minute runs that alternated between 15 s of movies and 15 s of rest. There were between 3 and 5 movies in each 15 s task block with a maximum number of 30 movie clips in a run. Each Movie Watcher saw a total of 60 movie clips (two runs) for each movie type, and there are three movie types, so each Movie Watcher saw approximately 180 movie clips. After each 15 s set of movie stimuli, Movie Watchers rated their emotional responses with a dial on a Likert-type scale evaluating both valence and intensity (Positive emotions: +2 to +5, Neutral/No feeling: 0 to 1, Negative emotions: -2 to -5) to the movie block. Face Watchers rated their emotional feelings based on their perceptions of the Movie Watcher’s facial expressions during the same period. Facial expressions of both partners were acquired by cameras (recorded as part of the Python script) and analyzed with OpenFace (details below). Each participant performed the Movie Watching and Face Watching tasks three times, including two runs for every movie type (“adorables”, “creepies”, and “neutrals”) for a total of six 3-minute runs and a duration of 18 minutes. Movie clips were not repeated. 
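The block timing described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that each run begins with a task epoch (the text does not state which epoch comes first); the function and variable names are illustrative and not taken from the study's Python script.

```python
# Sketch of one run's block schedule (assumption: the run starts with a task epoch).
RUN_S = 180   # three-minute run
TASK_S = 15   # movie / face-viewing epoch
REST_S = 15   # rest epoch, during which ratings are given


def block_schedule(run_s=RUN_S, task_s=TASK_S, rest_s=REST_S):
    """Return (onset_in_seconds, kind) pairs for alternating task and rest epochs."""
    schedule = []
    t = 0
    while t + task_s + rest_s <= run_s:
        schedule.append((t, "task"))
        schedule.append((t + task_s, "rest"))
        t += task_s + rest_s
    return schedule


schedule = block_schedule()
task_onsets = [onset for onset, kind in schedule if kind == "task"]
n_task_blocks = len(task_onsets)
max_clips_per_run = n_task_blocks * 5   # at most 5 clips fit in one 15 s block
total_minutes = 6 * RUN_S / 60          # six runs per role
```

Under these assumptions a run contains six task blocks, which matches the stated maximum of 30 clips per run (six blocks of up to five clips) and the 18-minute total for six runs.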

Movie library of emotive stimuli to induce natural facial expressions. Emotionally evocative videos (movies) intended to elicit natural facial expressions were collected from publicly accessible sources and trimmed into 3-5 s clips. All video stimuli were tested and rated for emotive properties by lab members along with 283 crowdsourced Amazon Mechanical Turk participants who rated 134 video clips. The clips contained no political, violent, or frightening content, and participants were given general examples of what they might see prior to viewing (content notifications). The three categories of videos included “neutrals”, featuring landscapes; “adorables”, featuring cute animal antics; and “creepies”, featuring spiders, worms, and states of decay. Videos were rated by the intensity of emotions experienced (from 0-100 on a continuous measure Likert-type scale; 0: the specific emotion was not experienced and 100: emotion was present and highly intense) according to basic emotion type (joy, sadness, anger, disgust, surprise, and fear). For example, a video clip of pandas rolling down a hill (from the adorables category) might be rated an 80 for joy, 40 for surprise, and 0 for sadness, fear, anger, and disgust. Responses were collected and averaged for each video. The final calibrated set used in the experiment consisted of clips that best evoked intense emotional reactions (except for the “neutrals” category, from which the lowest-rated videos were chosen). 

Instructions to participants. Participants were informed that the experiment aimed to understand live face processing mechanisms and were instructed according to their role (i.e., Face Watcher or Movie Watcher). The Face Watcher was instructed to look naturally at the face of the Movie Watcher when the smart glass was clear. The Movie Watcher was instructed to look only at the movies and emote naturally. Natural expressions (such as smiles, eye blinks, and other natural non-verbal expressions) were expected due to the emotive qualities of the movies. 

Ratings of emotional experiences. During each 15 s rest epoch, participants used rotating dials to report their subjective emotional experiences in response to stimuli viewed throughout the preceding 15 s task period. Instructions to the participants were to indicate the valence and intensity (Positive emotions: +2 to +5, Neutral or no emotion: 0 to 1, Negative feelings: -2 to -5) of the feelings provoked by their stimulus (either face or movie). The stimuli were different for the two paired participants, but the ratings were based on the same variable: “How does the stimulus make you feel?” Thus, both participants answered the same question on the same scale, so their ratings can be directly compared. The comparison of these affective ratings between dyad partners documents the extent to which the emotion was communicated via a facial expression on an epoch-by-epoch basis.
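As an illustration of the epoch-by-epoch comparison, the sketch below correlates the two partners' dial ratings. It assumes a Pearson correlation, which the text does not specify, and the ratings shown are made-up values, not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings (-5 to +5) for six task epochs; not study data.
movie_watcher_ratings = [4, -3, 0, 5, -4, 1]
face_watcher_ratings = [3, -2, 1, 4, -2, 0]

r = pearson_r(movie_watcher_ratings, face_watcher_ratings)
```

A value of r near 1 would indicate that the Face Watcher's reported feelings tracked those of the Movie Watcher across epochs.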

Functional NIRS Signal Acquisition and Channel Localization. Functional NIRS signal acquisition, optode localization, and signal processing, including global mean removal, were similar to methods described previously (Noah, Ono et al. 2015, Zhang, Noah et al. 2016, Noah, Dravida et al. 2017, Piva, Zhang et al. 2017, Zhang, Noah et al. 2017, Dravida, Noah et al. 2018, Hirsch, Noah et al. 2018) and are briefly summarized below. Hemodynamic signals were acquired using 3 wavelengths of light, and an 80-fiber multichannel, continuous-wave fNIRS system (LABNIRS, Shimadzu Corp., Kyoto, Japan). Each participant was fitted with an optode cap with predefined channel distances. Three sizes of caps were used based on the circumference of the participants’ heads (60 cm, 56.5 cm, or 54.5 cm). Optode distances of 3 cm were designed for the 60 cm cap but were scaled equally to smaller caps. A lighted fiber-optic probe (Daiso, Hiroshima, Japan) was used to clear hair from each optode holder before optode placement.

Optodes consisting of 40 emitters and 40 detectors were arranged in a custom matrix providing a total of 58 acquisition channels per participant. For consistency, the placement of the most anterior midline optode holder on the cap was centered 1 cm above the nasion. To ensure acceptable signal-to-noise ratios, intensity was measured for each channel before recording, and adjustments were made for each channel until all optodes were calibrated and able to sense known quantities of light from each laser wavelength (Tachibana, Noah et al. 2011, Ono, Nomoto et al. 2014, Noah, Ono et al. 2015). Anatomical locations of optodes in relation to standard head landmarks were determined for each participant using a 3D scanner (Occipital Inc., Boulder, CO) and portions of code from the fieldtrip toolbox implemented in Matlab 2022a (Okamoto and Dan 2005, Singh, Okamoto et al. 2005, Eggebrecht, White et al. 2012, Ferradal, Eggebrecht et al. 2014, Homölle and Oostenveld 2019). Optode locations were used to calculate positions of recording channels, and Montreal Neurological Institute (MNI) coordinates (Mazziotta, Toga et al. 2001) for each channel were obtained with NIRS-SPM software (Ye, Tak et al. 2009) and WFU PickAtlas (Maldjian, Laurienti et al. 2003, Maldjian, Laurienti et al. 2004).

Eye-tracking. Two Tobii Pro x3-120 eye trackers (Tobii Pro, Stockholm, Sweden), one per participant, were used to acquire simultaneous eye-tracking data at a sampling rate of 120 Hz. Eye trackers were mounted on the table facing each participant. Prior to the start of the experiment, a three-point calibration method was used to calibrate the eye tracker for each participant. The partner was instructed to stay still and look straight ahead while the participant was told to look first at the partner’s right eye, then left eye, then the tip of the chin. Eye-tracking data were not acquired for all participants for technical reasons, mostly signal loss in participants for whom the eye tracker was not sufficiently sensitive. Eye-tracking served to confirm that there was no eye contact between the Face Watcher and the Movie Watcher. This is important because mimicry has been shown to be modulated by direct gaze (Wang, Ramsey et al. 2011, Wang and Hamilton 2014, de Klerk, Hamilton et al. 2018).

Facial Classification. Facial action units (AUs) were acquired simultaneously from both partners using OpenFace (Baltrušaitis, Robinson et al. 2016). OpenFace provides AUs in both binary format and continuous format. For calculating the correlation of AUs between the two partners, the continuous format was utilized and correlations between partners were taken as representing mimicry of expressions from the Movie Watcher to the Face Watcher.
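The per-AU correlation between partners can be sketched as follows. This is an illustration only: it assumes OpenFace-style CSV output in which continuous intensities use an `_r` suffix and binary presence uses a `_c` suffix, that the two recordings are already frame-aligned, and that a Pearson correlation is used. The CSV contents are made-up values, not study data.

```python
import csv
import io
import math


def intensity_columns(csv_text):
    """Parse OpenFace-style CSV text into {AU column name: [intensity per frame]}."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    columns = {}
    for row in reader:
        for name, value in row.items():
            name = name.strip()
            # Keep continuous intensity columns only; skip binary "_c" columns.
            if name.startswith("AU") and name.endswith("_r"):
                columns.setdefault(name, []).append(float(value))
    return columns


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, frame-aligned recordings for the two partners.
movie_csv = "frame, AU06_r, AU12_r, AU12_c\n1,0.0,0.2,0\n2,0.5,1.1,1\n3,1.2,2.4,1\n4,0.8,1.9,1\n"
face_csv = "frame, AU06_r, AU12_r, AU12_c\n1,0.1,0.0,0\n2,0.3,0.6,1\n3,0.9,1.5,1\n4,0.7,1.6,1\n"

movie_aus = intensity_columns(movie_csv)
face_aus = intensity_columns(face_csv)
au_correlations = {
    au: pearson_r(movie_aus[au], face_aus[au])
    for au in movie_aus
    if au in face_aus
}
```

Each entry of `au_correlations` then gives one AU's similarity between the Movie Watcher and the Face Watcher over the recorded frames.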

fNIRS Signal Processing. Raw optical density variations were acquired at three wavelengths of light (780 nm, 805 nm, and 830 nm), which were translated into relative chromophore concentrations using a Beer-Lambert equation (Hazeki and Tamura 1988, Matcher, Elwell et al. 1995, Hoshi 2003). Signals were recorded at 30 Hz. Baseline drift was removed using wavelet detrending provided in NIRS-SPM (Ye, Tak et al. 2009). In accordance with recommendations for best practices using fNIRS data (Yücel, Lühmann et al. 2021), global components attributable to blood pressure and other systemic effects (Tachtsidis and Scholkmann 2016) were removed using a principal component analysis (PCA) spatial global mean filter (Zhang, Noah et al. 2016, Zhang, Noah et al. 2017, Noah, Zhang et al. 2021) before general linear model (GLM) analysis. This study involves emotional expressions that originate from specific muscle movements of the face which may cause artifactual noise in the OxyHb signal. To minimize this potential confound, we utilized the HbDiff signal which combines the OxyHb and deOxyHb signals (see above sections on background and reliability) for all statistical analyses. However, following best practices (Yücel, Lühmann et al. 2021), baseline activity measures of both OxyHb and deOxyHb signals are processed as a confirmatory measure. The HbDiff signal averages are taken as the input to the second level (group) analysis (Tachtsidis, Tisdall et al. 2009). Comparisons between conditions were based on GLM procedures using NIRS-SPM (Ye, Tak et al. 2009). Event epochs within the time series were convolved with the hemodynamic response function provided from SPM8 (Penny, Friston et al. 2011) and fit to the signals, providing individual “beta values” for each participant across conditions. Group results based on these beta values are rendered on a standard MNI brain template (TD-ICBM152 T1 MRI template (Mazziotta, Toga et al. 2001)) in SPM8 using NIRS-SPM software with WFU PickAtlas (Maldjian, Laurienti et al. 2003, Maldjian, Laurienti et al. 2004).
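A minimal sketch of the HbDiff computation follows. It assumes the common definition of HbDiff as the sample-wise difference OxyHb minus deOxyHb; the text above says only that the two signals are combined, so this definition is an assumption, and the values are made up.

```python
def hb_diff(oxy_hb, deoxy_hb):
    """Sample-wise OxyHb minus deOxyHb (assumed definition of HbDiff)."""
    if len(oxy_hb) != len(deoxy_hb):
        raise ValueError("OxyHb and deOxyHb must have the same number of samples")
    return [oxy - deoxy for oxy, deoxy in zip(oxy_hb, deoxy_hb)]


# Hypothetical relative concentration changes for one channel (arbitrary units).
oxy = [0.00, 0.10, 0.25, 0.30]
deoxy = [0.00, -0.02, -0.08, -0.10]

diff = hb_diff(oxy, deoxy)
```

Because task-evoked responses usually show OxyHb rising while deOxyHb falls, the difference tends to accentuate that anticorrelated pattern in a single time series per channel.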

General Linear Model (GLM) analysis. The primary GLM analysis consists of fitting four model regressors (referred to as covariates) to the recorded data. Each 30 s block consists of approximately 15 s of task, either movie viewing or face viewing (depending upon the condition), and 15 s of rest. During the 15 s task epochs, visual stimuli were presented to both participants: the Movie Watcher viewed movie clips on a small LCD monitor, and the smart glass was transparent so the Face Watcher could observe the face of the Movie Watcher. For each type of movie, the onsets and durations were used to construct the square wave block design model. The three movie types served as the first three covariates. The fourth model covariate (referred to as Intensity) was a modulated block design created to specifically interrogate the neural responses of the Face Watcher’s brain by both the emotional ratings and facial Action Units of the Movie Watcher. This fourth covariate was specific to the Face Watcher because the details of the Movie Watcher’s face were used as the covariate for the Face Watcher. The data from the Movie Watcher were also analyzed with the first three covariates to determine functional brain activity and emotional effects associated with viewing movie content.
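The construction of a square-wave block regressor and an intensity-modulated regressor can be sketched as below. This is not the NIRS-SPM implementation. It assumes a double-gamma approximation of the canonical hemodynamic response function (peak term with shape 6, undershoot term with shape 16 scaled by 1/6), a 1 Hz time grid for brevity rather than the 30 Hz recording rate, task-first block onsets, and made-up partner ratings. Mean-centering of the modulator and the least-squares fit itself are omitted.

```python
import math


def double_gamma_hrf(t):
    """Double-gamma HRF approximation at time t in seconds (assumed parameters)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def boxcar(n_samples, onsets, duration, heights=None):
    """Square-wave design: heights[i] during block i, zero elsewhere."""
    if heights is None:
        heights = [1.0] * len(onsets)
    design = [0.0] * n_samples
    for onset, height in zip(onsets, heights):
        for i in range(onset, min(onset + duration, n_samples)):
            design[i] = float(height)
    return design


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out


N_SAMPLES = 180                            # one 3-minute run sampled at 1 Hz
hrf = [double_gamma_hrf(t) for t in range(32)]
onsets = [0, 30, 60, 90, 120, 150]         # assumed task-first schedule
partner_ratings = [4, 3, 5, 2, 4, 3]       # hypothetical Movie Watcher ratings

block_regressor = convolve(boxcar(N_SAMPLES, onsets, 15), hrf)
intensity_regressor = convolve(boxcar(N_SAMPLES, onsets, 15, partner_ratings), hrf)
```

In a full analysis, regressors of this kind would form the columns of the design matrix, and the fitted coefficient for each column would be the corresponding beta value.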

Usage Notes

All files provided as .csv.


National Institute of Mental Health