Separable processes for live “in-person” and “zoom-like” faces
Data files
Jul 09, 2024 version, 2.65 GB total:
- AllData.zip.001 through AllData.zip.025: 104.86 MB each
- AllData.zip.026: 32.40 MB
- README.md: 3.04 KB
Abstract
Increased reliance on Zoom-like (webcam) platforms for interpersonal communications has raised the question of how this new virtual format compares to real face-to-face interactions. This question is also relevant to current models of face processing. Neural coding of simulated faces engages feature-selective processes in the ventral visual stream, and two-person live face-to-face interactions engage additional face processes in the lateral and dorsal visual streams. However, it is not known whether and how live in-person face processes differ from live virtual face processes, given that the faces and tasks are essentially the same. Current views of functional specificity predict no neural difference between the virtual and live conditions. Here we compare the same live faces viewed both over a video format and in person with measures of functional near-infrared spectroscopy (fNIRS), eye tracking, pupillometry, and electroencephalography (EEG). Neural activity was increased in dorsal regions for in-person face gaze and was increased in ventral regions for virtual face gaze. Longer dwell times on the face, increased arousal indexed by pupil diameter, increased neural oscillation power in the theta band, and increased cross-brain coherence were also observed for the in-person face condition. These findings highlight the fundamental importance of real faces and natural interactions for models of face processing.
Included in the compressed archive ("AllData.zip.00X") are the data collected from typically developing participants engaged in three face-viewing tasks: a live, face-to-face interaction with a partner; a webcam-based interaction with the same partner, similar to Zoom, Skype, or Teams web conferencing; and a third interaction in which participants viewed their own face on the monitor in front of them. The data include functional near-infrared spectroscopy (fNIRS) signals recorded with a Shimadzu LABNIRS system and EEG data (g.tec g.USBamp) recorded from both partners during the interactive tasks. The archive also includes the 3D localizer information for each participant. The archive is split into parts of approximately 100 MB for uploading and downloading purposes.
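As an illustration, the parts can be reassembled before extraction. The sketch below assumes the parts are a simple byte-wise split of a single zip archive (which the sequential .zip.001 to .zip.026 naming suggests); if a different split tool was used, rejoin the parts with that tool instead.

```python
import glob
import shutil
import zipfile

# Concatenate the numbered parts in order into a single archive.
parts = sorted(glob.glob("AllData.zip.*"))
with open("AllData.zip", "wb") as out:
    for part in parts:
        with open(part, "rb") as f:
            shutil.copyfileobj(f, out)

# Extract the reassembled archive.
with zipfile.ZipFile("AllData.zip") as zf:
    zf.extractall("AllData")
```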
Description of the data and file structure
All data are made available in CSV (plain-text) format and can be opened with any text-reading application.
The archive contains two top-level folders: EEGdata and NIRData.
The attached datatable.csv file lists the NIRS and EEG file names for each pair of participants. Column 1 gives the pair number (1-14). Column 2 gives the type of interaction: "zoomface" is the interaction with one's own face, "skype" is the web-conference interaction with the partner, and "face-to-face" is the live face-to-face interaction. The next two columns give the names of the NIRS data files from run 1 and run 2 of each interaction (there were two runs of each interaction per pair of participants). The final two columns give the names of the files containing EEG data, the first for participant 1 and the second for participant 2; each EEG file includes both runs.
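A minimal sketch of how datatable.csv could be used to locate the files for one pair and condition. The column names below are assumptions for illustration, since the README describes the columns only by position; if the file contains a header row, drop the header=None and names arguments.

```python
import pandas as pd

# Assumed column names; only the column order is documented above.
cols = ["pair", "interaction", "nirs_run1", "nirs_run2", "eeg_p1", "eeg_p2"]
table = pd.read_csv("datatable.csv", header=None, names=cols)

# Files for pair 2, live face-to-face condition.
row = table[(table["pair"] == 2) & (table["interaction"] == "face-to-face")]
print(row[["nirs_run1", "nirs_run2", "eeg_p1", "eeg_p2"]])
```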
Within each of these folders (EEGdata and NIRData), there are subfolders P01-P14 containing the data (NIRS or EEG) for each pair of participants (PXX). In each PXX subfolder of the NIRData folder, there are six files with the raw NIRS data converted to plain-text CSV format and four files containing the optode location information. The location files are named with either A or B to indicate the participant (1 or 2) and with "origin" (fiducial information) or "others" (the position of each emitter and detector optode).
The EEGdata folder replicates the PXX subfolder structure, with six files in each subfolder.
EEG location information is as follows. There are 32 electrodes on each participant; channels 1 to 32 correspond to the following locations: Fp1, Fp2, AF3, AF4, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO3, PO4, O1, Oz, O2.
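For convenience, the channel-to-electrode mapping can be written directly in code (a small sketch; the labels are copied from the list above):

```python
# EEG channel index (1-32) to 10-10 style electrode label.
EEG_LABELS = [
    "Fp1", "Fp2", "AF3", "AF4", "F7", "F3", "Fz", "F4", "F8",
    "FC5", "FC1", "FC2", "FC6", "T7", "C3", "Cz", "C4", "T8",
    "CP5", "CP1", "CP2", "CP6", "P7", "P3", "Pz", "P4", "P8",
    "PO3", "PO4", "O1", "Oz", "O2",
]

def label_for_channel(n: int) -> str:
    """Return the electrode label for 1-based channel number n."""
    return EEG_LABELS[n - 1]
```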
Missing or noisy data are described for each pair (PXX) below. If a pair is not listed, its dataset is complete.
Missing data:
- For Pair 1, the second run of the "skype" interaction was not recorded.
- Pair 3 is missing all EEG data due to a battery failure in the preamp.
Sharing/Access Information
NA
Participants
Sample size was determined by a power analysis based on prior face-gaze experiments (Noah et al., 2020), in which peak brain activations between task and rest in the rTPJ were 0.00055 ± 0.0003 and the effect size (signal difference/standard deviation) was 0.534. Using the "pwr" package of R statistical software (Champely, 2020) at a significance level of p < 0.05, the sample must include 23 participants to ensure the conventional power of 0.80. Our sample size of 28 meets and exceeds that standard.
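A comparable calculation can be sketched in Python with statsmodels. This is only an approximation of the R pwr analysis described above, and it assumes a one-sided paired/one-sample t-test, which is what yields a required sample size near 23 for this effect size:

```python
from statsmodels.stats.power import TTestPower

# Effect size reported above: signal difference / standard deviation.
effect_size = 0.534

# Solve for the required sample size at alpha = 0.05 and power = 0.80,
# assuming a one-sided (directional) test.
n = TTestPower().solve_power(effect_size=effect_size, alpha=0.05,
                             power=0.80, alternative="larger")
print(round(n))  # approximately 23
```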
All participants provided written informed consent in accordance with guidelines approved by the Yale University Human Investigation Committee (HIC # 1501015178). Dyads were assigned in order of recruitment, and participants were either strangers before the experiment or casually acquainted. Participants were not stratified further by affiliation or dyad gender mix. Six pairs were mixed gender, six pairs were female-female, and two pairs were male-male.
Paradigm
Each dyad participated in two tasks in which they were seated 140 cm across a table from each other. In both tasks, dyads were instructed to gaze at the eyes of their partner (Figure 1). In the In-person condition, dyads had a direct face-to-face view of each other. A panel of smart glass (glass that switches between opaque and transparent when an appropriate voltage is applied) was positioned in the middle of the table, 70 cm away from each participant (Figure 1A). In the Virtual Face condition, each participant watched their partner's face projected in real time on a separate 24-inch 16:9 computer monitor placed in front of the glass (Figure 1B). The in-person and virtual conditions were performed in the same location by the same dyads (see illustrations in Figures 1A and 1B) to avoid questions regarding whether the virtual partner was real. Participants were instructed to minimize head movements, remain as still as possible during the task, and keep their facial expressions as neutral as possible. The time series (Figure 1C) and experimental details are similar to previous studies (Hirsch et al., 2017; Noah et al., 2020). At the start of a block, prompted by an auditory beep, dyads fixated on a crosshair located in the center of the monitor in the Virtual Face condition or in the center of the opaque smart glass in the In-person condition. The face of the Virtual partner was visual-angle corrected to the same size as the In-person face (Figure 1B). The auditory tone also cued viewing the crosshair during the rest/baseline condition according to the protocol time series (Figure 1C).
Six 15 s active task periods alternated with 15 s rest/baseline periods for a total of 3 minutes per run. Each task period consisted of three 6 s cycles in which face presentation alternated "on" for 3 s and "off" for 3 s for each of the three events (Figure 1C). The smart glass became transparent during the "on" periods and opaque during the "off" and rest periods. The time series was the same for all conditions. During the 15 s rest/baseline period, participants focused on the fixation crosshair, as in the 3 s "off" periods that separated the eye contact and gaze events, and were instructed to "clear their minds" during this break. The 3 s "on" period was selected because maintaining eye contact with a live partner becomes increasingly uncomfortable over longer periods (Hirsch et al., 2017; Noah et al., 2020). Each 3-minute run was repeated twice, and the whole paradigm lasted 18 minutes. Stimulus presentation, eye-tracking data acquisition, fNIRS signal acquisition, and EEG signal acquisition were synchronized using TTL and UDP triggers (details below) that were sent to all machines simultaneously.
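For illustration, the block structure described above can be expressed as a stimulus boxcar sampled on a regular time grid (a sketch; the 0.1 s resolution is arbitrary):

```python
import numpy as np

DT = 0.1                     # arbitrary time resolution (s)
run_len = 180                # one 3-minute run (s)
t = np.arange(0, run_len, DT)

face_on = np.zeros_like(t)
for block in range(6):                    # six task/rest cycles per run
    block_start = block * 30              # 15 s task followed by 15 s rest
    for cycle in range(3):                # three 3 s "on" / 3 s "off" events
        on_start = block_start + cycle * 6
        face_on[(t >= on_start) & (t < on_start + 3)] = 1.0
```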
Data Acquisition
Eye Tracking. Eye tracking data were acquired using two Tobii Pro X3-120 eye trackers (Tobii Pro, Stockholm, Sweden), one per participant, at a sampling rate of 120 Hz. In the In-person condition, eye trackers were mounted on the smart glass facing each participant. Calibration was performed using three points on the partner's face prior to the start of the experiment: the partner was instructed to stay still and look straight ahead while the participant was told to look first at the partner's right eye, then the left eye, then the tip of the chin. In the Virtual Face condition, eye trackers were mounted on the lower edge of the computer monitor facing each participant, and the same three-point calibration approach was applied using the partner's face displayed on the computer monitor via webcam.
Tobii Pro Lab software (Tobii Pro, Stockholm, Sweden) and OpenFace (Baltrušaitis et al., 2016) were used to create areas of interest for subsequent eye-tracking analyses performed in MATLAB 2019a (Mathworks, Natick, MA). UDP triggers from the stimulus presentation program were relayed to a custom virtual-keyboard tool written in Python, which marked the events in the Tobii Pro Lab software. When a face-watching trial started and ended, UDP triggers were sent via Ethernet from the paradigm computer to the eye-tracking computers, and the virtual keyboard "typed" a letter that marked the event in the eye-tracking data recorded in Tobii Pro Lab; these markers were subsequently used to delimit face-watching intervals.
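A minimal sketch of such a trigger relay, assuming the trigger arrives as a UDP datagram and is converted into a single keystroke. The port number, message format, and the use of the pynput keyboard library are assumptions; the actual custom tool is not distributed here.

```python
import socket
from pynput.keyboard import Controller  # assumed keystroke library

PORT = 5005                 # assumed trigger port
keyboard = Controller()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", PORT))

while True:
    msg, _ = sock.recvfrom(1024)
    # Type a marker letter into the foreground recording software
    # (e.g., "s" for trial start, "e" for trial end).
    marker = "s" if msg.strip() == b"START" else "e"
    keyboard.press(marker)
    keyboard.release(marker)
```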
Pupillometry. Pupil diameter measures were acquired using the Tobii Pro Lab software, and post-processing triggers were used to partition the time sequences into face-watching intervals. Left and right pupil diameters were averaged for each frame and interpolated to 120 Hz to match the gaze-position sampling rate.
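A sketch of the averaging and resampling step, assuming pupil samples are available as timestamped left/right diameters (the array names are illustrative):

```python
import numpy as np

def mean_pupil_at_120hz(t_s, left_mm, right_mm):
    """Average left/right pupil diameter and resample onto a 120 Hz grid."""
    mean_d = np.nanmean(np.vstack([left_mm, right_mm]), axis=0)
    t_uniform = np.arange(t_s[0], t_s[-1], 1.0 / 120.0)
    # Linear interpolation onto the uniform 120 Hz time base,
    # skipping samples where both eyes were missing.
    valid = ~np.isnan(mean_d)
    return t_uniform, np.interp(t_uniform, t_s[valid], mean_d[valid])
```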
Electroencephalography (EEG). A g.USBamp system (g.tec medical engineering GmbH, Austria) with two bio-amplifiers and 32 electrodes per participant was used to collect EEG data at a sampling rate of 256 Hz. Electrodes were arranged in a layout similar to the 10-10 system; however, exact positioning was limited by the location of the electrode holders, which were held rigid between the optode holders. Electrodes were placed as closely as possible to the following positions: Fp1, Fp2, AF3, AF4, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO3, PO4, O1, Oz, and O2. Conductive gel was applied to each electrode to reduce resistance by ensuring contact between the electrodes and the scalp. As gel was applied, data were visualized using a bandpass filter that passed frequencies between 1 and 60 Hz. The ground electrode was placed on the forehead between AF3 and AF4, and an ear clip was used for reference.
Functional Near-Infrared Spectroscopy (fNIRS). A Shimadzu LABNIRS system (Shimadzu Corp., Kyoto, Japan) was used to collect fNIRS data with a sampling period of 123 ms (approximately 8.1 Hz). Each emitter transmitted three wavelengths of light (780, 805, and 830 nm), and each detector measured the amount of light that was not absorbed. The amount of light absorbed by the blood was converted to concentrations of OxyHb and deOxyHb using the Beer-Lambert equation. Custom-made caps with interspersed optode and electrode holders were used to acquire concurrent fNIRS and EEG signals (Shimadzu Corp., Kyoto, Japan). The distance between optodes was 2.75 cm or 3 cm, respectively, for participants with head circumferences less than or greater than 56.5 cm. Caps were placed such that the most anterior midline optode holder was ≈2.0 cm above the nasion, and the most posterior and inferior midline optode holder was on or below the inion. Optodes consisting of 40 emitters and 40 detectors were placed on each participant to cover bilateral frontal, temporal, and parietal areas (Figure 1D), providing a total of 60 acquisition channels per participant. A lighted fiber-optic probe (Daiso, Hiroshima, Japan) was used to move hair away from the optode channel before optodes were placed. To ensure acceptable signal-to-noise ratios, resistance was measured for each channel before recording, and adjustments were made until all optodes were calibrated and able to sense known quantities of light from each laser wavelength (Noah et al., 2015; Ono et al., 2014; Tachibana et al., 2011).
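As a sketch of the conversion step, the Beer-Lambert relation expresses the change in optical density at each wavelength as a weighted sum of oxy- and deoxyhemoglobin concentration changes, which can be inverted by least squares. The extinction coefficients and path-length factor below are illustrative placeholders, not the values used by the LABNIRS software.

```python
import numpy as np

# Placeholder extinction coefficients [HbO, HbR] for 780, 805, and 830 nm
# (illustrative values only; real analyses use tabulated coefficients).
E = np.array([[0.7, 1.1],
              [0.9, 0.9],
              [1.1, 0.7]])
path_length = 1.0  # effective path-length factor (placeholder)

def delta_hb(delta_od):
    """Convert optical-density changes at the three wavelengths to
    [delta_HbO, delta_HbR] via a least-squares inversion."""
    coeffs, *_ = np.linalg.lstsq(E * path_length, delta_od, rcond=None)
    return coeffs
```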
After the experiment, a Polhemus Patriot digitizer (Polhemus, Colchester, Vermont) was used to record the position of EEG electrodes and fNIRS optodes, as well as five anatomical locations (nasion, inion, Cz, left tragus, and right tragus) for each participant (Eggebrecht et al., 2014; Eggebrecht et al., 2012; Ferradal et al., 2014; Okamoto & Dan, 2005; Singh et al., 2005). Montreal Neurological Institute (MNI) coordinates (Mazziotta et al., 2001) for each channel were obtained using NIRS-SPM software (Ye et al., 2009). Anatomical correlates were estimated with the TD-ICBM152 atlas using WFU PickAtlas (Maldjian et al., 2004; Maldjian et al., 2003).
Data Analysis
Signal processing of eye tracking data and calculation of duration of gaze on faces. Eye tracking data were exported from the Tobii Pro Lab software to the data processing pipeline, and custom MATLAB scripts were used to calculate the duration of gaze on faces, the variability of gaze, and pupil diameter. OpenFace (Baltrušaitis et al., 2016) was used to generate the convex hull of an 'average face' from 16 (8 pairs) of the individual OpenFace results on the Tobii videos, which was used to classify gaze samples as directed at the face or not.
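A sketch of the face/not-face partition, assuming the average-face convex hull is available as a polygon of (x, y) vertices in the same screen coordinates as the gaze samples:

```python
import numpy as np
from matplotlib.path import Path

def gaze_on_face(gaze_xy, hull_vertices):
    """Return a boolean mask that is True where a gaze sample falls inside
    the convex hull of the average face."""
    face_polygon = Path(hull_vertices)           # hull vertices as (N, 2) array
    return face_polygon.contains_points(np.asarray(gaze_xy))
```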
Statistical analysis of eye contact. The eye gaze task alternated between eye gaze (participants were expected to fixate on either the eyes of their partner's virtual face or the real eyes of their live partner) and rest (participants were expected to fixate on either the crosshair on the computer monitor in the Virtual Face condition or a red dot on the smart glass in the In-person condition). The eye gaze portions of the task were 3 s in length, with six per trial, for 18 s of expected eye contact over the trial duration (Figure 1C). Usable eye-tracking data were acquired for 18 participants (9 dyads). To avoid possible transition effects caused by shifting eye gaze between stimuli (partner's eyes) and fixation, the initial 1000 ms of each eye gaze trial were excluded from analysis. Samples marked by Tobii as "invalid" and samples outside the polygon defined by the OpenFace average face were also discarded. Measures derived for each trial included Dwell Time (DT), computed as the number of retained samples over the gaze interval divided by the sampling rate (seconds), which represents the duration of gaze contact on either the virtual face or the face of the live partner. To measure the variability of gaze on the partner's face, log horizontal (HSD) and vertical (VSD) standard deviations were computed from the mean-centered samples of each gaze interval, normalized by the number of retained samples. Pupil diameter over face-watching intervals was z-scored by participant (PDZ). Linear mixed-effects models (Bates et al., 2007) were fitted in R (R Core Team, 2018) on DT, HSD, VSD, and PDZ separately.
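A sketch of how the trial-level measures and a mixed-effects model might be computed in Python. The statsmodels call stands in for the R lme4 models actually used, and the data-frame column names and file name are assumptions:

```python
import pandas as pd
import statsmodels.formula.api as smf

SAMPLING_RATE = 120.0  # Hz

def dwell_time(valid_xy):
    """Dwell time (s): retained on-face samples divided by the sampling rate."""
    return len(valid_xy) / SAMPLING_RATE

# One row per trial; columns are assumed:
# ['dyad', 'participant', 'condition', 'DT', 'HSD', 'VSD', 'PDZ']
trials = pd.read_csv("eye_tracking_trials.csv")  # hypothetical file name

# Linear mixed-effects model of dwell time by condition,
# with a random intercept for dyad (stand-in for the lme4 models).
model = smf.mixedlm("DT ~ condition", trials, groups=trials["dyad"]).fit()
print(model.summary())
```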
Electroencephalography (EEG). EEG signals were preprocessed using EEGLAB v13.5.4b in MATLAB 2014a (Mathworks, Natick, Massachusetts). EEG was digitized at a sampling rate of 256 Hz. MATLAB was used to filter the data with a bandwidth of 1-50 Hz for each participant. Two types of channels exhibiting noise characteristic of poor contact with the scalp were rejected based on visual inspection: (1) signals with amplitude exceeding 100 μV, and (2) signals that were essentially flat apart from low-frequency drift. With these criteria, an average of 3 channels per person was removed, and signals from the surrounding channels were interpolated. A common average reference was computed using the 32 data channels, and the data were epoched to produce one epoch data file per condition with -100 to 3000 ms epochs, where the 0 ms point is locked to face presentation (In-person Face vs. Virtual Face). The 100 ms before task onset served as a baseline. These files were manually inspected, and epochs containing eye movements or blinks were discarded from further analysis. Wavelet decomposition was applied to the EEG signals within the first 250 ms to calculate power in the following frequency bands: theta (4-8 Hz), alpha (8-13 Hz), and beta (13-30 Hz). T-tests (Virtual Face vs. In-person Face) were conducted on each frequency band.
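A simplified sketch of this pipeline in SciPy, using a zero-phase bandpass filter and Welch band power as a stand-in for the wavelet decomposition described above; epoch array shapes are assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

FS = 256  # Hz
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def bandpass(data, low=1.0, high=50.0, fs=FS, order=4):
    """Zero-phase 1-50 Hz bandpass applied along the last (time) axis."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, data, axis=-1)

def band_power(epoch, band, fs=FS):
    """Average spectral power of one epoch (channels x time) within a band."""
    freqs, psd = welch(epoch, fs=fs, nperseg=min(epoch.shape[-1], fs), axis=-1)
    lo, hi = band
    return psd[..., (freqs >= lo) & (freqs < hi)].mean()

# Example usage with a hypothetical raw epoch of shape (32 channels, samples):
# epoch = bandpass(raw_epoch)
# powers = {name: band_power(epoch, b) for name, b in BANDS.items()}
```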
Functional Near-Infrared Spectroscopy (fNIRS). The analysis methods used here have been described previously (Dravida et al., 2018; Hirsch et al., 2018; Noah et al., 2017; Noah et al., 2015; Piva et al., 2017; Zhang et al., 2017; Zhang et al., 2016) and are briefly summarized below. First, wavelet detrending was applied to the combined hemoglobin signal (HbDiff, the sum of the oxyhemoglobin and the inverted deoxyhemoglobin signals) (Tachtsidis et al., 2009) to remove baseline drift using the algorithm provided by NIRS-SPM (Ye et al., 2009). The combined OxyHb and deOxyHb signals are reported here as the most comprehensive measurement. However, consistent with best practices for fNIRS data (Yücel et al., 2021), results from the separate signals are included in Supplementary Figures S1-S4 and Tables S3-S6. Results are generally comparable to those reported here, although reduced activity is apparent in the deOxyHb analyses due to expected factors such as noise and the relative difficulty of signal detection. Second, noisy channels were removed automatically if the root mean square of the signal was more than 10 times the average for that participant. A principal component analysis spatial filter was used to remove global components caused by systemic effects assumed to be non-neural in origin (Zhang et al., 2017; Zhang et al., 2020; Zhang et al., 2016). For each run, a general linear model (GLM), computed by convolving the eye gaze task paradigm (Figure 1C) with a canonical hemodynamic response function, was used to generate beta values for each channel. Group results based on these beta values were rendered on a standard MNI brain template (Figure 5). Second-level analyses were performed using t-tests in SPM8. Anatomical correlates were estimated with the TD-ICBM152 T1 brain atlas using WFU PickAtlas (Maldjian et al., 2004; Maldjian et al., 2003).
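A sketch of the first-level GLM step: convolve the task boxcar (see the timing sketch in the Paradigm section) with a canonical double-gamma HRF and fit one beta per channel by ordinary least squares. The HRF parameters below are common SPM-style defaults used only for illustration, not the exact values from the original pipeline.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(dt, duration=30.0):
    """Illustrative SPM-style double-gamma HRF sampled every dt seconds."""
    t = np.arange(0, duration, dt)
    peak = gamma.pdf(t, 6)          # positive response peaking near 6 s
    undershoot = gamma.pdf(t, 16)   # later undershoot
    hrf = peak - undershoot / 6.0
    return hrf / hrf.max()

def channel_betas(hb_diff, face_on, dt):
    """Fit a task beta per channel. hb_diff is (channels x time); face_on is
    the task boxcar sampled on the same time grid."""
    regressor = np.convolve(face_on, canonical_hrf(dt))[: len(face_on)]
    X = np.column_stack([regressor, np.ones_like(regressor)])  # task + intercept
    betas, *_ = np.linalg.lstsq(X, hb_diff.T, rcond=None)
    return betas[0]  # task beta for each channel
```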
Wavelet Coherence. Coherence analyses were performed on the combined signals. This method has been validated previously (Zhang et al., 2020) and applied in prior two-person interactive investigations (Hirsch et al., 2018; Hirsch et al., 2017; Piva et al., 2017). Briefly, channels were grouped into 12 anatomical regions, and wavelet coherence was evaluated exhaustively between all region pairs across the two participants in a dyad. The wavelet coherence analysis decomposes time-varying signals into their frequency components. Here, the wavelet kernel used was a complex Gaussian ("cgau2") provided in MATLAB. The residual signal from the entire data trace was used, with the activity due to the task removed, similar to a traditional PPI analysis (Friston et al., 1997). Sixteen scales were used, spanning frequencies from 0.025 to 0.1 Hz. Based on prior work, we restricted the wavelengths used to those that reflect fluctuations in the range of the hemodynamic response function; coherence at frequencies above 0.1 Hz can reflect non-neural physiological components (Nozawa et al., 2016; Zhang et al., 2020). Therefore, 11 wavelengths were used for the analysis. Complex coherence values were averaged following previously established methods (Zhang et al., 2020).
Cross-brain coherence is the correlation between corresponding frequency components across interacting partners, averaged across all time points and represented as a function of the wavelength of the frequency components (Hirsch et al., 2018; Hirsch et al., 2017; Noah et al., 2020; Zhang et al., 2020). The difference in coherence between the In-person Face and Virtual Face conditions was measured for each dyad using t-tests on each frequency component. Only wavelengths shorter than 30 seconds were considered, as the experimental cycle between task and rest was 30 seconds. An analysis of shuffled pairs of participants was conducted to confirm that the reported coherence was specific to the pair interaction and not due to engagement in a similar task. The coherence analysis was a region-of-interest analysis targeting somatosensory and somatosensory association cortices in the dorsal visual stream (Figures 5A and B). A simplified sketch of this comparison is given below.
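The sketch uses ordinary magnitude-squared coherence restricted to the 0.025-0.1 Hz band as a stand-in for the complex-Gaussian wavelet coherence actually used, together with a shuffled-pair control. The array names and the fNIRS sampling rate of about 8.1 Hz are assumptions consistent with the acquisition section.

```python
import numpy as np
from scipy.signal import coherence

FS_NIRS = 1.0 / 0.123  # ~8.1 Hz, from the fNIRS acquisition section

def band_coherence(sig_a, sig_b, fs=FS_NIRS, fmin=0.025, fmax=0.1):
    """Mean coherence between two residual signals within the band of
    interest (a stand-in for wavelet coherence averaged over scales)."""
    freqs, coh = coherence(sig_a, sig_b, fs=fs, nperseg=256)
    keep = (freqs >= fmin) & (freqs <= fmax)
    return coh[keep].mean()

def shuffled_pair_control(signals_a, signals_b, n_shuffles=1000, seed=None):
    """Compare true-pair coherence against coherence from randomly re-paired
    participants. signals_a/b: lists of residual time series, one per dyad."""
    rng = np.random.default_rng(seed)
    true_coh = np.mean([band_coherence(a, b)
                        for a, b in zip(signals_a, signals_b)])
    null = []
    for _ in range(n_shuffles):
        perm = rng.permutation(len(signals_b))
        null.append(np.mean([band_coherence(a, signals_b[j])
                             for a, j in zip(signals_a, perm)]))
    return true_coh, np.array(null)
```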