Deep multimodal representations and classification of first-episode psychosis via live face processing
Data files (Mar 17, 2025 version, 2.30 GB total)
- data.tar.gz (2.30 GB)
- README.md (10.13 KB)
Abstract
Schizophrenia is a severe psychiatric disorder associated with a wide range of cognitive and neurophysiological dysfunctions and long-term social difficulties. Early detection is expected to reduce the burden of disease by initiating early treatment. In this paper, we test the hypothesis that the integration of multiple simultaneous acquisitions of neuroimaging, behavioral, and clinical information will be better for the prediction of early psychosis than unimodal recordings. We propose a novel framework to investigate the neural underpinnings of the early psychosis symptoms (that can develop into Schizophrenia with age) using multimodal acquisitions of neural and behavioral recordings including functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG), and facial features. Our data acquisition paradigm is based on live face-to-face interaction in order to study the neural correlates of social cognition in first-episode psychosis (FEP). We propose a novel deep representation learning framework, Neural-PRISM, for learning joint multimodal compressed representations combining neural as well as behavioral recordings. These learned representations are subsequently used to describe, classify, and predict the severity of early psychosis in patients, as measured by the Positive and Negative Syndrome Scale (PANSS) and Global Assessment of Functioning (GAF) scores to evaluate the impact of symptomatology. We found that incorporating joint multimodal representations from fNIRS and EEG along with behavioral recordings enhances classification between typical controls and FEP individuals (significant improvements between 10 − 20%). Additionally, our results suggest that geometric and topological features such as curvatures and path signatures of the embedded trajectories of brain activity enable the detection of discriminatory neural characteristics in early psychosis.
This readme file was generated on 2025-02-18 by Rahul Singh
GENERAL INFORMATION
Author Information
Name: Rahul Singh
Institution: Yale University
Email: r.singh@yale.edu
Principal Investigator Information
Name: Joy Hirsch
ORCID: 0000-0002-1418-6489
Institution: Yale School of Medicine
Email: joy.hirsch@yale.edu
Principal Investigator Information
Name: Smita Krishnaswamy
Institution: Wu Tsai Institute, Yale University
Email: smita.krishnaswamy@yale.edu
Author/Alternate Contact Information
Name: J. Adam Noah
ORCID: 0000-0001-9773-2790
Institution: Yale School of Medicine
Email: adam.noah@yale.edu
Date of data collection: Approximate collection dates are 2022-01 through 2025-02.
SHARING/ACCESS INFORMATION
Recommended citation for this dataset:
Hirsch, Joy; Singh, Rahul; Zhang, Yanlei et al. (2025). Deep multimodal representations and classification of first-episode psychosis via live face processing [Dataset]. Dryad. https://doi.org/10.5061/dryad.gxd2547xn
DATA & FILE OVERVIEW
For this data set, we have included three files: 1) data.tar.gz, which includes all raw and exported data collected during the experiment; 2) this README.md file; and 3) FileList.csv, which is a list of the files in the zipped data folder.
During data collection, subjects completed data recording on a single visit with a human partner.
File List:
The types of files included are briefly listed below. For file-naming formats and details specific to each file type, see the DATA-SPECIFIC INFORMATION section.
- FNIRS data files: csv files containing oxyhemoglobin, deoxyhemoglobin, and total hemoglobin concentration for each channel at each time point. Data were collected with a 6 ms sample time per channel.
- FNIRS channel location files: csv file containing MNI coordinates for each fNIRS channel.
- EEG data files: csv files containing scalp voltage for each channel, at a sampling rate of 256 Hz.
- EEG channel location files: csv file containing the MNI coordinates for each EEG channel.
- Facial Action Units data files: csv files containing OpenFace facial landmark and action unit outputs, one row per sample.
File names follow a format indicating subject type, condition, and run. The format for each file name is included in the DATA-SPECIFIC INFORMATION section.
For each visit, there are four conditions consisting of a total of 24 events.
Condition 1 = Direct Gaze and Positive Movie
Condition 2 = Direct Gaze and Negative Movie
Condition 3 = Diverted Gaze and Positive Movie
Condition 4 = Diverted Gaze and Negative Movie
In all cases, the stimulus is a view of the partner’s face, enabled by the smart glass divider between the participant and the partner turning transparent.
DATA-SPECIFIC INFORMATION
FileList.csv is an organizational table indicating which files are present.
- Each column corresponds to a subject.
- Each row corresponds to a file.
- Values in the table are binary indicating the presence or absence of each file for each subject.
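As a convenience, here is a minimal reading sketch, assuming FileList.csv has the layout described above, with file names in the first column and subject identifiers as column headers (the subject column name used here is hypothetical):

```python
# Check which files are present for a given subject in FileList.csv.
# Assumes the first column holds file names and the remaining column
# headers identify subjects; values are 1 (present) or 0 (absent).
import pandas as pd

file_list = pd.read_csv("FileList.csv", index_col=0)

subject = "TD_01"  # hypothetical column name; use the actual headers in the file
available = file_list.index[file_list[subject] == 1].tolist()
print(f"{len(available)} files recorded for {subject}")
```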
For all file names, the following are used as placeholders:
- SUBJECTCLASS indicates the type of subject, which can be either a TD (typically developed) individual or a FEP (First Episode Psychosis) patient.
- SUBJECTID indicates the subject number. For TD individuals it ranges from 01 to 31, and for FEP patients from 01 to 21. Missing numbers correspond to recordings that were discarded due to excessive motion artifacts.
- GAZETYPE can be DIRECT or DIVERT, indicating direct gaze or diverted gaze.
- MOVIETYPE can be POSITIVE or NEGATIVE indicating if the stimulus is watching a positively-valenced movie or a negatively-valenced movie.
- CONDITION can be 1, 2, 3, or 4, corresponding with DIRECT + POSITIVE, DIRECT + NEGATIVE, DIVERT + POSITIVE, and DIVERT + NEGATIVE conditions, respectively.
- RUNID indicates the run and can be either 1 or 2.
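As an illustration, the placeholders above can be combined into concrete file names following the format strings given in the sections below (the field values here are examples only):

```python
# Example construction of file names from the placeholders above.
subject_class = "TD"      # SUBJECTCLASS: "TD" or "FEP"
subject_id = "01"         # SUBJECTID: zero-padded subject number
gaze_type = "DIRECT"      # GAZETYPE: "DIRECT" or "DIVERT"
movie_type = "POSITIVE"   # MOVIETYPE: "POSITIVE" or "NEGATIVE"
run_id = 1                # RUNID: 1 or 2

fnirs_file = f"fnirs_{subject_class}_{subject_id}_{gaze_type}_{movie_type}_{run_id}.csv"
eeg_file = f"EEG_{subject_class}_{subject_id}_EEGdata.csv"
openface_file = f"openface_{subject_class}_{subject_id}_{gaze_type}_{movie_type}_{run_id}.csv"
print(fnirs_file, eeg_file, openface_file, sep="\n")
```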
fNIRS data
The format of the name of fNIRS files:
fnirs_SUBJECTCLASS_SUBJECTID_GAZETYPE_MOVIETYPE_RUNID.csv
A description of the contents of fNIRS files:
- Each row is a sample.
- Column 1 is time in seconds.
- Column 2 is the trigger. There are two types of triggers in the files; the ones relevant to this data set have values greater than 0 and less than 3000, which indicate the onset of the stimulus.
- Column 3 is empty (no information)
- Column 4 is the oxyhemoglobin concentration of ch1.
- Column 5 is the de-oxyhemoglobin concentration of ch1.
- Column 6 is the total hemoglobin concentration of ch1.
- Column 7 is the oxyhemoglobin concentration of ch2.
- Column 8 is the de-oxyhemoglobin concentration of ch2.
- […]
- The final column is the total hemoglobin concentration of ch134.
Columns indicated by […] continue in the pattern of three columns per channel, corresponding to oxyhemoglobin, deoxyhemoglobin, and total hemoglobin concentration, in that order. There are 134 channels in total.
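A minimal parsing sketch, assuming the column layout above and no header row (the file name is an example):

```python
# Parse an fNIRS data file: time, trigger, an empty column, then three
# columns (HbO, HbR, total Hb) per channel for 134 channels.
import numpy as np
import pandas as pd

df = pd.read_csv("fnirs_TD_01_DIRECT_POSITIVE_1.csv", header=None)  # example file name

time_s = df.iloc[:, 0].to_numpy()
trigger = df.iloc[:, 1].to_numpy()

n_channels = 134
hemo = df.iloc[:, 3:3 + 3 * n_channels].to_numpy()
hemo = hemo.reshape(len(df), n_channels, 3)  # [..., 0]=HbO, [..., 1]=HbR, [..., 2]=total Hb

# Stimulus onsets: trigger values greater than 0 and less than 3000.
onset_idx = np.where((trigger > 0) & (trigger < 3000))[0]
print("stimulus onsets (s):", time_s[onset_idx])
```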
The format of the name of fNIRS channel location files:
fnirs_SUBJECTCLASS_SUBJECTID_xyz.csv
A description of the contents of fNIRS channel location files:
- Columns 1, 2, and 3 correspond to the MNI X – Y – Z coordinates, respectively.
- Rows correspond to channels, one row per channel (134 rows in total); a loading sketch follows.
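This sketch assumes comma-separated values with no header row (the file name is an example):

```python
# Load fNIRS channel locations (MNI coordinates): 134 rows of X, Y, Z.
import numpy as np

xyz = np.loadtxt("fnirs_TD_01_xyz.csv", delimiter=",")  # example file name; shape (134, 3)
print("channel 1 MNI coordinates:", xyz[0])
```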
EEG data
The format of the name of EEG data files:
EEG_SUBJECTCLASS_SUBJECTID_EEGdata.csv
A description of the contents of EEG data files:
- Columns correspond to EEG channels.
- Rows correspond to samples, recorded at a sampling rate of 256 Hz.
The format of the name of EEG channel location files:
EEG_SUBJECTCLASS_SUBJECTID_EEGxyz.csv
A description of the contents of EEG channel location files:
- Columns 1, 2, and 3 correspond with MNI X – Y – Z coordinates, respectively.
- Rows correspond with channels. Channel names are, in row order: fp1, fp2, af3, af4, f7, f3, fz, f4, f8, fc5, fc1, fc2, fc6, t7, c3, cz, c4, t8, cp5, cp1, cp2, cp6, p7, p3, pz, p4, p8, po3, po4, o1, oz, and o2.
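A minimal loading sketch for the EEG data and channel location files, assuming no header rows and the channel order listed above (file names are examples):

```python
# Load EEG voltages and MNI electrode coordinates for one subject.
import numpy as np
import pandas as pd

channel_names = [
    "fp1", "fp2", "af3", "af4", "f7", "f3", "fz", "f4", "f8",
    "fc5", "fc1", "fc2", "fc6", "t7", "c3", "cz", "c4", "t8",
    "cp5", "cp1", "cp2", "cp6", "p7", "p3", "pz", "p4", "p8",
    "po3", "po4", "o1", "oz", "o2",
]

eeg = pd.read_csv("EEG_TD_01_EEGdata.csv", header=None, names=channel_names)  # example name
xyz = np.loadtxt("EEG_TD_01_EEGxyz.csv", delimiter=",")  # (32, 3) MNI coordinates

fs = 256.0  # Hz
time_s = np.arange(len(eeg)) / fs
print(eeg.shape, xyz.shape, f"{time_s[-1]:.1f} s of data")
```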
The format of the name of the EEG event file:
EEG_SUBJECTCLASS_SUBJECTID_EEGevent.csv
A description of the contents of EEG event files:
- Rows correspond to each event.
- Column 1 is the sample number (EEG sampling rate is 256 Hz)
- Column 2 is the condition, using the following legend.
1 = Direct Gaze and Positive Movie
2 = Direct Gaze and Negative Movie
3 = Diverted Gaze and Positive Movie
4 = Diverted Gaze and Negative Movie
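A minimal reading sketch, assuming the event file contains the two columns described above with no header row (the file name is an example):

```python
# Read the EEG event file and convert sample numbers to seconds.
import pandas as pd

condition_labels = {
    1: "Direct Gaze and Positive Movie",
    2: "Direct Gaze and Negative Movie",
    3: "Diverted Gaze and Positive Movie",
    4: "Diverted Gaze and Negative Movie",
}

events = pd.read_csv("EEG_TD_01_EEGevent.csv", header=None,
                     names=["sample", "condition"])  # example file name
events["time_s"] = events["sample"] / 256.0  # EEG sampling rate is 256 Hz
events["label"] = events["condition"].map(condition_labels)
print(events.head())
```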
Facial Action Units data
The format of the name of the Facial Action Units data files:
openface_SUBJECTCLASS_SUBJECTID_GAZETYPE_MOVIETYPE_RUNID.csv
A description of the contents of openface files:
- Rows correspond to samples
- The first row is a header listing the facial landmark locations and action units; a loading sketch follows.
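The sketch below reads one OpenFace output file and selects the action-unit intensity columns; the AU column naming (e.g., AU01_r for intensity) follows the usual OpenFace convention and should be verified against the header row of the files themselves:

```python
# Load OpenFace output and keep the action-unit intensity columns.
import pandas as pd

df = pd.read_csv("openface_TD_01_DIRECT_POSITIVE_1.csv")  # example file name
df.columns = [c.strip() for c in df.columns]  # OpenFace headers may include leading spaces

au_intensity = df[[c for c in df.columns if c.startswith("AU") and c.endswith("_r")]]
print(au_intensity.shape)
```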
References
Delorme, A., & Makeig, S. (2004). EEGLAB: an open-source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1), 9–21.
Dravida, S., Noah, J. A., Zhang, X., & Hirsch, J. (2020). Joint Attention During Live Person-to-Person Contact Activates rTPJ, Including a Sub-Component Associated With Spontaneous Eye-to-Eye Contact. Frontiers in Human Neuroscience, 14. https://doi.org/10.3389/fnhum.2020.00201
Eggebrecht, A. T., White, B. R., Ferradal, S. L., Chen, C., Zhan, Y., Snyder, A. Z., Dehghani, H., & Culver, J. P. (2012). A quantitative spatial comparison of high-density diffuse optical tomography and fmri cortical mapping. Neuroimage, 61(4), 1120–1128. https://doi.org/10.1016/j.neuroimage.2012.01.124
Hirsch, J., Zhang, X., Noah, J. A., Dravida, S., Naples, A., Tiede, M., Wolf, J. M., & McPartland, J. C. (2022). Neural correlates of eye contact and social function in autism spectrum disorder. PLOS ONE, 17(11), e0265798. https://doi.org/10.1371/journal.pone.0265798
Hirsch, J., Zhang, X., Noah, J. A., & Ono, Y. (2017). Frontal, temporal, and parietal systems synchronize within and across brains during live eye-to-eye contact. NeuroImage, 157, 314–330. https://doi.org/10.1016/j.neuroimage.2017.06.018
Kelley, M., Noah, J. A., Zhang, X., Scassellati, B., & Hirsch, J. (2021). Comparison of human social brain activity during eye-contact with another human and a humanoid robot. Frontiers in Robotics and AI.
Noah, J. A., Ono, Y., Nomoto, Y., Shimada, S., Tachibana, A., Zhang, X., Bronner, S., & Hirsch, J. (2015). fMRI Validation of fNIRS Measurements During a Naturalistic Task. Journal of Visualized Experiments : JoVE, 100. https://doi.org/10.3791/52116
Noah, J. A., Zhang, X., Dravida, S., Ono, Y., Naples, A., McPartland, J. C., & Hirsch, J. (2020). Real-time eye-to-eye contact is associated with cross-brain neural coupling in angular gyrus. Frontiers in Human Neuroscience, 14, 19. https://doi.org/10.3389/fnhum.2020.00019
Okamoto, M., & Dan, I. (2005). Automated cortical projection of head-surface locations for transcranial functional brain mapping. Neuroimage, 26(1), 18–28. https://doi.org/10.1016/j.neuroimage.2005.01.018
Parker, T. C., Zhang, X., Noah, J. A., Tiede, M., Scassellati, B., Kelley, M., McPartland, J., & Hirsch, J. (2023). Neural and visual processing of social gaze cueing in typical and ASD adults. medRxiv, 2023.01.30.23284243.
Ye, J. C., Tak, S., Jang, K. E., Jung, J., & Jang, J. (2009). NIRS-SPM: Statistical parametric mapping for near-infrared spectroscopy. NeuroImage, 44(2), 428–447. https://doi.org/10.1016/j.neuroimage.2008.08.036
Baltrušaitis, T., Robinson, P., & Morency, L.-P. (2016). OpenFace: An open source facial behavior analysis toolkit. 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE.
The proposed method employs dyads that include one individual who serves as the live expressive face stimulus and a partner categorized as either a typically developed (TD) individual or a first-episode psychosis (FEP) patient. Dyads faced each other across a table at a distance of approximately 140 cm, and table-mounted eye-tracking systems were positioned to measure the subject's continuous eye movements. Synchronized fNIRS and EEG systems continuously acquired the subject's hemodynamic and electrocortical responses during the experiment. The dyads were separated by a “smart glass” panel in the center of the table that controlled face-gaze times (the glass was transparent during gaze periods) and rest times (the glass was opaque during rest) (Hirsch, X. Zhang, Noah, and Bhattacharya, 2023).
Paradigm
The dyads were seated across a table, 140 cm from each other. A "Smart Glass" panel (glass that can switch between opaque and transparent when an appropriate voltage is applied) was positioned in the middle of the table, 70 cm from each participant. In both the direct and diverted face-gaze conditions, the subject was instructed to gaze at the eyes of their partner, who watched emotionally balanced movie clips and then directed either a direct or a diverted gaze toward the subject's face. In the direct face-gaze condition, the dyads had a direct face-to-face view of each other; in the diverted face-gaze condition, the stimulus partner looked at the subject's shoulder.
The actor watched a 4-second movie clip (joyful or sad) and then looked at the partner's (subject's) eyes (direct face gaze) or shoulder (diverted face gaze) for 5 seconds. This movie-plus-gaze sequence was repeated twice, followed by a 12-second rest period during which the smart glass was made opaque, yielding a 30-second block ((4 s + 5 s) × 2 = 18 s of task plus 12 s of rest). This 30-second block was repeated three times for each condition. The subjects were instructed to watch the actor's (stimulus) face the entire time; the actor was instructed to watch the short movies and then direct or divert their gaze toward the subject.
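For reference, the block timing above can be laid out programmatically; this is only an illustration of the schedule described in the text, not code used to run the experiment:

```python
# Illustrative event schedule for one condition:
# (4 s movie + 5 s gaze) x 2 = 18 s of task, then 12 s of rest,
# repeated three times for 90 s per condition.
MOVIE_S, GAZE_S, REST_S, REPEATS_PER_BLOCK, BLOCKS = 4, 5, 12, 2, 3

t = 0.0
schedule = []
for block in range(BLOCKS):
    for rep in range(REPEATS_PER_BLOCK):
        schedule.append((t, "movie"))
        t += MOVIE_S
        schedule.append((t, "gaze"))
        t += GAZE_S
    schedule.append((t, "rest"))
    t += REST_S

for onset, label in schedule:
    print(f"{onset:5.1f} s  {label}")
print(f"total: {t:.0f} s")  # 90 s
```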
FNIRS, EEG, and Facial Action Units data collection
Data were collected via simultaneous fNIRS, EEG, and Facial Action Units while individuals viewed another human partner’s face.
fNIRS data were collected via a multichannel continuous-wave system (LABNIRS, Shimadzu Corporation, Kyoto, Japan) consisting of forty emitter-detector optode pairs. During the task, optodes were connected to a cap selected by size to fit comfortably on the participant's head. For consistency in cortical coverage, the middle anterior optode was placed 1 cm above the nasion, the middle posterior optode was placed in line with the inion, and the CZ optode was aligned with the anatomical CZ. After cap placement and prior to optode placement, hair was cleared from the optode holders using a lighted fiber-optic probe (Daiso, Hiroshima, Japan). Optodes were arranged in a matrix, contacting the scalp, enabling the acquisition of 128 channels. After optode placement and prior to beginning the experiment, the signal-to-noise ratio was assessed by measuring the attenuation of light for each channel, with adjustments made as needed (Noah et al., 2015; Tachibana et al., 2011).
FNIRS signal acquisition, optode localization, and signal processing were similar to methods described previously (Dravida et al., 2020; Hirsch et al., 2017, 2022; Kelley et al., 2021; Noah et al., 2020).
Eye movements were recorded using a desk-mounted Tobii Pro (Stockholm, Sweden) X3-120 eye-tracking system placed 70 cm in front of and slightly below the participant's face. Eye behavior was recorded at 120 Hz. The eye tracker was calibrated for each participant using a transparent plane with three dots placed around the face of the partner. Participants were instructed to look at each dot in turn, and each gaze angle was recorded. Calibration was confirmed by having participants look at each eye and the nose of the partner and confirming alignment. Synchronized scene video capturing the participant's view of the partner was recorded at 30 Hz with a resolution of 1280x720 pixels using a Logitech C920 camera (Lausanne, Switzerland) positioned directly behind and above the participant's head. This enabled tagging of participant-looking behavior within a manually placed “face box.”
EEG data were acquired at 256 Hz via a 32-electrode, dual bio-amplifier g.USB multi-amp system (g.tec Medical Engineering, Austria). The electrode layout was adapted from the 10-10 system to accommodate optode placement on the fNIRS cap. Saline conducting gel was manually applied to each electrode after optode placement to ensure scalp contact. Scalp contact was manually reviewed per electrode using a digital oscilloscope, and adjustments were made as needed.
Recording of optode locations
After completion of the tasks, the locations of optodes and electrodes were recorded for each participant using the Structure Sensor scanner (Boulder, CO, USA), which creates a 3D model (.obj file) of the participant's head and cap. Locations of the standard anatomical landmarks nasion, inion, cz, t3, and t4, as well as the optode locations, were manually marked on the 3D model using MATLAB. Electrode locations were then determined by calculating the midpoint between surrounding optodes. Locations were then corrected for cap drift using custom MATLAB scripts, which rotated optode and electrode locations around the Montreal Neurological Institute (MNI) X-axis from the left ear towards the midline (Eggebrecht et al., 2012; Okamoto & Dan, 2005). This was done to bring the cz optode in line with the anatomical cz according to the original placement, accounting for the stereotyped tilting of the cap towards the left ear that could occur during optode removal.
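The correction itself was performed with custom MATLAB scripts; purely as a generic illustration of the operation described (a rotation of coordinates about the MNI X-axis), a numpy sketch might look like this:

```python
# Generic sketch (not the lab's MATLAB code): rotate 3-D coordinates
# about the MNI X-axis by a chosen angle to correct cap tilt.
import numpy as np

def rotate_about_x(xyz, angle_deg):
    """Rotate an (N, 3) array of MNI coordinates about the X-axis."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[1.0, 0.0, 0.0],
                    [0.0, np.cos(a), -np.sin(a)],
                    [0.0, np.sin(a),  np.cos(a)]])
    return xyz @ rot.T

# Hypothetical optode coordinates rotated by 5 degrees.
optodes = np.array([[0.0, 20.0, 80.0], [-30.0, 10.0, 70.0]])
print(rotate_about_x(optodes, 5.0))
```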
Environmental/experimental conditions: During the experiment, the overhead lights of the room were extinguished. An experimenter was present, out of view of the participant, during each run. Two directed lights were used to fully illuminate the partner's face and eliminate shadows. Two diffuse bar lights softly illuminated the opaque SmartGlass during no-face-viewing periods to suppress the participant's reflection; participants reported seeing only the rough outline of their head in the SmartGlass, with no internal features visible. All lights remained on throughout each run.
Data included in the .tar file are raw, unprocessed data.
Data are presented in a generic text file format, making them broadly accessible for analysis in a wide array of software. The software utilized by the lab is described here. fNIRS data were collected via a multichannel continuous-wave LABNIRS system, producing OMM files that were converted to text files and can be analyzed using MathWorks MATLAB with the NIRS-SPM package (Ye et al., 2009). EEG data were collected with a 256-Hz, 32-electrode dual bio-amplifier g.USB Amp system (g.tec Medical Engineering, Austria) and are analyzable in MathWorks MATLAB with the EEGLAB extension (Swartz Center for Computational Neuroscience, California, USA) (Delorme & Makeig, 2004). Facial Action Unit data were obtained using OpenFace software (Baltrušaitis et al., 2016).
During the experimental setup, optode holders were cleared of hair using a lighted fiber-optic wand to ensure scalp contact. After fNIRS optode placement and prior to beginning the experiment, the signal-to-noise ratio was assessed by measuring the attenuation of light for each channel, with manual adjustments made as needed. After optode placement, saline conducting gel was manually applied to the EEG electrodes to ensure scalp contact; scalp contact was reviewed using an oscilloscope, and adjustments were made as needed. Eye-tracking calibration was completed using Tobii Pro Lab's calibration protocol: a vertical plane aligned with the end of the partner's nose placed three calibration points around the edge of the partner's face, which participants were instructed to view in turn.
Quality-assurance procedures performed on the data: Optode and electrode connectivity were reviewed and adjusted prior to starting the experiment and were further monitored over the course of the experiment so that connectivity issues could be addressed between runs if needed. After data collection, EEG data were manually reviewed by eye and bad channels were removed and replaced with an interpolation calculated from the remaining channels. Channel 32 was removed from all subjects due to stereotyped connectivity issues from the cap design.
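The exact interpolation method is not specified above; purely as a generic illustration of the idea (replacing a bad channel with a weighted combination of the remaining channels based on electrode locations), an inverse-distance-weighted sketch might look like this:

```python
# Generic illustration (not the procedure used for this data set):
# rebuild one bad EEG channel as an inverse-distance-weighted average
# of the remaining channels, using MNI electrode coordinates.
import numpy as np

def interpolate_channel(data, xyz, bad_idx):
    """data: (samples, channels) array; xyz: (channels, 3) coordinates."""
    good = [i for i in range(data.shape[1]) if i != bad_idx]
    dists = np.linalg.norm(xyz[good] - xyz[bad_idx], axis=1)
    weights = 1.0 / np.maximum(dists, 1e-6)   # closer channels weigh more
    weights /= weights.sum()
    out = data.copy()
    out[:, bad_idx] = data[:, good] @ weights
    return out
```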
People involved with sample collection, processing, analysis, and/or submission: Dr. J. Adam Noah was responsible for the maintenance of the EEG, fNIRS, and eye-tracking hardware and software. He also assisted in the design and implementation of the experiment, as well as data collection and analyses. Dr. Xian Zhang assisted in data analysis and software upkeep, as well as produced the code responsible for running the experiment and synchronizing the modalities. Dr. Rahul Singh participated in data collection, conducted analyses on the data, and produced this document. Dr. Joy Hirsch oversaw the project.