Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged
Data files
Nov 18, 2023 version files 95.67 GB
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a ‘pop-out’ percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
README: Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged
https://doi.org/10.5061/dryad.sbcc2frd6
The dataset includes raw MEG (magnetoencephalography) data, behavioral responses, stimuli, predictors, the main analysis code, some intermediate results (temporal response functions (TRFs) and features extracted from the TRFs), and statistical analysis code.
Description of the data and file structure
The main Python packages required are eelbrain, mne, and trftools.
1. meg_control.zip - Raw MEG data (.fiff) and empty room data (.fiff) for noise covariance, and transformation matrix for subjects in the control study
2. meg_main1.zip, meg_main2.zip - Raw MEG data (.fiff) and empty room data (.fiff) for noise covariance, and transformation matrix for subjects in the main study
The .fiff files contain data recorded with the KIT MEG system at the University of Maryland, College Park (https://linguistics.umd.edu/resources-facilities/labs/KIT-Maryland-MEG-Lab). The data can be viewed in Python using the MNE-Python package (https://mne.tools/stable/generated/mne.io.show_fiff.html).
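As a minimal sketch of inspecting one recording, assuming MNE-Python is installed and the archives have been unzipped: the helper name inspect_recording and the example path are hypothetical and are not part of the dataset's code.

```python
from pathlib import Path


def inspect_recording(path):
    """Return the FIFF tag listing and a lazily loaded Raw object for one file.

    Hypothetical helper for illustration; not part of the dataset's code.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such recording: {path}")

    # Imported here so the path check above works even without MNE installed.
    import mne

    # Text summary of the tag structure stored in the file.
    tag_listing = mne.io.show_fiff(str(path))
    # preload=False reads only the header; samples are read on demand.
    raw = mne.io.read_raw_fif(str(path), preload=False)
    return tag_listing, raw


# Example usage (the path is a placeholder, not a real file name):
# tag_listing, raw = inspect_recording("path/to/recording.fiff")
# print(raw.info)
```

MNE-Python may warn that a file name does not follow its naming conventions; that warning does not prevent the file from being read.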
3. mri.zip - Co-registered (fsaverage MRI) data
The folders include data coregistered using the FreeSurfer fsaverage brain and each subject's digitized head data. These data are used for MEG source localization. Files include information related to the scaled MRI.
bem - the BEM (boundary element method) surfaces in .fif format, including the ico-4 source space
label - label annotations for brain parcellations
mri - scaled MRI related to the FreeSurfer fsaverage brain
surf - brain surfaces
4. predictors.zip - predictors (envelope, envelope onset, phoneme onsets, word onsets, and GPT2) used in TRF modeling
5. stimuli.zip - Stimuli (.wav files) used in the study.
6. TRFs.zip - Estimated TRFs for different models using the boosting algorithm (https://eelbrain.readthedocs.io/en/stable/generated/eelbrain.boosting.html). The .pkl files include the boosting result for each subject. The file naming format is '{subjects}{session}{tstart}{tstop}{partitions}{basis}{test}_trfs{randomization}.pkl', where tstart, tstop, partitions, basis, and test are parameters of the boosting algorithm.
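The .pkl files can be read with Python's standard pickle module. The sketch below is illustrative only: the helper name load_trf_results is hypothetical, and it assumes TRFs.zip has been unzipped into a local folder. Unpickling the real files should also require Eelbrain to be installed, since they store Eelbrain boosting results, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_trf_results(folder):
    """Load every '*.pkl' file under `folder` into a dict keyed by file name.

    Hypothetical helper for illustration; not part of the dataset's code.
    The folder is searched recursively because the archive layout is not
    assumed to be flat.
    """
    results = {}
    for pkl_path in sorted(Path(folder).rglob("*.pkl")):
        with open(pkl_path, "rb") as f:
            results[pkl_path.name] = pickle.load(f)
    return results


# Example usage (the folder name is a placeholder):
# trfs = load_trf_results("TRFs")
# for name in trfs:
#     print(name)
```

Eelbrain also provides its own loader, eelbrain.load.unpickle, which can be used instead of the standard-library call.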
7. Codes.zip - Main code: the experiment class built on the Eelbrain pipeline (https://eelbrain.readthedocs.io/en/stable/reference.html#module-pipeline) (exp.py); a Jupyter notebook (results1.ipynb) for TRF computation, prediction accuracy comparisons, and TRF peak picking; and Python scripts for gammatone envelope extraction (make_gammatone.py and make_gammatonepredictor.py).
make_gammatone.py and make_gammatonepredictor.py are used to generate speech features for the stimuli in stimuli.zip and do not depend on the MEG data or the other code.
exp.py - sets up the experiment pipeline for this specific dataset, including pre-processing options, TRF modeling, epoch extraction, and parcellation.
results1.ipynb - Once exp.py is set up properly, results1.ipynb can be run. results1.ipynb depends on the variable definitions in exp.py and gives example code for TRF estimation, TRF peak extraction, and prediction accuracy comparisons.
8. statistical.zip - R code used for statistical analysis
Behavioral.Rmd - Statistical analysis corresponding to behavioral clarity ratings
TRFs.Rmd - Statistical analysis corresponding to TRF peak comparisons
9. behavioral.zip - Behavioral responses and TRF peaks.
Ratings_control.xlsx and Ratings_main.xlsx - At the end of each noise-vocoded passage, participants rated the perceived speech clarity (“How much could you follow the passage on a scale of 0 – 5?”; 0 - no words, 1 - a few words, 2 - definitely some words, 3 - lots of words but not most, 4 - more than half of all words, 5 - almost all words). The ratings for each subject in the control experiment and the main experiment are saved in Ratings_control.xlsx and Ratings_main.xlsx, respectively.
Demographic.xlsx - Demographic information (age, handedness, and gender) related to subject ids
.csv files - Include the features extracted from the Temporal Response Functions (TRFs), namely the peak amplitudes of P1 (early positive polarity peak) and N1 (late negative polarity peak), for each subject, brain hemisphere, speech condition, and speech feature.
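The 0 to 5 clarity scale used in Ratings_control.xlsx and Ratings_main.xlsx can be written as a small lookup table, which may help when labeling plots or summaries of the ratings. This is a sketch only: the names CLARITY_LABELS and describe_rating are hypothetical, and no assumption is made about the column layout of the spreadsheets.

```python
# Labels for the clarity scale, copied from the rating question described above.
CLARITY_LABELS = {
    0: "no words",
    1: "a few words",
    2: "definitely some words",
    3: "lots of words but not most",
    4: "more than half of all words",
    5: "almost all words",
}


def describe_rating(rating):
    """Return the verbal label for a clarity rating (hypothetical helper).

    Accepts ints or integer-valued floats (spreadsheet cells are often read
    as floats) and rejects anything outside the 0 to 5 scale.
    """
    value = float(rating)
    if not value.is_integer() or int(value) not in CLARITY_LABELS:
        raise ValueError(f"Rating must be an integer from 0 to 5, got {rating!r}")
    return CLARITY_LABELS[int(value)]


# Example usage:
# describe_rating(4)  # label for "more than half of all words"
```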
Methods
Magnetoencephalography (MEG) data were recorded from young adult participants as they listened to a passage of noise-vocoded speech, first before any priming, followed by listening to the original, non-degraded version of the same passage to invoke priming, and then finally listening to the same noise-vocoded speech passage as before. All information related to the experimental procedure, stimuli, and preprocessing is described in the paper.