Data from: Intermediate acoustic-to-semantic representations link behavioural and neural responses to natural sounds

Cite this dataset

Giordano, Bruno L.; Esposito, Michele; Valente, Giancarlo; Formisano, Elia (2023). Data from: Intermediate acoustic-to-semantic representations link behavioural and neural responses to natural sounds [Dataset]. Dryad. https://doi.org/10.5061/dryad.0p2ngf258

Abstract

Recognizing sounds implicates the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploit a model-comparison framework and contrasted the ability of acoustic, semantic (continuous and categorical), and sound-to-event deep neural network (DNN) representation models to predict perceived sound dissimilarity and 7 Tesla human auditory cortex fMRI responses. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl’s gyrus) responses, and that auditory dimensions (e.g., loudness, periodicity) predict STG responses and perceived dissimilarity. Sound-to-event DNNs predict HG responses similar to acoustic models but, notably, they outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that STG entails intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behaviour.

Methods

This repository includes data, analysis code and results for the following paper:

Intermediate acoustic-to-semantic representations link behavioural and neural responses to natural sounds

Bruno L. Giordano1*, Michele Esposito2, Giancarlo Valente2 and Elia Formisano2,3,4*

1 Institut des Neurosciences de La Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France.

2 Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands.

3 Maastricht Centre for Systems Biology (MaCSBio), Faculty of Science and Engineering, Maastricht University

4 Brightlands Institute for Smart Society (BISS), Maastricht University

*Corresponding authors.

E-mails: bruno dot giordano at univ-amu dot fr; e dot formisano at maastrichtuniversity dot nl


In this paper, we re-analyse behavioural data from Giordano et al. (2010; perceived natural sound and word dissimilarity) and fMRI data from Santoro et al. (2017; 7T fMRI responses to natural sounds).

References:

- Giordano, B. L., McDonnell, J. & McAdams, S. Hearing living symbols and nonliving icons: Category-specificities in the cognitive processing of environmental sounds. Brain Cogn 73, 7–19 (2010).

- Santoro, R. et al. Reconstructing the spectrotemporal modulations of real-life sounds from fMRI response patterns. PNAS 114, 4799–4804 (2017).
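The model-comparison framework at the core of the paper contrasts candidate models by how well their between-stimulus distances predict observed (behavioural or fMRI) distances on held-out data. The following is a minimal, purely hypothetical sketch of that logic; all values are synthetic, and a random split of stimulus pairs stands in for the stimulus-fold cross-validation used in the actual analyses:

```python
# Hypothetical sketch of the model-comparison logic: a candidate model
# yields a vector of between-stimulus distances (one entry per stimulus
# pair); its predictive power is assessed by fitting a linear model on a
# training fold and correlating predictions with observed distances on a
# held-out fold. All data below are synthetic.
import numpy as np

rng = np.random.default_rng(0)

n_pairs = 190                      # e.g., 20 stimuli -> 20*19/2 pairs
model_dist = rng.random(n_pairs)   # distances under a candidate model
observed = 0.8 * model_dist + 0.2 * rng.random(n_pairs)  # synthetic "data"

# split pairs into train/test folds (random split, for brevity only)
idx = rng.permutation(n_pairs)
train, test = idx[: n_pairs // 2], idx[n_pairs // 2 :]

# fit a one-predictor GLM (intercept + model distance) on the train fold
X = np.column_stack([np.ones(train.size), model_dist[train]])
beta, *_ = np.linalg.lstsq(X, observed[train], rcond=None)

# evaluate on the test fold: correlation between predicted and observed
pred = beta[0] + beta[1] * model_dist[test]
r = np.corrcoef(pred, observed[test])[0, 1]
print(round(r, 2))
```

In the paper, several models (acoustic, semantic, DNN-based) enter such comparisons jointly, together with variance partitioning and permutation statistics; see analyze_05_behaviour_fmri.m and CV_GLM_fit.m for the actual implementation.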

Usage notes

## Repo structure

* Install.m: Matlab script called inside the analysis code to install toolboxes and declare relevant paths. 

* README_1st.txt: installation information (also included in this README.md)

* /code/: code used to fit the models to the stimuli and analyze the data. 

Main analysis code:

  * analyze_01_acoustic_models_distances.m (Matlab): fits acoustics models to sound stimuli and computes between-stimulus distances

  * analyze_02_nlp_models.py (Python): computes natural language processing embeddings for the labels describing the sound stimuli, and the categories model (Santoro et al., 2017 data only).

  * analyze_03_semantic_distances.m (Matlab): computes semantic between-stimulus distances using the natural language processing embeddings or the categories model.

  * analyze_04a_dnns_vggish.py (Python): fits the VGGish model to the sound stimuli;

  * analyze_04b_dnns_yamnet.py (Python): fits the Yamnet model to the sound stimuli;

  * analyze_04c_dnns_kell.py (Python): fits Kell's network to the sound stimuli;

  * analyze_04d_dnn_distances.m (Matlab): computes between-stimulus distances considering the representations in the deep neural network (DNN) models VGGish, Yamnet and Kell.

  * analyze_05_behaviour_fmri.m (Matlab): analyzes model representations in behavioural data and in fMRI data.

  * analyze_06_fmri_models_of_behaviour.m (Matlab): computes the DNN-based models of fMRI data. These models are used to predict behaviour with the code in analyze_05_behaviour_fmri.m

  * The rest of the code inside this folder is called by the main analysis scripts described above.

  * Note about the analyze_*.m scripts: change the variable "rootmain" at the beginning of each code section so that it points to the local path of the repository.

* /data/: analysed data, including model representations. The names of the /data/ subdirectories follow this convention: dataset_datatype, where dataset is either giordano or formisano, for data from Giordano et al. (2010) and Santoro et al. (2017), respectively.

  * /data/dataset_acoustics/ (e.g., formisano_acoustics) includes several .mat (Matlab) files:

    * dataset_acousticmodel.mat files (e.g., formisano_cochleagram.mat) include the stimulus representations in a specific acoustic model (e.g., cochleagram), for a specific dataset (e.g., formisano).

    * dataset_acousticmodel_dist_whichdistance.mat files contain between-stimulus distances for a specific dataset, according to a specific acoustic model and distance metric (whichdistance = cos for cosine; whichdistance = euc for Euclidean). E.g., formisano_cochleagram_dist_cos.mat includes the cosine distances between the stimuli in the formisano dataset, according to the cochleagram model. Each of the distance files includes four variables:

      * Components: cell containing strings that identify the components of the model;

      * D = distance matrix in vectorized format (rows = stimulus pairs; columns = models);

      * Model = cell containing a string that identifies the model;

      * ndims = vector specifying the number of model parameters for each component of the model.

  * /data/dataset_dnns/ (e.g., giordano_dnns) contains four subdirectories:

    * /data/dataset_dnns/kell/: stimulus representations in the Kell network (hdf5 files, one file per stimulus);

    * /data/dataset_dnns/vggish/: stimulus representations in the VGGish network (hdf5 files, one file per stimulus);

    * /data/dataset_dnns/vggishrandom/: stimulus representations in the untrained VGGish network initialized with random weights (one .mat file including a structure with the representation of each of the stimuli in each layer; stimuli in the first dimension of each variable);

    * /data/dataset_dnns/yamnet/: stimulus representations in the Yamnet network (hdf5 files, one file per stimulus).

    * all .mat files inside /data/dataset_dnns/ (e.g., giordano_dnns) contain between-stimulus distances according to the different DNN models (naming conventions and contents as specified for the acoustic models, above).

  * /data/dataset_semantics/ (e.g., formisano_semantics) contains data considered for the natural language processing embeddings and for the categories model (only data from Santoro et al., 2017).

    * /data/dataset_semantics/dataset_labels.csv and dataset_labels.xlsx (e.g., formisano_labels.csv) include the strings describing the sound source for each of the sound stimuli;

    * dataset_semanticmodel.mat files (e.g., formisano_glove.mat) and dataset_semanticmodel.csv files (e.g., formisano_glove.csv) contain the natural language processing embeddings of each of the stimuli according to a specific semantic model (e.g., the glove model).

    * dataset_semanticmodel_dist_whichdistance.mat files contain between-stimulus distances according to the different semantic models (see acoustic models for naming convention, and contents).

  * /data/dataset_stimuli/ (e.g., giordano_stimuli) contains the sound stimuli.

    * each of the subdirectories contains the wav files (one per sound stimulus) at the sampling rate specified by the directory name (e.g., wav_16kHz includes wav files at a 16 kHz sampling rate).

    * stimuli_list.csv/mat/xlsx (e.g., stimuli_list.mat) contain the filename information for each of the stimuli, saved in csv, mat, or xlsx format.

  * /data/formisano_fmri/ contains the fMRI-distance data.

    * fmridist_nospmean.mat = mat (Matlab) file including:

      * between-stimulus distances for the test and training sets (fmridist_test and fmridist_train, respectively), variables of size [n_pairs, n_participants, n_stimulus_folds, n_rois];

      * numerical identifiers for the stimuli in each of the stimulus folds (idx_test and idx_train), variables of size [n_stimuli, n_stimulus_folds];

      * name of each of the six regions of interest (ROIs) considered in the analyses (roi_names);

      * name of the sound stimuli in each of the stimulus folds (stimuli_test and stimuli_train), cell of size [n_stimuli, n_stimulus_folds].

  * /data/giordano_behaviour/ contains the behavioural data.

    * behavdist.mat = mat (Matlab) file including:

      * behavioural distances in the sound dissimilarity condition (behavdist), variable of size [n_pairs, n_participants, n_stimulus_groups];

      * numerical identifiers for the stimuli in each of the two stimulus groups, variable of size [n_stimuli, n_stimulus_groups];

      * name of the sound stimuli in each of the stimulus groups (stimuli), cell of size [n_stimuli, n_stimulus_groups].

    * behavdist_sem.mat = mat (Matlab) file including:

      * behavioural distances in the word dissimilarity condition (behavdist), variable of size [n_pairs, n_participants, n_stimulus_groups];

      * numerical identifiers for the stimuli in each of the two stimulus groups, variable of size [n_stimuli, n_stimulus_groups];

      * name of the sound stimuli in each of the stimulus groups (stimuli), cell of size [n_stimuli, n_stimulus_groups].

  * /data/giordano_fmri_prediction/ contains the data used to predict behavioural data from the DNN-mapped fMRI data.

    * formisano_*.mat = mat (Matlab) files containing the betas of the GLM models used to predict fMRI distances from DNN distances (speech/nospeech at the end of the filename = models fitted considering all stimuli, including speech, or after removing the speech stimuli);

    * giordano_fmri_whichroi*.mat = mat (Matlab) files containing the DNN estimates of the between-stimulus distances in the different fMRI ROIs (see acoustic models, for contents).

* /results/: analysis results, including permutations

  * each of the xls files inside /results/ contains a statistics table output by analyze_05_behaviour_fmri.m and read to generate the LaTeX code for the Supplementary Tables in the manuscript.

  * /results/matfiles/ includes several .mat files (Matlab) containing the results of the statistical tests. Each of them includes the following variables:

    * analysis_opt: struct defining the analysis options

    * ndims: cell defining the number of parameters for each of the models considered in the analysis

    * outfilename: string identifying the output file

    * varpart_info: cell variable defining the information of variance-partitioning analyses

    * out: struct containing the results of the analysis (see CV_GLM_fit.m, for further details)

* /Toolboxes/ (Matlab): various pieces of code written to aid acoustic analyses, and to analyze the data.
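The *_dist_* files described above store distances in vectorized format: D has one row per stimulus pair and one column per model. The following is a self-contained sketch of how such a vector can be unfolded into a square distance matrix; it assumes row-major upper-triangle pair ordering (an assumption, not documented here), and uses a synthetic D so it runs without the data. In Python, the .mat file itself could be read with scipy.io.loadmat:

```python
# Hypothetical sketch of working with the vectorized distance format of
# the *_dist_*.mat files: D holds one distance per stimulus pair; the
# square, symmetric matrix is rebuilt by filling the upper triangle pair
# by pair. Pair ordering (i < j, row-major) is an assumption; the
# synthetic D below stands in for the contents of a loaded file.
import numpy as np

n_stimuli = 5
n_pairs = n_stimuli * (n_stimuli - 1) // 2   # rows of D
D = np.arange(1, n_pairs + 1, dtype=float)   # synthetic distances, one model

# rebuild the symmetric square matrix (zero diagonal)
square = np.zeros((n_stimuli, n_stimuli))
row = 0
for i in range(n_stimuli):
    for j in range(i + 1, n_stimuli):
        square[i, j] = square[j, i] = D[row]
        row += 1

print(square.shape)          # (5, 5)
```

scipy.spatial.distance.squareform implements the same conversion for this pair ordering.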

 

### Installation instructions

The following toolboxes need to be installed to run different portions of the code.

#### Acoustic models

- install cochleagram and MTF in /Toolboxes/, add to Matlab path

code:

http://nsl.isr.umd.edu/downloads.html

- install SAI in /Toolboxes/, add to Matlab path

code:

https://code.soundsoftware.ac.uk/projects/aim

https://www.acousticscale.org/wiki/index.php/Category:Auditory_Image.html

- install Texture in /Toolboxes/, add to Matlab path

code:

https://mcdermottlab.mit.edu/Sound_Texture_Synthesis_Toolbox_v1.7.zip

- install MIR toolbox (roughness model) in /Toolboxes/, add to Matlab path

code:

https://www.jyu.fi/hytk/fi/laitokset/mutku/en/research/materials/mirtoolbox

- install Yin model (pitch/periodicity) in /Toolboxes/, add to Matlab path

code:

http://audition.ens.fr/adc/

- time-varying loudness and spectral centroid were computed using the LoudnessToolbox by Genesis Acoustics. This toolbox should be installed in /Toolboxes/ and added to the Matlab path.

code:

The toolbox was announced on the auditory list, but the link is no longer valid (http://www.auditory.org/mhonarc/2010/msg00135.html). For a copy of this toolbox, contact Bruno L. Giordano (bruno dot giordano at univ-amu dot fr).

#### DNN models 

- install yamnet and vggish in /code/nlp_dnn_models/audioset/

code:

https://github.com/tensorflow/models

weights:

https://storage.googleapis.com/audioset/vggish_model.ckpt

https://storage.googleapis.com/audioset/vggish_pca_params.npz

https://storage.googleapis.com/audioset/yamnet.h5

- install kelletal2018 in /code/nlp_dnn_models/

code:

https://github.com/mcdermottLab/kelletal2018

- install pycochleagram in /code/nlp_dnn_models/

code:

https://github.com/mcdermottLab/pycochleagram
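Once the DNNs are installed and run on the stimuli, analyze_04d_dnn_distances.m (Matlab) turns their layer activations into between-stimulus distances. A purely hypothetical Python sketch of that step, with made-up shapes and random activations standing in for real VGGish/Yamnet/Kell layer outputs:

```python
# Hypothetical sketch of the between-stimulus distance computation
# performed (in Matlab) by analyze_04d_dnn_distances.m: flatten each
# stimulus's activations in a given DNN layer and compute the cosine
# distance for every stimulus pair. Shapes and values are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n_stimuli, layer_units = 6, 128
acts = rng.standard_normal((n_stimuli, layer_units))  # one row per stimulus

# cosine distance = 1 - cosine similarity, for each stimulus pair
unit = acts / np.linalg.norm(acts, axis=1, keepdims=True)
dist = []
for i in range(n_stimuli):
    for j in range(i + 1, n_stimuli):
        dist.append(1.0 - float(unit[i] @ unit[j]))
dist = np.array(dist)        # vectorized format: one entry per pair

print(dist.shape)            # (15,)
```

The resulting vector matches the vectorized distance format of the .mat distance files described in the repo structure above.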

#### NLP models

- install universal-sentence-encoder_4 in /code/nlp_dnn_models/

weights:

https://tfhub.dev/google/universal-sentence-encoder/4

- install GNewsW2V in /code/nlp_dnn_models/

weights:

https://www.kaggle.com/datasets/leadbest/googlenewsvectorsnegative300

- install Glove (6B, 300D) in /code/nlp_dnn_models/

weights:

https://nlp.stanford.edu/data/glove.6B.zip
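The semantic models map each stimulus label to a word-embedding vector and then compute between-stimulus distances from those vectors (analyze_02_nlp_models.py and analyze_03_semantic_distances.m). A minimal illustrative sketch, using a tiny made-up embedding table in place of the pretrained GloVe / word2vec / USE vectors listed above:

```python
# Hypothetical sketch of the semantic-distance step: each sound label is
# mapped to a word-embedding vector (multi-word labels averaged), and
# semantic distance between two stimuli is the cosine distance between
# their label embeddings. The toy embedding table below is made up.
import numpy as np

embeddings = {                       # toy 3-D "GloVe-like" vectors
    "dog": np.array([1.0, 0.2, 0.0]),
    "barking": np.array([0.9, 0.3, 0.1]),
    "piano": np.array([0.0, 0.1, 1.0]),
}

def label_vector(label):
    """Average the embeddings of the words in a (possibly multi-word) label."""
    return np.mean([embeddings[w] for w in label.split()], axis=0)

def cosine_distance(a, b):
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

d_near = cosine_distance(label_vector("dog barking"), label_vector("dog"))
d_far = cosine_distance(label_vector("dog barking"), label_vector("piano"))
print(d_near < d_far)        # related labels should be closer
```

The real pipeline writes the resulting per-stimulus embeddings to the dataset_semanticmodel.mat/.csv files and the pairwise distances to the dataset_semanticmodel_dist_* files described above.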

#### Matlab tools

- install mtimesx, add to Matlab path

code:

https://www.mathworks.com/matlabcentral/fileexchange/25977-mtimesx-fast-matrix-multiply-with-multi-dimensional-support

- install distribution plot, add to Matlab path

code:

https://www.mathworks.com/matlabcentral/fileexchange/23661-violin-plots-for-plotting-multiple-distributions-distributionplot-m

Funding

Agence Nationale de la Recherche, Award: ANR-21-CE37-0027

Agence Nationale de la Recherche, Award: ANR-16-CONV-0002

Agence Nationale de la Recherche, Award: ANR-11-LABX-0036

Dutch Research Council, Award: 406.20.GO.030

Dutch Province of Limburg