Artificial neural networks (ANNs) are sensitive to perturbations and adversarial attacks. One hypothesized solution to adversarial robustness is to align manifolds in the embedded space of neural networks with biologically grounded manifolds. Recent state-of-the-art works that emphasize learning robust neural representations, rather than optimizing for a specific target task like classification, support the idea that researchers should investigate this hypothesis. While works have shown that fine-tuning ANNs to coincide with biological vision does increase robustness to both perturbations and adversarial attacks, these works have relied on proprietary datasets; the lack of publicly available biological benchmarks makes it difficult to evaluate the efficacy of these claims. Here, we deliver a curated dataset consisting of biological representations of images taken from two commonly used computer vision datasets, ImageNet and COCO, that can be easily integrated into model training and evaluation. Specifically, we take a large functional magnetic resonance imaging (fMRI) dataset (BOLD5000), preprocess it into representational dissimilarity matrices (RDMs), and establish an infrastructure that anyone can use to train models with biologically grounded representations. Using this infrastructure, we investigate the representations of several popular neural networks and find that as networks have been optimized for tasks, their correspondence with biological fidelity has decreased. Additionally, we use a previously unexplored graph-based technique, Fiedler partitioning, to showcase the viability of the biological data, and the potential to extend these analyses by extending RDMs into Laplacian matrices. Overall, our findings demonstrate the potential of utilizing our new biological benchmark to effectively enhance the robustness of models.

This dataset is made available as part of the publication of the following journal article:

Pickard W, Sikes K, Jamil H, Chaffee N, Blanchard N, Kirby M and Peterson C (2023) Exploring fMRI RDMs: enhancing model robustness through neurobiological data. Front. Comput. Sci. 5:1275026. doi: 10.3389/fcomp.2023.127502
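
The abstract mentions extending RDMs into Laplacian matrices and applying Fiedler partitioning. The following is a minimal, dependency-free sketch of that idea, not the authors' implementation. It assumes that a correlation-distance RDM (values between 0 and 2) is turned into edge weights with the hypothetical mapping weight = 1 - d / 2; the mapping used in the article may differ. The function names are illustrative only. The Fiedler vector is approximated by power iteration on a shifted Laplacian, and the sign of each entry assigns the image to one of two groups.

```python
import math

def laplacian_from_rdm(rdm, max_distance=2.0):
    """Build a graph Laplacian L = D - W from a square RDM.

    Assumption (not from the dataset description): edge weights are
    w_ij = 1 - d_ij / max_distance, with no self-loops.
    """
    n = len(rdm)
    w = [[0.0 if i == j else 1.0 - rdm[i][j] / max_distance for j in range(n)]
         for i in range(n)]
    return [[sum(w[i]) if i == j else -w[i][j] for j in range(n)]
            for i in range(n)]

def fiedler_vector(lap, iterations=1000):
    """Approximate the eigenvector of the second-smallest eigenvalue of lap."""
    n = len(lap)
    # Slightly above the Gershgorin bound, so shift*I - L is positive definite.
    shift = 2.1 * max(lap[i][i] for i in range(n))
    if shift == 0.0:
        raise ValueError("graph has no edges; the Fiedler vector is undefined")
    v = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector
        v = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def fiedler_partition(rdm):
    """Split items into two groups by the sign of the Fiedler vector."""
    return [x >= 0 for x in fiedler_vector(laplacian_from_rdm(rdm))]
```

Which group is labelled True is arbitrary, because an eigenvector is only defined up to sign. For RDMs of realistic size, a library eigensolver would be a more practical choice than this pure-Python loop.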

Description of the data and file structure

This dataset is a derivative of BOLD5000 Release 2.0. Additional post-processing steps were performed to make the data more accessible for machine learning (ML) research using representational similarity analysis (RSA).

As a general overview, the following additional post-processing steps were performed with the results made available here:

1. New cortical regions of interest (ROIs) were defined for each subject using vcAtlas and visfAtlas.

FreeSurfer was used to create the new ROIs. FreeSurfer derivatives for each subject can be found in:

freesurfer-ROIs.zip

2. Beta value vectors were extracted for each new ROI and the ROIs from the original BOLD5000 paper.

Beta values for each of the four subjects (CSI1, CSI2, CSI3, CSI4) and each of the three cortical atlases (visfAtlas, vcAtlas, BOLD5000) can be found in files named:

[SUBJECT]_[ATLAS]_CORR_RDMs.h5

3. Representational dissimilarity matrices (RDMs) were calculated for each subject and ROI by comparing image responses.

rsatoolbox was used to calculate RDMs using the correlation distance metric. In addition to the four subjects listed above, a fifth "MEAN" subject was created using a weighted average of the RDMs of the four human subjects. This allows the incorporation of CSI4, who did not complete all trials. Subject RDMs can be found in files named:

[SUBJECT]_[ATLAS]_CORR_RDMs.h5
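
To make the RDM step concrete, here is a small illustrative sketch in plain Python. It is not the rsatoolbox code used to build the dataset. It computes a correlation-distance RDM (1 minus the Pearson correlation between the response vectors of two images) and an entry-wise weighted average across subjects. The weights actually used for the "MEAN" subject are not stated here, so the `weights` argument is a placeholder, and marking missing entries as `None` is an assumption made only to show how a subject with incomplete trials could still contribute.

```python
import math

def correlation_distance(x, y):
    """Return 1 - Pearson r for two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def compute_rdm(patterns):
    """patterns holds one beta vector per image; returns a symmetric RDM."""
    n = len(patterns)
    rdm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = correlation_distance(patterns[i], patterns[j])
            rdm[i][j] = rdm[j][i] = d
    return rdm

def weighted_mean_rdm(rdms, weights):
    """Entry-wise weighted mean of same-sized RDMs.

    Entries set to None are treated as missing and skipped, and the
    remaining weights are renormalized for that entry (an assumption).
    """
    n = len(rdms[0])
    mean = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pairs = [(r[i][j], w) for r, w in zip(rdms, weights)
                     if r[i][j] is not None]
            if pairs:
                total = sum(w for _, w in pairs)
                mean[i][j] = sum(v * w for v, w in pairs) / total
    return mean
```

For example, `compute_rdm([[1, 2, 3], [2, 4, 6], [3, 2, 1]])` gives a distance near 0 between the first two patterns (perfectly correlated) and near 2 between the first and third (perfectly anti-correlated).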

4. RDMs were generated for four artificial neural networks (ANNs) using the BOLD5000 stimulus images.

Four ANNs (AlexNet, ResNet50, MobileNetV2, EfficientNet-B0) were presented with the BOLD5000 stimulus images and RDMs were calculated from their hidden layer activations. ANN RDMs can be found in files named:

[ANN]_CORR_RDMs.h5
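
The naming patterns above can be expanded programmatically, which is handy for checking that a download is complete. The sketch below assumes the subject and atlas tokens appear in file names exactly as written in this description (for example `CSI1` and `visfAtlas`). The exact spelling of the ANN token is not given, so it is left as an argument, and the helper names are hypothetical.

```python
from itertools import product
from pathlib import Path

# Tokens as listed in this description; their casing in the real file names
# is an assumption.
SUBJECTS = ("CSI1", "CSI2", "CSI3", "CSI4", "MEAN")
ATLASES = ("visfAtlas", "vcAtlas", "BOLD5000")

def subject_rdm_filenames(subjects=SUBJECTS, atlases=ATLASES):
    """Expand [SUBJECT]_[ATLAS]_CORR_RDMs.h5 for every subject/atlas pair."""
    return [f"{s}_{a}_CORR_RDMs.h5" for s, a in product(subjects, atlases)]

def ann_rdm_filename(ann_name):
    """Expand [ANN]_CORR_RDMs.h5 for one network name."""
    return f"{ann_name}_CORR_RDMs.h5"

def missing_files(data_dir, filenames):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in filenames if not (root / name).is_file()]
```

Each `.h5` file can then be opened with an HDF5 reader such as h5py. The internal group and dataset layout is not described here, so it is best to inspect a file before relying on particular keys.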

5. New supercategory labels were created for ImageNet stimulus images.

As detailed in the original paper, stimulus images taken from the ImageNet dataset were grouped into new supercategories using the synset hierarchy. A full listing of each supercategory per image name can be found in:

imagenet_supercategories.csv
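
As an illustration of how the supercategory listing might be used, the following sketch groups image names by supercategory with Python's standard `csv` module. The column layout of `imagenet_supercategories.csv` is not documented here. The sketch assumes a header row, with the image name in the first column and the supercategory in the second; adjust `image_col`, `label_col`, and `has_header` if the file differs.

```python
import csv
from collections import defaultdict

def group_by_supercategory(csv_file, image_col=0, label_col=1, has_header=True):
    """Map each supercategory to the list of image names assigned to it.

    csv_file is an open text file (or any iterable of lines). The column
    positions are assumptions about the file layout.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the assumed header row
    groups = defaultdict(list)
    for row in reader:
        if len(row) <= max(image_col, label_col):
            continue  # skip blank or short rows
        groups[row[label_col]].append(row[image_col])
    return dict(groups)
```

A typical call would open the file with `open("imagenet_supercategories.csv", newline="")` and pass the handle to `group_by_supercategory`. The resulting groups can be used, for instance, to order RDM rows by supercategory before plotting.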

Sharing/Access information

Data was derived from the following sources:

Chang, Nadine; Pyles, John; Prince, Jacob; Tarr, Michael; Aminoff, Elissa (2021). BOLD5000 Release 2.0. Carnegie Mellon University. Dataset. https://doi.org/10.1184/r1/14456124
Chang, Nadine; Pyles, John; Marcus, Austin; Gupta, Abhinav; Tarr, Michael; Aminoff, Elissa; et al. (2019). BOLD5000. Carnegie Mellon University. Dataset. https://doi.org/10.1184/r1/6459449.v5
Chang, Nadine; Pyles, John; Marcus, Austin; Gupta, Abhinav; Tarr, Michael; Aminoff, Elissa (2019). BOLD5000, a public fMRI dataset while viewing 5000 visual images. Carnegie Mellon University. Journal contribution. https://doi.org/10.1184/r1/8097584.v1
Rosenke, M., van Hoof, R., van den Hurk, J., Grill-Spector, K., Goebel, R., 2021. A Probabilistic Functional Atlas of Human Occipito-Temporal Visual Cortex. Cerebral Cortex 31, 603–619. https://doi.org/10.1093/cercor/bhaa246
Rosenke, M., Weiner, K.S., Barnett, M.A., Zilles, K., Amunts, K., Goebel, R., Grill-Spector, K., 2018. A cross-validated cytoarchitectonic atlas of the human ventral visual stream. NeuroImage, Segmenting the Brain 170, 257–270. https://doi.org/10.1016/j.neuroimage.2017.02.040

Code/Software

FreeSurfer: https://surfer.nmr.mgh.harvard.edu/
rsatoolbox: https://rsatoolbox.readthedocs.io/

BOLD5000 Additional ROIs and RDMs for neural network research
