Data from: Neural categorization of visual words of alphabetic and non-alphabetic languages
Data files
May 15, 2026 version files 934.15 MB
- Experiment_1.zip (43.95 MB)
- Experiment_2.zip (462.54 KB)
- Experiment_3.zip (11.24 MB)
- Experiment_4.zip (43.96 MB)
- Experiment_5.zip (44.02 MB)
- Experiment_6.zip (43.58 MB)
- Experiment_7a_8a.zip (559.99 MB)
- Experiment_7b_8b.zip (10.41 KB)
- Experiment_9.zip (186.91 MB)
- README.md (15.67 KB)
Abstract
Languages provide social-category markers that tag people as members of one social group or another. How does the brain sort words into different language categories as a basis of the social-categorization function of language? We addressed this issue by testing neural categorization of visual words of different writing systems in nine studies using electroencephalography, magnetoencephalography, and a repetition suppression paradigm. We showed that a neural network, including the anterior temporal, insular, orbital frontal, and ventral occipito-temporal cortices in both hemispheres, was engaged in computing correlation distances between two words to represent intra-language similarity and inter-language difference during categorization of visual words of alphabetic and non-alphabetic languages. These processes occurred as early as 150 ms post-stimulus, recruited within-hemisphere functional connections, operated independently of words’ semantic meanings and pronunciations, and were consistent across individuals with diverse language backgrounds. These findings highlight the neural mechanisms of spontaneous language-based categorization of visual words as a basis of the social-categorization function of language.
Dataset DOI: 10.5061/dryad.34tmpg4wn
Description of the data and file structure
This dataset contains the data supporting the results in the paper entitled “Neural categorization of visual words of alphabetic and non-alphabetic languages”, which illustrates the neural dynamics of language-based word categorization, a process fundamental to the social-categorization function of language.
Files and variables
File: Experiment_1.zip
Description: Behavioral and EEG results in Experiment 1.
1_RS_behavior_Chinese_English.mat
This MATLAB file contains three tables of behavioral accuracy, reaction time and inverse efficiency score (IES) of each condition in the Chinese-English session. The following text describes the information contained in the five columns of each table.
- SubID: Subject ID
- Rep_CH_CHEN: the repetition condition of the Chinese words.
- Rep_EN_CHEN: the repetition condition of the English words.
- Alt_CH_CHEN: the alteration condition of the Chinese words.
- Alt_EN_CHEN: the alteration condition of the English words.
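The page does not state how the IES was computed. A minimal sketch, assuming the commonly used definition (mean reaction time divided by the proportion of correct responses), is given below; the function name and the example values are hypothetical and are not taken from the dataset.

```python
def inverse_efficiency(mean_rt_ms, accuracy):
    """Return IES = mean RT / proportion correct (assumed definition)."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be a proportion in (0, 1]")
    return mean_rt_ms / accuracy


# Hypothetical values for one subject in one condition (not from the dataset).
ies_example = inverse_efficiency(mean_rt_ms=450.0, accuracy=0.75)
```

Under this definition a lower IES indicates more efficient performance, because slow responses and low accuracy both increase the score.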
2_RS_behavior_German_English.mat
This MATLAB file contains three tables of behavioral accuracy, reaction time, and IES of each condition in the German-English session. The following text describes the information contained in the five columns of each table.
- SubID: Subject ID
- Rep_GE_GEEN: the repetition condition of the German words.
- Rep_EN_GEEN: the repetition condition of the English words.
- Alt_GE_GEEN: the alteration condition of the German words.
- Alt_EN_GEEN: the alteration condition of the English words.
3_RS_Chinese_Engish_ERP.mat
This MATLAB file contains a matrix of the averaged ERP signals of all subjects in the Chinese-English session as condition (1: Rep_CH; 2: Rep_EN; 3: Alt_CH; 4: Alt_EN) × electrode × time point (-200–800 ms).
4_RS_German_Engish_ERP.mat
This MATLAB file contains a matrix of the averaged ERP signals of all subjects in the German-English session as condition (1: Rep_GE; 2: Rep_EN; 3: Alt_GE; 4: Alt_EN) × electrode × time point (-200–800 ms).
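To read a value from the time-point dimension of these matrices, a latency in milliseconds has to be mapped to a sample index. The sketch below assumes that the samples are evenly spaced and that the first and last samples fall exactly at -200 ms and 800 ms; neither the sampling rate nor the number of samples is stated here, so the value 501 is purely illustrative.

```python
def time_axis_ms(n_samples, start_ms=-200.0, stop_ms=800.0):
    """Evenly spaced time stamps covering [start_ms, stop_ms] (assumed inclusive)."""
    if n_samples < 2:
        raise ValueError("need at least two samples")
    step = (stop_ms - start_ms) / (n_samples - 1)
    return [start_ms + i * step for i in range(n_samples)]


def nearest_sample(times_ms, target_ms):
    """0-based index of the sample closest to target_ms."""
    return min(range(len(times_ms)), key=lambda i: abs(times_ms[i] - target_ms))


# Hypothetical example: 501 samples would correspond to 2 ms steps.
times = time_axis_ms(501)
idx_150 = nearest_sample(times, 150.0)
```

The returned index is 0-based; add 1 when indexing the same matrix in MATLAB.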
5_EEG_Label.mat
This MATLAB file contains the labels of the whole-brain electrodes, in the order of the electrode dimension of the ERP files.
6_RS_Chinese_English_RDM.mat
This MATLAB file contains the averaged representational dissimilarity matrix (RDM) of pairwise correlation distances in the Chinese-English session as time point (-200–800 ms) × word stimuli × word stimuli.
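As an illustration of what one time slice of such a matrix holds, the sketch below builds a matrix of pairwise correlation distances (1 minus Pearson's r) from a set of response patterns. It assumes that each stimulus is represented by its amplitudes across electrodes at a single time point; the toy patterns and function names are hypothetical, and the authors' own script in this archive remains the reference for the actual procedure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance_rdm(patterns):
    """patterns: one list of electrode amplitudes per stimulus, at one time point."""
    n = len(patterns)
    rdm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - pearson_r(patterns[i], patterns[j])
            rdm[i][j] = rdm[j][i] = d  # the matrix is symmetric
    return rdm


# Toy patterns (3 stimuli x 4 electrodes), not taken from the dataset.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the first pattern
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with the first pattern
]
rdm = correlation_distance_rdm(toy)
```

Correlation distance ranges from 0 (identical pattern shape) to 2 (opposite pattern shape), and the diagonal is 0 by construction.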
7_RS_Chinese_English_correlation_distance.mat
This MATLAB file contains the averaged time courses (-200–800 ms) of Fisher-Z transformed correlation distances corresponding to intra-language similarity and inter-language difference of each condition in the Chinese-English session.
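One way to summarize an RDM at a single time point into intra-language and inter-language values is to average the entries for same-language and different-language stimulus pairs. The sketch below does only that averaging and deliberately omits the Fisher-Z step, because the exact order of operations is not described here; the labels, toy matrix, and function name are hypothetical.

```python
def mean_distances_by_language(rdm, labels):
    """Average off-diagonal RDM entries within and between language labels."""
    intra = {}
    inter = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the RDM is symmetric
            if labels[i] == labels[j]:
                intra.setdefault(labels[i], []).append(rdm[i][j])
            else:
                inter.append(rdm[i][j])
    intra_mean = {lang: sum(v) / len(v) for lang, v in intra.items()}
    inter_mean = sum(inter) / len(inter)
    return intra_mean, inter_mean


# Toy 4 x 4 RDM with two stimuli per language (not from the dataset).
toy_rdm = [
    [0.0, 0.2, 1.0, 1.2],
    [0.2, 0.0, 0.8, 1.0],
    [1.0, 0.8, 0.0, 0.4],
    [1.2, 1.0, 0.4, 0.0],
]
toy_labels = ["CH", "CH", "EN", "EN"]
intra_mean, inter_mean = mean_distances_by_language(toy_rdm, toy_labels)
```

In this toy case the within-language distances are small and the between-language distances are large, which is the pattern that language-based categorization would be expected to produce.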
8_RS_compute_Chinese_English_RDM.m
This MATLAB script demonstrates how to compute the 160×160 RDM at each time point for each subject. The computations of subsequent EEG/MEG RDMs are similar.
File: Experiment_2.zip
Description: Behavioral and EEG results in Experiment 2.
1_RS_behavior_scrambled_Chinese_English.mat
This MATLAB file contains three tables of behavioral accuracy, reaction time and IES of each condition in the scrambled Chinese-English session. The following text describes the information contained in the five columns of each table.
- SubID: Subject ID
- Rep_CH_scrambled_CHEN: the repetition condition of the scrambled Chinese words.
- Rep_EN_scrambled_CHEN: the repetition condition of the scrambled English words.
- Alt_CH_scrambled_CHEN: the alteration condition of the scrambled Chinese words.
- Alt_EN_scrambled_CHEN: the alteration condition of the scrambled English words.
2_RS_scrambled_Chinese_English_ERP.mat
This MATLAB file contains a matrix of the averaged ERP signals of all subjects in the scrambled Chinese-English session as condition × electrode × time point.
3_EEG_Label.mat
This MATLAB file contains the labels of the whole-brain electrodes, in the order of the electrode dimension of the ERP files.
File: Experiment_3.zip
Description: Behavioral and EEG results in Experiment 3.
1_RS_behavior_radicals_letters.mat
This MATLAB file contains three tables of behavioral accuracy, reaction time, and IES of each condition in the radicals-letters session. The following text describes the information contained in the five columns of each table.
- SubID: Subject ID
- Rep_RD_RDLT: the repetition condition of the Chinese radicals.
- Rep_LT_RDLT: the repetition condition of the English letters.
- Alt_RD_RDLT: the alteration condition of the Chinese radicals.
- Alt_LT_RDLT: the alteration condition of the English letters.
2_RS_radicals_letters_ERP.mat
This MATLAB file contains a matrix of the averaged ERP signals of all subjects in the radicals-letters session as condition (1: Rep_RD; 2: Rep_LT; 3: Alt_RD; 4: Alt_LT) × electrode × time point (-200–800 ms).
3_EEG_Label.mat
This MATLAB file contains the labels of the whole-brain electrodes, in the order of the electrode dimension of the ERP files.
4_RS_radicals_letters_RDM.mat
This MATLAB file contains the averaged representational dissimilarity matrix (RDM) of pairwise correlation distances in the radicals-letters session as time point (-200–800 ms) × word stimuli × word stimuli.
5_RS_radicals_letters_correlation_distance.mat
This MATLAB file contains the averaged time courses (-200–800 ms) of Fisher-Z transformed correlation distances corresponding to intra-unit similarity and inter-unit difference of each condition in the radicals-letters session.
File: Experiment_4.zip
Description: Behavioral and EEG results in Experiment 4.
1_RS_behavior_Chinese_English.mat
2_RS_behavior_German_English.mat
3_RS_Chinese_Engish_ERP.mat
4_RS_German_Engish_ERP.mat
5_EEG_Label.mat
6_RS_Chinese_English_RDM.mat
7_RS_Chinese_English_correlation_distance.mat
The contents of these MATLAB files are the same as described for Experiment_1.zip.
File: Experiment_5.zip
Description: Behavioral and EEG results in Experiment 5.
1_RS_behavior_Chinese_English.mat
2_RS_behavior_German_English.mat
3_RS_Chinese_Engish_ERP.mat
4_RS_German_Engish_ERP.mat
5_EEG_Label.mat
6_RS_Chinese_English_RDM.mat
7_RS_Chinese_English_correlation_distance.mat
The contents of these MATLAB files are the same as described for Experiment_1.zip.
File: Experiment_6.zip
Description: Behavioral and EEG results in Experiment 6.
1_RS_behavior_Korean_Italian.mat
This MATLAB file contains three tables of behavioral accuracy, reaction time and IES of each condition in the Korean-Italian session. The following text describes the information contained in the five columns of each table.
- SubID: Subject ID
- Rep_KR_KRIT: the repetition condition of the Korean words.
- Rep_IT_KRIT: the repetition condition of the Italian words.
- Alt_KR_KRIT: the alteration condition of the Korean words.
- Alt_IT_KRIT: the alteration condition of the Italian words.
2_RS_Korean_Italian_ERP.mat
This MATLAB file contains a matrix of the averaged ERP signals of all subjects in the Korean-Italian session as condition (1: Rep_KR; 2: Rep_IT; 3: Alt_KR; 4: Alt_IT) × electrode × time point (-200–800 ms).
3_EEG_Label.mat
This MATLAB file contains the labels of the whole-brain electrodes, in the order of the electrode dimension of the ERP files.
4_RS_Korean_Italian_RDM.mat
This MATLAB file contains the averaged representational dissimilarity matrix (RDM) of pairwise correlation distances in the Korean-Italian session as time point (-200–800 ms) × word stimuli × word stimuli.
5_RS_Korean_Italian_correlation_distance.mat
This MATLAB file contains the averaged time courses (-200–800 ms) of Fisher-Z transformed correlation distances corresponding to intra-language similarity and inter-language difference of each condition in the Korean-Italian session.
File: Experiment_7a_8a.zip
Description: Behavioral and MEG results in Experiments 7a and 8a.
1_RS_behavior_Chinese_English_7a.mat
This MATLAB file contains three tables of behavioral accuracy, reaction time, and IES of each condition in the Chinese-English session in Experiment 7a. The following text describes the information contained in the five columns of each table.
- SubID: Subject ID
- Rep_CH_CHEN: the repetition condition of the Chinese words.
- Rep_EN_CHEN: the repetition condition of the English words.
- Alt_CH_CHEN: the alteration condition of the Chinese words.
- Alt_EN_CHEN: the alteration condition of the English words.
2_RS_behavior_Chinese_English_8a.mat
This MATLAB file has the same structure as 1_RS_behavior_Chinese_English_7a.mat and contains the corresponding data for Experiment 8a.
3_RS_Chinese_Engish_ERF_7a_8a.mat
This MATLAB file contains three matrices of the averaged ERF signals of significant clusters of all subjects in the Chinese-English session of Experiments 7a and 8a as condition (1: Alt-Cond; 2: Rep-Cond) × time point (-200–800 ms).
4_MEG_Label.mat
This MATLAB file contains the MEG sensors corresponding to the significant clusters in the ERF files.
5_RS_Chinese_English_ROI_results_7a_8a.mat
This MATLAB file contains two matrices of the averaged MEG source signals of eleven regions of interest (ROIs) of all subjects in the Rep-Cond and Alt-Cond in the Chinese-English session of Experiments 7a and 8a, each as ROI × time point, together with the ROI labels.
6_RS_Chinese_English_RDM_7a_8a.mat
This MATLAB file contains the averaged representational dissimilarity matrix (RDM) of pairwise correlation distances of all subjects in the Chinese-English session of Experiments 7a and 8a as time point (-200–800 ms) × word stimuli × word stimuli.
7_RS_Chinese_English_RDM_7a.mat
This MATLAB file contains the averaged RDM of pairwise correlation distances of all subjects in the Chinese-English session of Experiment 7a as time point (-200–800 ms) × word stimuli × word stimuli.
8_RS_Chinese_English_RDM_8a.mat
This MATLAB file contains the averaged RDM of pairwise correlation distances of all subjects in the Chinese-English session of Experiment 8a as time point (-200–800 ms) × word stimuli × word stimuli.
9_RS_Chinese_English_correlation_distance_7a_8a.mat
This MATLAB file contains the averaged time courses (-200–800 ms) of Fisher-Z transformed correlation distances of all subjects corresponding to intra-language similarity and inter-language difference of each condition in the Chinese-English session of Experiments 7a and 8a.
10_RS_Chinese_English_correlation_distance_7a.mat
This MATLAB file contains the averaged time courses (-200–800 ms) of Fisher-Z transformed correlation distances of all subjects corresponding to intra-language similarity and inter-language difference of each condition in the Chinese-English session of Experiment 7a.
11_RS_Chinese_English_correlation_distance_8a.mat
This MATLAB file contains the averaged time courses (-200–800 ms) of Fisher-Z transformed correlation distances of all subjects corresponding to intra-language similarity and inter-language difference of each condition in the Chinese-English session of Experiment 8a.
12_RS_Chinese_English_GCA_results_7a_8a.mat
This MATLAB file contains the averaged Granger causality values of all subjects corresponding to intra-language similarity and inter-language difference in the Chinese-English session of Experiments 7a and 8a.
13_Time_courses_frontal_language_network_7a_8a.mat
This MATLAB file contains two matrices of the averaged ERF signals of the frontal language network in Experiments 7a and 8a as ROI × time point (-200–800 ms).
14_GCA_intra_Chinese_similarity.m
This MATLAB script demonstrates how to compute the pairwise conditional Granger causality values of intra-Chinese similarity among the 11 ROIs for each subject. The analyses of intra-English similarity and inter-language difference are similar.
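For orientation, the standard textbook form of conditional Granger causality from a source signal $Y$ to a target signal $X$, given the remaining signals $Z$, compares the residual variance of two autoregressive models. The formulation below is that general definition, with model order $p$ and symbols chosen here for illustration; it is not taken from the script, which remains the reference for the authors' exact settings.

```latex
\begin{align}
\text{Full model:}\quad
  X_t &= \sum_{k=1}^{p} a_k X_{t-k} + \sum_{k=1}^{p} b_k Y_{t-k}
       + \sum_{k=1}^{p} c_k Z_{t-k} + \varepsilon_t \\
\text{Reduced model:}\quad
  X_t &= \sum_{k=1}^{p} a'_k X_{t-k} + \sum_{k=1}^{p} c'_k Z_{t-k}
       + \varepsilon'_t \\
F_{Y \to X \mid Z} &= \ln \frac{\operatorname{var}(\varepsilon'_t)}{\operatorname{var}(\varepsilon_t)}
\end{align}
```

A value of $F_{Y \to X \mid Z}$ above zero means that the past of $Y$ improves the prediction of $X$ beyond what the past of $X$ and $Z$ already provides. With 11 ROIs, $Z$ would stand for the nine ROIs other than the source and target of a given pair.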
File: Experiment_7b_8b.zip
Description: Behavioral and MEG results in Experiments 7b and 8b.
1_association_implicit_explicit_categorization.mat
This MATLAB file contains the averaged intra-language similarity, inter-language difference, and δ at time points with significant RS effects, both in the Alt-Cond of the Chinese-English session of Experiments 7a and 8a and in the explicit categorization task of Experiments 7b and 8b.
2_explicit_categorization_association_delta_behavior.mat
This MATLAB file contains the averaged δ computed from the MEG source signals of Experiments 7b and 8b at time points with significant RS effects in Experiments 7a and 8a, together with the participants’ behavioral performance (accuracy, reaction time, and IES).
3_association_δ_IES_individual_region.mat
This MATLAB file contains beta values and FDR-corrected P values from regression analyses investigating the association between δ and IES. These analyses were restricted to MEG source signals localized to individual brain regions.
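The specific FDR procedure is not named here. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg adjustment, which is an assumption about the method; the function name and the toy P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, e.g. one per brain region (not from the dataset).
toy_p = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(toy_p)
```

An adjusted value below the chosen threshold (for example 0.05) marks a region whose association survives the correction across all tested regions.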
4_association_δ_IES_disruption_analysis.mat
This MATLAB file contains the beta values and their FDR-corrected P values from regression analyses investigating the association between δ and IES. The analyses used MEG source signals from the language-based word categorization network, with each brain region sequentially excluded to assess its functional contribution.
File: Experiment_9.zip
Description: Behavioral and MEG results in Experiment 9.
1_RS_behavior_Korean_Italian.mat
This MATLAB file contains three tables of behavioral accuracy, reaction time and IES of each condition in the Korean-Italian session. The following text describes the information contained in the five columns of each table.
- SubID: Subject ID
- Rep_KR_KRIT: the repetition condition of the Korean words.
- Rep_IT_KRIT: the repetition condition of the Italian words.
- Alt_KR_KRIT: the alteration condition of the Korean words.
- Alt_IT_KRIT: the alteration condition of the Italian words.
2_RS_Korean_Italian_ERF.mat
This MATLAB file contains three matrices of the averaged ERF signals of significant clusters of all subjects in the Korean-Italian session as condition (1: Alt-Cond; 2: Rep-Cond) × time point (-200–800 ms).
3_MEG_Label.mat
This MATLAB file contains the MEG sensors corresponding to the significant clusters in the ERF files.
4_RS_Korean_Italian_ROI_results.mat
This MATLAB file contains two matrices of the averaged MEG source signals of eleven regions of interest (ROIs) in the Rep-Cond and Alt-Cond in the Korean-Italian session, each as ROI × time point, together with the ROI labels.
5_RS_Korean_Italian_RDM.mat
This MATLAB file contains the averaged representational dissimilarity matrix (RDM) of pairwise correlation distances in the Korean-Italian session as time point (-200–800 ms) × word stimuli × word stimuli.
6_RS_Korean_Italian_correlation_distance.mat
This MATLAB file contains the averaged time courses (-200–800 ms) of Fisher-Z transformed correlation distances corresponding to intra-language similarity and inter-language difference of each condition in the Korean-Italian session.
7_RS_Korean_Italian_GCA_results.mat
This MATLAB file contains the averaged Granger causality values corresponding to intra-language similarity and inter-language difference in the Korean-Italian session.
Code/software
MATLAB 2019b
Human subjects data
We received explicit consent from our participants to publish the de-identified data in the public domain. The published data include only behavioral measures and EEG/MEG signals, with no personally identifiable information.
