Evoking the N400 Event-Related Potential (ERP) component using a publicly available novel set of sentences with semantically incongruent or congruent endings

Cite this dataset

Toffolo, Kathryn; Freedman, Edward; Foxe, John (2024). Evoking the N400 Event-Related Potential (ERP) component using a publicly available novel set of sentences with semantically incongruent or congruent endings [Dataset]. Dryad.


During speech comprehension, the ongoing context of a sentence is used to predict sentence outcome by limiting subsequent word likelihood. Neurophysiologically, violations of context-dependent predictions result in amplitude modulations of the N400 event-related potential (ERP) component. While the N400 is widely used to measure semantic processing and integration, no publicly available auditory stimulus set exists to standardize approaches across the field. Here, we developed an auditory stimulus set of 442 sentences that utilized the semantic anomaly paradigm, provided cloze probability for all stimuli, and was designed for both children and adults. With 20 neurotypical adults, we validated that this set elicits robust N400s, as well as two additional semantically related ERP components: the recognition potential (~250 ms) and the late positivity component (~600 ms). This stimulus set and the 20 high-density (128-channel) electrophysiological datasets are made publicly available to promote data sharing and reuse. Future studies that use this stimulus set to investigate sentential semantic comprehension in both control and clinical populations may benefit from the increased comparability and reproducibility within this field of research.


2.1 Participants

Twenty-four neurotypical adults were recruited and provided written informed consent to participate in this study. Four subjects were excluded from data analysis due to failure to remain alert or to sit still during data collection (n=2), or due to noisy EEG data (n=2) (Figure S1). The remaining twenty participants make up the fully analyzed dataset. These subjects ranged in age from 18 to 35 (mean age = 25.5 ± 4.36 years); nine were female, and three were left-handed. Every participant spoke English as their first language; twelve were monolingual, while eight reported being bi- or multilingual. Demographic information for all participants, including those removed from further analysis, is reported in Table 1.

2.2 Stimuli

The semantic anomaly paradigm consisted of 221 sentence pairs with incongruent and congruent endings, for a total of 442 stimuli in this stimulus set. However, twenty sentence pairs were eliminated before analysis because their endings did not match in syllable number, contained hyphenated phrases or cultural references, or, upon closer examination, had supposedly incongruent endings that were in fact congruent. The remaining 402 stimuli (201 sentence pairs, i.e. 201 congruent and 201 incongruent sentences) ranged from four to eight words in length. Sentences were created using simple words drawn from a set of age-appropriate words contained in the Medical Research Council Psycholinguistic (MRCP) database. For example, from the word “cake”, the congruent sentence “He baked a birthday cake” was created. The words selected from the MRCP database took into consideration the age of acquisition (from a database provided by Gilhooly and Logie (1980)) and/or written word frequency (from a normed written word frequency set (Francis and Kucera, 1967)). This was to ensure that each sentence could be readily comprehended by NT individuals aged 5 years or older. The majority of these age-appropriate words were monosyllabic, but a few basic words with 2 syllables (e.g. present) and 3 syllables (e.g. animal) were included, due to their high written-word frequency or early age of acquisition. Similar to previous studies using the MRCP database (Ross et al., 2011, 2007), we were sensitive to the fact that everyday language use has changed since the creation of these sets, so we carefully selected words that are still in common use.

This stimulus set included sentence pairs in which the incongruent endings were purely semantic errors (132 of the 201 remaining stimulus pairs, including the example stimuli). These incongruent endings were matched to their congruent pair in word type (e.g. noun or verb) and number (e.g. plural or singular). Other sentence pairs had incongruent endings containing both semantic and syntactic errors (69 of the 201 remaining stimulus pairs). Semantically incongruent endings were also classified as syntactic errors if the ending deviated from the syntax established by the sentential context. The syntactic errors were of three types: 1. endings where a plural-noun expectation was violated by a singular noun, or vice versa (27 stimulus pairs); 2. endings where an adjective expectation was violated by a noun, or vice versa (20 stimulus pairs); and 3. endings where a verb expectation was violated by a noun, or vice versa (22 stimulus pairs). During final analysis, these three types of syntactic errors were combined into a single linguistic division (LD) in order to compare the overall response to sentences with both semantic and syntactic errors against the response to sentences with only semantic errors.

Language comprehension in NT individuals is highly influenced by communicative cues such as prosody, especially when sentence meaning relies on syntactic prosodic cues (Cutler et al., 1997; Dahan, 2015; Frazier et al., 2006; Thorson, 2018). The intention of this manuscript was to create a stimulus set that could be used for all populations equally. Therefore, this stimulus set was constructed without prosody to ensure that NT individuals would not have an advantage in language comprehension over other populations known to have difficulty with communication or prosody, such as those with ASD (DePape et al., 2012; Eigsti et al., 2012; Martzoukou et al., 2017; McCann et al., 2007; O’Connor, 2012; Wang and Tsao, 2015) or schizophrenia (Leitman et al., 2007, 2005). To do so, individual words from the word list were recorded by a female speaker who was instructed to voice the words with minimal inflection, stress, and intonation (i.e. in a monotonous, non-prosodic manner). Words were then compiled into complete sentences using the Audacity software (Version 3.0.0; Audacity® software is copyright © 1999-2021 Audacity Team; https://audacity ). These artificially compiled sentences were manually adjusted to have similar pitch frequency between each word within a sentence and between all sentences. Concurrently, time gaps were added between words so that all sentences would have similar pacing and so that researchers would be able to trigger discretely on each word. Both the artificial timing and the frequency adjustment add to the robotic nature of the stimuli. A future initiative will add prosodic versions of these sentences to this public stimulus set so that researchers can explore more communicative aspects of language.
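Sentence assembly was performed by hand in Audacity, but the concatenation logic can be sketched programmatically. In the illustrative NumPy sketch below, the sampling rate, gap duration, and the `compile_sentence` helper are all hypothetical (none are specified in the methods); the sketch simply joins word waveforms with fixed silence gaps and records each word's onset, which is what makes discrete per-word triggering possible:

```python
import numpy as np

FS = 44_100  # assumed sampling rate (Hz); the actual recording rate is not stated


def compile_sentence(words, gap_s=0.2, fs=FS):
    """Concatenate individually recorded word waveforms into one sentence,
    inserting a fixed silence gap between words so pacing is uniform and
    each word onset is known exactly (for discrete triggering)."""
    gap = np.zeros(int(gap_s * fs))
    parts, onsets, cursor = [], [], 0
    for i, w in enumerate(words):
        onsets.append(cursor / fs)      # onset time (s) of this word
        parts.append(w)
        cursor += len(w)
        if i < len(words) - 1:          # no trailing gap after the final word
            parts.append(gap)
            cursor += len(gap)
    return np.concatenate(parts), onsets


# Toy example: three 0.5 s silent "words" stand in for real recordings.
words = [np.zeros(int(0.5 * FS)) for _ in range(3)]
sentence, onsets = compile_sentence(words, gap_s=0.2)
```

With 0.5 s words and 0.2 s gaps, the word onsets fall at 0.0, 0.7, and 1.4 s; in the real set, the measured onset of the final word is what the EEG epochs are later aligned to.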

2.3 Procedure

Participants were fitted with a 128-electrode cap (BioSemi B.V., Amsterdam, the Netherlands) and seated in a sound-attenuating, electrically shielded booth (Industrial Acoustics Company, The Bronx, NY) with a computer monitor (Acer Predator Z35 Curved HD, Acer Inc.) and a standard keyboard (Dell Inc.). The task was created with Presentation® software (Version 18.0, Neurobehavioral Systems, Inc., Berkeley, CA). The task was explained to the participant first during the consent process and again before the experimental session. Individuals were asked to refrain from excessive movement and to focus on a fixation cross throughout the task in order to reduce movement artifacts. The experimental session began by explaining the task a third time. All instructions were presented both visually on the screen and auditorily through headphones (Sennheiser electronic GmbH & Co. KG, USA). Instructions were followed by two practice trials, which were the same for every participant. Feedback on a participant’s response was given only during practice trials, not during experimental trials. Trials proceeded as follows: 1. a fixation cross was displayed while an auditory sentence stimulus was presented through the headphones; 2. a two-second pause; and 3. a question (presented both visually and auditorily) asked whether the sentence ended as expected, and the participant ended the trial by pressing the right arrow key if the ending was expected (congruent) or the left arrow key if it was unexpected (incongruent). A two-second delay was inserted between a subject’s response and the start of the next sentence. All 442 stimuli were presented in the same fixed order for every participant, ensuring that every participant had the same experience throughout the task for every sentence. Two of these stimuli (one congruent and one incongruent) were used for the practice trials; the remaining 440 were used for the experiment.
Stimuli were separated into 11 blocks with optional breaks between blocks; participants continued to the next block by pressing the spacebar. After elimination, the responses to 400 of the 440 stimuli contributed to the analysis of this experiment.
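The fixed presentation structure can be sketched in a few lines. The counts (442 stimuli, 2 practice trials, 11 blocks) come from the text above; the even split of 40 trials per block is an inference from 440/11, not stated explicitly:

```python
# Sketch of the fixed trial structure: 440 experimental stimuli
# (442 minus the 2 practice stimuli) split into 11 blocks, presented
# in the same order for every participant (no randomization).
N_STIMULI = 442
N_PRACTICE = 2
N_BLOCKS = 11

trial_order = list(range(N_STIMULI - N_PRACTICE))  # identical for all subjects
per_block = len(trial_order) // N_BLOCKS           # inferred: 40 trials/block
blocks = [trial_order[i * per_block:(i + 1) * per_block]
          for i in range(N_BLOCKS)]
```

Keeping the order fixed (rather than randomized) trades counterbalancing for exact comparability of every participant's experience, which is the stated design goal.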

2.4 Data preprocessing

Data were digitized online at a rate of 512 Hz with a DC to 150 Hz pass-band and referenced to the common mode sense (CMS) active electrode. EEG data were preprocessed and analyzed offline using in-house scripts leveraging EEGLAB functions (Delorme & Makeig, 2004). Data were filtered using a Chebyshev II spectral filter with a band-pass of 0.1-45 Hz. A channel was rejected automatically, and interpolated using EEGLAB spherical interpolation, if its recorded data exceeded 3 standard deviations from the mean variance and amplitude of all electrodes. Data were then re-referenced to the common average. Prior to analysis, the time from the beginning of each sentence to the onset of its last word was measured using Praat (v. 6.1, University of Amsterdam, the Netherlands). These measures were used to adjust the time stamp of each stimulus so that the data could be aligned to the onset of the last word (i.e. 0 ms) rather than to the beginning of the sentence. For all participants, epochs from -200 to 1000 ms were created using the 200 ms interval before the onset of the last word as baseline. Trials were rejected automatically if they exceeded an artifact rejection threshold of 250 µV or contained amplitudes greater than two standard deviations from the mean amplitude across all channels. Grand-average ERP waveforms were generated by first averaging the trials per condition per electrode, and then averaging across participants.
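The actual rejection criteria were implemented in the authors' in-house EEGLAB (MATLAB) scripts, which are available with the dataset. As an illustration only, the two automatic rejection rules described above can be sketched in NumPy; the array shapes, the use of peak absolute amplitude as the "amplitude" statistic, and the function names are assumptions:

```python
import numpy as np


def reject_channels(data, z_thresh=3.0):
    """Flag channels whose variance or peak amplitude lies more than
    z_thresh standard deviations from the mean across all channels.
    data: (n_channels, n_samples) in volts. Returns a boolean bad-channel mask;
    flagged channels would then be spherically interpolated."""
    variance = data.var(axis=1)
    amplitude = np.abs(data).max(axis=1)
    bad = np.zeros(data.shape[0], dtype=bool)
    for stat in (variance, amplitude):
        z = (stat - stat.mean()) / stat.std()
        bad |= np.abs(z) > z_thresh
    return bad


def reject_trials(epochs, abs_thresh=250e-6, z_thresh=2.0):
    """Drop epochs exceeding the 250 µV absolute threshold, or whose peak
    amplitude is more than z_thresh SDs from the mean across trials.
    epochs: (n_trials, n_channels, n_samples) in volts."""
    peak = np.abs(epochs).max(axis=(1, 2))
    z = (peak - peak.mean()) / peak.std()
    return (peak > abs_thresh) | (np.abs(z) > z_thresh)
```

A channel flagged by either statistic is interpolated rather than dropped, so the montage stays complete for the common-average re-reference that follows.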

2.5. Statistical analysis

JASP (Jeffreys’s Amazing Statistics Program Team [2020], Version 0.12.2) was used for statistical analyses. Three midline electrodes (Fz, Cz, and Pz) were chosen a priori for investigation (Lau et al., 2008; Luck, 2005; Osterhout and Holcomb, 1992). Three other electrodes (F7, T7, and P7) were investigated post hoc. For every participant, these selected electrodes were assessed for an effect of condition using a repeated-measures ANOVA (rmANOVA) at four time points of interest (250 ms, 400 ms, 600 ms, and 700 ms). Amplitude values for these electrodes were acquired by averaging the amplitudes across a 10 ms time window centered at each time point of interest. Additional rmANOVAs were conducted to assess main effects of cloze probability (CP), order, linguistic division, and time. F-scores and p-values for a main effect of condition at the midline electrodes are shown in Table 2. Other main effects for the midline electrodes are shown in Table 3. All main effects for electrodes F7, T7, and P7 are in Table 4.
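The window-averaging step that produces the amplitude values fed into the rmANOVAs can be sketched as follows. The time axis, toy waveform, and `window_amplitude` helper are illustrative assumptions (the actual values were extracted from the EEGLAB-derived ERPs and analyzed in JASP):

```python
import numpy as np


def window_amplitude(erp, times, center_ms, width_ms=10.0):
    """Mean amplitude over a width_ms window centered on center_ms.
    erp: 1-D voltage array for one electrode/condition;
    times: matching time axis in ms (0 ms = last-word onset)."""
    lo, hi = center_ms - width_ms / 2, center_ms + width_ms / 2
    mask = (times >= lo) & (times <= hi)
    return erp[mask].mean()


# Toy waveform: ~2 ms resolution (close to the 512 Hz sampling rate),
# flat at 1.0 µV with a single 5.0 µV sample at 400 ms.
times = np.arange(-200, 1000, 2)
erp = np.where(times == 400, 5.0, 1.0)
n400_amp = window_amplitude(erp, times, 400)  # mean of [1, 1, 5, 1, 1] = 1.8
```

Averaging over a short window rather than taking the single-sample value at 400 ms makes the measure less sensitive to residual noise at any one sample.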

Topography plot statistics were generated using the FieldTrip toolbox (Oostenveld et al., 2011) for MATLAB and displayed using the EEGLAB toolbox. A group-level cluster-based permutation test was conducted using two-tailed, independent-sample t-statistics with a critical alpha level of 0.05. This test applied the Monte Carlo method to estimate significance probability, the triangulation method of the neighbours function for spatial clustering, and a multiple-comparisons correction. Single-sample clusters were combined using “maxsum”, and a 5% two-sided cutoff criterion was applied to both positive and negative clusters. Topography statistics are presented as the average significance over a 10 ms time window centered at the time point of interest (Figure S2).

2.6 Cloze probability

To further characterize the stimulus set, a REDCap survey was employed to test the CP of all sentence endings. Each sentence in the set was presented with the final word missing (blank), and participants were required to fill in this blank with the first single word that came to mind. If participants could not think of an answer, they were encouraged to guess rather than leave a blank. Non-answers were not counted towards CP scores; participants were removed from the survey data if they answered fewer than 10 of the 221 questions, or if their percent correct was more than three standard deviations from the mean. After elimination, the responses of 134 individuals were used to assess the CP of each sentence. The majority of the stimuli had a CP greater than 80%. The CP distribution of the sentences is shown in Figure S3.
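Cloze probability scoring amounts to counting how often respondents produce the intended final word, after discarding non-answers. The sketch below is a minimal illustration; the exact normalization rules the survey used (case folding, spelling variants, etc.) are not stated, so the ones here are assumptions:

```python
from collections import Counter


def cloze_probability(responses, target):
    """CP of a sentence's intended final word: the proportion of
    respondents who produced that word. Blank responses (non-answers)
    are excluded from the denominator, per the survey's scoring rule."""
    answers = [r.strip().lower() for r in responses if r and r.strip()]
    if not answers:
        return 0.0
    return Counter(answers)[target.strip().lower()] / len(answers)


# Toy survey: 8 of 10 respondents complete "He baked a birthday ___"
# with "cake", giving a CP of 0.8 for the congruent ending.
resp = ["cake"] * 8 + ["pie", "gift"]
cp = cloze_probability(resp, "cake")
```

A high-CP congruent ending makes the incongruent member of the pair maximally unexpected, which is what drives the N400 amplitude difference between conditions.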

2.7 Data Availability

This stimulus set and the supporting datasets are available through Dryad for the scientific community to use freely in their experiments. The stimulus set provides the auditory files for all 442 stimuli and a stimulus parameter file that includes stimulus information such as duration, target-word onset, derivative divisions (i.e. CP, order, linguistic error, and time), and, most importantly, the written form of each stimulus so that semantic comprehension via reading can also be investigated. Cloze probability survey answer and result files are also included in the stimulus set download.

The dataset download provides the 24 datasets in BIDS format, following the guidelines of Pernet et al. (2019), along with all the aforementioned stimulus set files. Participant information is also detailed in the dataset (.tsv and .json). The full dataset includes unfiltered EEG data (.bdf); corresponding event files and channel rejection files for each participant (.tsv); and recording information, electrode positioning, and event file information (.tsv and/or .json). We additionally provide the preprocessed ERP derivatives for this study (.mat), the corresponding trial rejection information per derivative for each participant (.tsv), and the filtering parameters (.json). Refer to the README.txt files in both the dataset and the stimulus set in order to use them appropriately. Use of this dataset or stimulus set, or presentation of examples from this stimulus set, should include a citation to this paper.

2.8. Code Availability

The code generated for the analysis of these datasets, as well as the Presentation® code, is available through Zenodo via Dryad. The provided code was used to create the preprocessed ERP derivatives as well as figure components.

Usage notes

In order to use this dataset appropriately, please refer to the README.txt file, which describes all the folders and files within the N400 BIDS dataset, including the stimulus files. If this dataset is used, please cite the paper and this dataset. The stimulus set used in this paper is also available as a separate download.


The Del Monte Institute for Neuroscience at the University of Rochester School of Medicine and Dentistry

Frederick J. and Marion A. Schindler