Data for: A high-performance speech neuroprosthesis

Willett, Francis1; Kunz, Erin1; Fan, Chaofei1; Avansino, Donald1; Wilson, Guy1; Choi, Eun Young1; Kamdar, Foram1; Glasser, Matthew2; Hochberg, Leigh3; Druckmann, Shaul1; Shenoy, Krishna1; Henderson, Jaimie1

Published Jun 16, 2023; Updated Sep 01, 2023 on Dryad. https://doi.org/10.5061/dryad.x69p8czpq

Abstract

Brain-computer interfaces (BCIs) can restore communication to people who have lost the ability to move or speak. In this study, we demonstrated an intracortical BCI that decodes attempted speaking movements from neural activity in motor cortex and translates them to text in real time, using a recurrent neural network decoding approach. With this BCI, our study participant, who can no longer speak intelligibly due to amyotrophic lateral sclerosis, achieved a 9.1% word error rate on a 50-word vocabulary and a 23.8% word error rate on a 125,000-word vocabulary.

This dataset contains all of the neural activity recorded during these experiments, consisting of 12,100 spoken sentences as well as instructed delay experiments designed to investigate the neural representation of orofacial movement and speech production.

The data have also been formatted for developing and evaluating machine learning decoding methods, and we intend to host a decoding competition. To this end, the data also contain files for reproducing our offline decoding results, including a language model and an example RNN decoder.

Code associated with the data can be found here: https://github.com/fwillett/speechBCI.

Brain-computer interfaces (BCIs) can restore communication to people who have lost the ability to move or speak. In this study, we demonstrated an intracortical BCI that decodes attempted speaking movements from neural activity in motor cortex and translates them to text in real time, using a recurrent neural network decoding approach. With this BCI, our study participant, who can no longer speak intelligibly due to amyotrophic lateral sclerosis, achieved a 9.1% word error rate on a 50-word vocabulary and a 23.8% word error rate on a 125,000-word vocabulary. Neural activity was recorded with microelectrode arrays, and neural features are provided in the form of binned threshold crossings and spike band power (20 ms bins).
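
To make "binned threshold crossings" concrete, here is a minimal sketch of counting event times in 20 ms bins. It is an illustration only, not the preprocessing pipeline used to produce this dataset: the function name, the example event times, and the assumption that crossings are available as timestamps in seconds are all hypothetical.

```python
import numpy as np

BIN_WIDTH_S = 0.020  # 20 ms bins, matching the bin width stated above


def bin_event_times(event_times_s, duration_s, bin_width_s=BIN_WIDTH_S):
    """Count events (e.g. threshold crossings on one channel) per time bin.

    Hypothetical helper: assumes event times are given in seconds
    relative to the start of the recording.
    """
    n_bins = int(round(duration_s / bin_width_s))
    edges = np.arange(n_bins + 1) * bin_width_s
    counts, _ = np.histogram(event_times_s, bins=edges)
    return counts


# Made-up crossing times for a 100 ms window on a single channel.
crossings = np.array([0.001, 0.015, 0.021, 0.059, 0.0595])
counts = bin_event_times(crossings, duration_s=0.1)
print(counts)
```

The files in this dataset already contain binned features, so a step like this is not needed to use them; the sketch only shows what a single row of such a feature matrix represents.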

The data have also been formatted for developing and evaluating machine learning decoding methods, and we intend to host a decoding competition based on these data.
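
For evaluating a decoder, the word error rate quoted above is conventionally the word-level edit distance between the decoded sentence and the reference sentence, divided by the number of reference words. The sketch below computes it for a single sentence pair. It is a generic implementation, not the evaluation code from the associated repository, and the function name and example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("i would like some water", "i would like water")
print(wer)
```

How per-sentence errors are aggregated into a single figure (for example, total edits over total reference words versus a mean of per-sentence rates) is a separate choice that this sketch does not make.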

Version 4 Release

The latest release (“version 4”) should be used and contains bug fixes to various aspects of the data. The differences between the latest release and the prior release are as follows:

Block 5 of the 06.28.2022 dataset had a timing error and the alignment of the neural activity to the task was incorrect. This block has been removed from the “sentences” files and “competitionData” files.
The “audioEnvelope” features in the “sentences”, “diagnosticBlocks”, and “tuningTasks” files did not approximate the audio envelope well for datasets that had a DC offset in the microphone data. This has now been fixed by removing the mean of the audio data before computing the envelope.
The “test” and “train” partitions of 5/19 and 5/26 were defined incorrectly for the “competitionData”. These have been redefined and now contain the intended trials.
The “competitionHoldOut” data for session t12.2022.06.21 had an unintentional overlap with some of the data in the “sentences” file. This has been corrected so that no “data leak” now exists for this session. The “competitionHoldOut” data for t12.2022.06.21 now correctly consist of two blocks of data not contained in the “sentences” file.
A new baseline RNN has been trained on the corrected “train” data, and new tfRecord files have been made for RNN training. These updated files are in the “derived” folder.
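
The “audioEnvelope” fix in the list above can be illustrated with a small sketch showing why a DC offset distorts an envelope and how removing the mean first avoids that. The envelope method used here (rectification followed by a moving average) is an assumption chosen for illustration; the release notes state only that the mean is removed before the envelope is computed. The function name, window length, and synthetic signal are hypothetical.

```python
import numpy as np


def audio_envelope(audio, window, remove_mean=True):
    """Rectify-and-smooth envelope (an assumed method, for illustration)."""
    audio = np.asarray(audio, dtype=float)
    if remove_mean:
        # Subtract the mean so a constant microphone offset does not
        # dominate the rectified signal.
        audio = audio - audio.mean()
    rectified = np.abs(audio)
    kernel = np.ones(window) / window  # moving-average smoother
    return np.convolve(rectified, kernel, mode="same")


# Synthetic example: a quiet tone riding on a large DC offset.
t = np.arange(1000)
tone = 0.1 * np.sin(2 * np.pi * t / 20)
offset_audio = tone + 5.0

fixed = audio_envelope(offset_audio, window=100)
naive = audio_envelope(offset_audio, window=100, remove_mean=False)
print(fixed[500], naive[500])
```

Without mean removal the envelope sits near the offset value and barely reflects the tone; with it, the envelope tracks the tone's amplitude.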

Description of the data and file structure

There are four .gz files that contain the data: competitionData, sentences, diagnosticBlocks and tuningTasks.

“tuningTasks” contains data from the instructed delay tasks that assessed tuning to attempted orofacial movements, attempted speaking of single words, or isolated phoneme production attempts (these data are featured in Figure 1 of the paper).

“sentences” contains data from the instructed delay sentence production task (including training data and real-time BCI evaluation data). It was the core data used to train and evaluate the BCI, and contains the BCI output as well as the neural activity.

“competitionData” is a simplified version of the sentences data that has been formatted and partitioned for machine learning research. It contains a “train” and “test” partition for model development, and a held-out “competitionHoldOut” set intended for a speech decoding competition (labels will be released after the competition is over).

“diagnosticBlocks” contains data from a diagnostic task assessing tuning to a fixed set of 7 words across many days. This could be used to assess how neural tuning changes over time (and was used to make Figure 4d).

All data consist of .mat files that are intended to be loaded with MATLAB or Python (scipy.io.loadmat). The individual readme file for each dataset describes the variables and experiments in detail.
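
As a starting point in Python, the sketch below loads a .mat file with scipy.io.loadmat and lists the variables it contains with their shapes. Because the real variable names are documented only in the individual readme files, the example writes and reads a small stand-in file; the file name "demo.mat" and the variable name "exampleFeatures" are hypothetical and do not appear in the dataset.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def summarize_mat(path):
    """Return {variable name: shape} for the user variables in a .mat file."""
    contents = loadmat(path)
    # loadmat also returns metadata keys such as "__header__"; skip those.
    return {
        name: np.shape(value)
        for name, value in contents.items()
        if not name.startswith("__")
    }


# Stand-in file so the sketch runs without the dataset. Replace demo_path
# with the path to one of the extracted .mat files to inspect real data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.mat")
    savemat(demo_path, {"exampleFeatures": np.zeros((5, 3))})
    summary = summarize_mat(demo_path)

print(summary)
```

Listing the keys first is a convenient way to cross-check a file against the variable descriptions in its readme before writing any analysis code.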

Additionally, we offer two files to help reproduce our RNN decoding results:

“languageModel” contains the files needed to run the 3-gram language model we used in the paper.

“derived” contains files needed to train an RNN in TensorFlow and an example baseline RNN.

Sharing/Access information

The data currently reside only on Dryad, and data were not derived from any other sources.
