Data for: An instantaneous voice synthesis neuroprosthesis
Data files
May 14, 2025 version files (710.85 MB)
- README.md (4.06 KB)
- t15_control_experiment.zip (710.84 MB)
May 08, 2026 version files (26.87 GB)
- README.md (4.64 KB)
- t15_control_experiment.zip (710.84 MB)
- t15_neural_data.zip (26.16 GB)
Abstract
Brain computer interfaces (BCIs) have the potential to restore communication for people who have lost the ability to speak due to neurological disease or injury. BCIs have been used to translate the neural correlates of attempted speech into text. However, text communication fails to capture the nuances of human speech such as prosody and immediately hearing one’s own voice. Here, we demonstrate a “brain-to-voice” neuroprosthesis that instantaneously synthesizes voice with closed-loop audio feedback by decoding neural activity from 256 microelectrodes implanted into the ventral precentral gyrus of a man with amyotrophic lateral sclerosis and severe dysarthria. We overcame the challenge of lacking ground-truth speech for training the neural decoder and were able to accurately synthesize his voice. Along with phonemic content, we were also able to decode paralinguistic features from intracortical activity, enabling the participant to modulate his BCI-synthesized voice in real-time to change intonation and sing short melodies. These results demonstrate the feasibility of enabling people with paralysis to speak intelligibly and expressively through a BCI.
The neural data associated with this study is available here.
Maitreyee Wairagkar, Nicholas S. Card, Tyler Singer-Clark, Xianda Hou, Carrina Iacobacci, Lee M. Miller, Leigh R. Hochberg, David M. Brandman#, Sergey D. Stavisky#
#Co-senior authors
Paper: https://doi.org/10.1038/s41586-025-09127-3
Overview
This repository contains the neural data recorded during speech tasks described in Wairagkar et al., “An instantaneous voice synthesis neuroprosthesis”, Nature 2025 (see Related works) and associated metadata (e.g., task identifier, task event times, what the prompted text was, behavioral measurements).
The participant was instructed to attempt to speak the sentences cued on a screen in front of him at his own pace. The data are segmented into individual trials, each covering the “go” period during which the participant attempted to speak one sentence. Data are organized into multiple experimental sessions. The data contain pre-processed (log-transformed, normalized, and smoothed) spike-band power and threshold crossing features, the text of the cued sentences, and the brain-to-voice decoder that ran in closed loop during the session to synthesize voice from neural activity in real time. Additionally, speech timings and time-aligned target speech features are provided. For some datasets, other non-invasive biosignals are provided. The related paper describes in detail the signal processing pipeline used to generate these data. The code for brain-to-voice synthesis is linked in the Related works (https://github.com/Neuroprosthetics-Lab/brain-to-voice-2025).
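As a rough illustration of the three pre-processing steps named above, the sketch below applies a log-transform, normalization, and smoothing to one channel of binned feature values. It is not the pipeline from the paper: the choice of log(1 + x), z-scoring over the whole sequence, a causal moving average, and the names `preprocess_channel` and `window` are all assumptions made for this example. The shared data are already pre-processed, so this step does not need to be repeated when using them.

```python
import math
from statistics import mean, pstdev


def preprocess_channel(values, window=3):
    """Log-transform, z-score, then causally smooth one channel.

    Illustrative only: every choice here (log1p, whole-sequence z-score,
    moving-average window) is an assumption, not the authors' pipeline.
    """
    if not values:
        return []

    # 1) Log-transform. log1p(x) = log(1 + x) stays finite for zero values.
    logged = [math.log1p(v) for v in values]

    # 2) Normalize to zero mean and unit variance (z-score).
    mu = mean(logged)
    sigma = pstdev(logged)
    if sigma == 0:
        sigma = 1.0  # constant channel: avoid division by zero
    normed = [(x - mu) / sigma for x in logged]

    # 3) Smooth with a causal moving average: each output bin averages
    #    only the current bin and up to (window - 1) preceding bins.
    smoothed = []
    for i in range(len(normed)):
        recent = normed[max(0, i - window + 1): i + 1]
        smoothed.append(mean(recent))
    return smoothed
```

A causal window is used here because a decoder running in real time can only see past samples; the window length is a placeholder.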
Files:
All data files contain intracortical neural signals, speech timings, and time-aligned target speech features recorded over multiple sessions. T15 control experiment files additionally contain behavioral measurements of the participant’s residual vocalizations. Brain-to-voice decoders used in predetermined evaluation sessions to synthesize voice from neural data are also provided.
t15_control_experiment.zip contains the following files from the control experiment session:
- .mat files containing neural data, text cues, participant's residual speech simultaneously recorded using a stethoscopic microphone and an inertial measurement unit (IMU) sensor from cued speech task for multiple blocks, and the associated metadata
- .h5 brain-to-voice models trained on past neural data used in this experiment to synthesize voice. These .h5 files contain TensorFlow model configuration and model weights that can be used out-of-the-box to synthesize voice from the neural data provided in the .mat files. The models can be loaded in Python using keras.models.load_model(model_filename.h5)
- readme.txt file with detailed description of the data in the .mat files
t15_neural_data.zip contains the following files for each session folder:
- neural_data.mat file containing neural data, text cues, and the associated metadata
- speech_indices.mat file containing word and syllable-level speech onset and offset timings
- target_speech_feats.mat file containing speech features (LPCNet features) time-aligned with neural data used for training the brain-to-voice models for each session
- .h5 brain-to-voice model trained on neural data from all past sessions used to synthesize voice in closed-loop in this session. This .h5 file contains TensorFlow model configuration and model weights that can be used out-of-the-box to synthesize voice from the neural data provided in the .mat files. The models can be loaded in Python using keras.models.load_model(model_filename.h5)
- readme.txt file with detailed description of the data
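The per-session layout listed above could be read along the following lines. This is a minimal sketch under stated assumptions: the helper names `find_session_files` and `load_session` are hypothetical, files are matched by the ending of their names, and the `.mat` files are assumed to be readable by `scipy.io.loadmat` (MATLAB v7.3 files are HDF5-based and would need `h5py` instead). The variable names stored inside each `.mat` file are described in the readme.txt of the archive and are not assumed here.

```python
from pathlib import Path

# Roles are matched by file-name ending. Assumption: the files in a session
# folder end with these strings; adjust them if the archive differs.
ROLE_ENDINGS = {
    "neural_data": "neural_data.mat",
    "speech_indices": "speech_indices.mat",
    "target_speech_feats": "target_speech_feats.mat",
    "model": ".h5",
    "readme": "readme.txt",
}


def find_session_files(session_dir):
    """Map each role to the first matching file in one session folder."""
    found = {}
    for path in sorted(Path(session_dir).iterdir()):
        if not path.is_file():
            continue
        for role, ending in ROLE_ENDINGS.items():
            if role not in found and path.name.endswith(ending):
                found[role] = path
    return found


def load_session(session_dir):
    """Load the .mat contents and the decoder for one session.

    Needs SciPy and TensorFlow; both are imported lazily so that
    find_session_files works without them.
    """
    from scipy.io import loadmat
    from tensorflow import keras

    files = find_session_files(session_dir)
    # loadmat returns a dict mapping MATLAB variable names to arrays.
    data = {
        role: loadmat(str(files[role]))
        for role in ("neural_data", "speech_indices", "target_speech_feats")
        if role in files
    }
    # Same call as the one named in the file description above.
    model = None
    if "model" in files:
        model = keras.models.load_model(str(files["model"]))
    return data, model
```

Calling `load_session` on one extracted session folder returns the dictionaries of `.mat` variables and the Keras model; printing `data["neural_data"].keys()` is a quick way to see which variables are present before consulting the readme.txt.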
(File naming convention: Neural data files are named as t15_dayXXX_neural_data.mat and the pretrained brain-to-voice models are named as brain2voice_t15_dayXXX.h5 where t15 refers to an anonymized participant ID and dayXXX refers to the post-implant BCI session day.)
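The naming convention can be used to pair each neural data file with the decoder from the same session day. The sketch below parses both patterns with regular expressions; it assumes that "XXX" stands for a run of decimal digits, and the names `parse_name` and `SessionFile` are hypothetical helpers introduced for this example.

```python
import re
from typing import NamedTuple, Optional

# Patterns follow the stated convention. Assumption: "XXX" is one or more
# decimal digits (the number of digits is not specified).
NEURAL_RE = re.compile(r"^(?P<participant>t\d+)_day(?P<day>\d+)_neural_data\.mat$")
MODEL_RE = re.compile(r"^brain2voice_(?P<participant>t\d+)_day(?P<day>\d+)\.h5$")


class SessionFile(NamedTuple):
    kind: str         # "neural_data" or "model"
    participant: str  # anonymized participant ID, e.g. "t15"
    day: int          # post-implant BCI session day


def parse_name(filename: str) -> Optional[SessionFile]:
    """Return the parsed fields, or None if the name fits neither pattern."""
    for kind, pattern in (("neural_data", NEURAL_RE), ("model", MODEL_RE)):
        match = pattern.match(filename)
        if match:
            return SessionFile(kind, match["participant"], int(match["day"]))
    return None
```

Two files belong to the same session when their parsed `participant` and `day` fields are equal, which avoids relying on string comparison of the full names.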
Human subjects data
These data, recorded from a human participant, have been anonymized and de-identified. They do not contain any personally identifiable information. The subject is referred to using a coded clinical trial identifier (also used in the associated publication). The data have been de-identified by extracting processed features from the neural signals (out of an abundance of caution, raw neural data are not shared). Behavioral data do not contain identifiable information. The participant has consented to the publication of these de-identified data in the public domain.
Changes after May 14, 2025: Released neural data from all T15 sessions for voice synthesis in addition to the control experiment data from the related paper Wairagkar et al. Nature 2025. No personal or identifiable information is included in any of the data included here.
