Data for: An instantaneous voice synthesis neuroprosthesis
Data files
May 14, 2025 version files (710.85 MB)
- README.md (4.06 KB)
- t15_control_experiment.zip (710.84 MB)
Abstract
Brain computer interfaces (BCIs) have the potential to restore communication for people who have lost the ability to speak due to neurological disease or injury. BCIs have been used to translate the neural correlates of attempted speech into text. However, text communication fails to capture the nuances of human speech such as prosody and immediately hearing one’s own voice. Here, we demonstrate a “brain-to-voice” neuroprosthesis that instantaneously synthesizes voice with closed-loop audio feedback by decoding neural activity from 256 microelectrodes implanted into the ventral precentral gyrus of a man with amyotrophic lateral sclerosis and severe dysarthria. We overcame the challenge of lacking ground-truth speech for training the neural decoder and were able to accurately synthesize his voice. Along with phonemic content, we were also able to decode paralinguistic features from intracortical activity, enabling the participant to modulate his BCI-synthesized voice in real-time to change intonation and sing short melodies. These results demonstrate the feasibility of enabling people with paralysis to speak intelligibly and expressively through a BCI.
The neural data associated with this study are available here.
An instantaneous voice synthesis neuroprosthesis
Maitreyee Wairagkar, Nicholas S. Card, Tyler Singer-Clark, Xianda Hou, Carrina Iacobacci, Lee M. Miller, Leigh R. Hochberg, David M. Brandman#, Sergey D. Stavisky#
# Co-senior authors
preprint: https://doi.org/10.1101/2024.08.14.607690
Overview
This repository contains the neural data recorded during the speech tasks described in Wairagkar et al., “An instantaneous voice synthesis neuroprosthesis” (see Related works), along with associated metadata (e.g., task identifier, task event times, prompted text, behavioral measurements).
The participant was instructed to attempt to speak the sentences cued on the screen in front of him at his own pace. The data are segmented into individual trials of the “go” period, during which the participant attempted to speak each sentence. Data are organized into blocks of multiple sentences that were run during each session. The data contain pre-processed (log-transformed, normalized, and smoothed) spike-band power and threshold-crossing features, the cued sentence text, and the brain-to-voice decoder that ran in closed loop during the session to synthesize voice from neural activity in real time. For some datasets, other non-invasive biosignals are also provided. The related paper describes in detail the signal processing pipeline used to generate these data. The code for brain-to-voice synthesis is linked under Related works (https://github.com/Neuroprosthetics-Lab/brain-to-voice-2025).
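For readers who want a concrete picture of what “log-transformed, normalized, and smoothed” features of this kind look like, below is a minimal illustrative sketch, assuming a NumPy array of binned spike-band power with shape (time_bins, channels). The function name, bin layout, smoothing window, and normalization statistics here are placeholders; the actual pipeline and parameters used to produce the released files are described in the paper, not here.

```python
import numpy as np

def preprocess_features(spike_band_power, smooth_window=5):
    """Illustrative log-transform -> z-score -> causal smoothing of binned features.

    `spike_band_power` is assumed to be a (time_bins, channels) array of
    non-negative binned spike-band power. This is only a sketch of the style
    of preprocessing described in the paper, not the exact released pipeline.
    """
    # Log transform (small offset avoids log(0) for empty bins)
    x = np.log10(spike_band_power + 1e-6)
    # Per-channel z-score normalization
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-6)
    # Causal smoothing: convolve each channel with a short boxcar kernel
    kernel = np.ones(smooth_window) / smooth_window
    x = np.apply_along_axis(lambda c: np.convolve(c, kernel)[: len(c)], 0, x)
    return x
```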
Version 1 release files:
The files contain intracortical neural signals and behavioral measurements of the participant’s residual vocalizations, recorded simultaneously using a stethoscopic microphone and an inertial measurement unit (IMU) sensor placed on his right and left mastoids, respectively, during a speech BCI control experiment with a limited 50-word vocabulary. The brain-to-voice decoders used in this experiment to synthesize voice from neural data are also provided.
t15_control_experiment.zip contains:
- `.mat` files containing neural data, and stethoscopic mic and IMU recordings of T15’s vocalizations from the cued speech task for multiple blocks, along with the associated metadata (240 trials in total).
- `.h5` brain-to-voice models trained on past neural data and used in this experiment to synthesize voice. These `.h5` files contain the TensorFlow model configuration and model weights, and can be used out of the box to synthesize voice from the neural data provided in the `.mat` files. The models can be loaded in Python using `keras.models.load_model("model_filename.h5")` (a loading sketch follows this list).
- `readme.txt` file with a detailed description of the data in the `.mat` files.

(File naming convention: neural data files are named `t15_dayXXX_blockXX.mat` and the pretrained brain-to-voice models are named `brain2voice_t15_dayXXX_blockXX.h5`, where `t15` refers to the anonymized participant ID, `dayXXX` refers to the post-implant BCI session day, and `blockXX` refers to the block number of the neural data recording blocks that were run during the session.)
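As a quick start, the sketch below shows one way to load a block of data and its matching pretrained decoder in Python. The file names use the placeholder `dayXXX`/`blockXX` pattern from the naming convention above (substitute a real day and block from the extracted zip), and the keys inside the `.mat` file are documented in the bundled `readme.txt` rather than assumed here; only the `keras.models.load_model` call on the provided `.h5` files is taken from the description above.

```python
import scipy.io as sio
from tensorflow import keras

# Placeholder file names following the naming convention above;
# substitute a real day/block from the extracted zip.
block_file = "t15_dayXXX_blockXX.mat"
model_file = "brain2voice_t15_dayXXX_blockXX.h5"

# Load one block of pre-processed neural features and metadata.
# The variable names inside the .mat file are documented in readme.txt.
block = sio.loadmat(block_file)
print(block.keys())

# Load the pretrained brain-to-voice decoder (TensorFlow model config + weights).
decoder = keras.models.load_model(model_file)
decoder.summary()
```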
Additional neural data will be made available in this repository in the future; the release is being staggered to avoid overlap with embargoed data included in an upcoming speech BCI decoding competition. Please check back in a few months.
Human subjects data
The data recorded from a human participant have been anonymized and de-identified and do not contain any personally identifiable information. The participant is referred to using a coded clinical trial identifier (also used in the associated publication). The data have been de-identified by extracting processed features from the neural signals (out of an abundance of caution, raw neural data are not shared). Behavioral data do not contain identifiable information. The participant has consented to the publication of these de-identified data in the public domain.