Researchers from NYU have developed an AI system that decodes brain activity into synthesized speech.

Researchers from New York University have made significant advances in neural speech decoding, work that could help individuals who have lost the ability to speak regain their voice. The study, published in ‘Nature Machine Intelligence’, presents a deep learning framework that accurately translates brain signals into intelligible speech. Paired with a voice synthesizer, the approach could eventually allow people who have suffered brain injuries or related physical trauma to speak again.

Key to the technique is electrocorticography (ECoG), a method that records electrical activity directly from the cortical surface of the brain. The researchers developed a deep learning model that converts ECoG signals into a set of interpretable speech features, such as pitch and loudness, capturing the vital elements of speech production. This yields a compact representation of the intended speech, which is then converted into a waveform and rendered as natural-sounding speech.
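To make the idea concrete, the sketch below shows, in simplified PyTorch, how a decoder of this kind might map multichannel ECoG windows to per-frame speech features. The layer sizes, channel counts, and feature set are illustrative assumptions, not the published model's architecture.

```python
import torch
import torch.nn as nn

class ECoGSpeechFeatureDecoder(nn.Module):
    """Illustrative sketch: maps a window of multichannel ECoG to per-frame
    speech features (e.g. pitch, loudness, voicing). Sizes and the feature
    set are assumptions, not the architecture reported in the study."""

    def __init__(self, n_electrodes: int = 64, hidden: int = 128, n_features: int = 3):
        super().__init__()
        # Temporal convolutions over the ECoG signal
        self.encoder = nn.Sequential(
            nn.Conv1d(n_electrodes, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
        )
        # Per-frame regression head onto the interpretable speech parameters
        self.head = nn.Conv1d(hidden, n_features, kernel_size=1)

    def forward(self, ecog):
        # ecog: (batch, n_electrodes, time) -> (batch, n_features, time)
        return self.head(self.encoder(ecog))

decoder = ECoGSpeechFeatureDecoder()
dummy_ecog = torch.randn(1, 64, 500)   # made-up shape: 64 channels, 500 time samples
speech_params = decoder(dummy_ecog)    # per-frame pitch / loudness / voicing estimates
```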

To accomplish this, the team trained an AI model that could power a speech synthesis device, designed to enable those with speech loss to communicate using only their thoughts. The process starts with the acquisition of brain data from participants reading sentences aloud while their brain activity is recorded using ECoG grids. These grids capture electrical signals from the areas of the brain involved in speech production.
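A minimal training sketch, again with assumed shapes, targets, and hyperparameters, could pair ECoG windows recorded during reading with speech features extracted from the simultaneously recorded audio:

```python
import torch
import torch.nn as nn

# Illustrative training loop: learn to predict per-frame speech features from
# ECoG windows recorded while a participant reads sentences aloud.
model = nn.Sequential(
    nn.Conv1d(64, 128, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv1d(128, 3, kernel_size=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder paired data: 32 trials of 64-channel ECoG, with target features
# (pitch / loudness / voicing) extracted from the recorded audio.
ecog_windows = torch.randn(32, 64, 500)
target_features = torch.randn(32, 3, 500)

for epoch in range(10):
    optimizer.zero_grad()
    predicted = model(ecog_windows)
    loss = loss_fn(predicted, target_features)
    loss.backward()
    optimizer.step()
```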

Raw brain signals are then mapped to specific speech features by the trained decoding model. These features are converted back into audible speech by a speech synthesizer, which generates a spectrogram – a visual representation of sound frequencies over time. The decoded speech is evaluated against the original recordings for similarity, with strong results reported. The model’s robustness was further demonstrated by its ability to accurately decode words not seen during training.
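For the evaluation step, one simple and purely illustrative way to compare decoded and original speech is to correlate their log-mel spectrograms; the study reports its own objective and perceptual measures, so the metric below is only a stand-in.

```python
import numpy as np
import librosa

def log_mel_spectrogram(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    # Log-mel spectrogram: a standard time-frequency picture of speech
    mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=80)
    return librosa.power_to_db(mel)

def spectrogram_similarity(original: np.ndarray, decoded: np.ndarray, sr: int = 16000) -> float:
    # Stand-in similarity metric: Pearson correlation between the two
    # log-mel spectrograms, cropped to a common length.
    a = log_mel_spectrogram(original, sr)
    b = log_mel_spectrogram(decoded, sr)
    frames = min(a.shape[1], b.shape[1])
    a, b = a[:, :frames].ravel(), b[:, :frames].ravel()
    return float(np.corrcoef(a, b)[0, 1])

# Example with synthetic audio (1 s at 16 kHz)
original = np.random.randn(16000).astype(np.float32)
decoded = original + 0.1 * np.random.randn(16000).astype(np.float32)
print(spectrogram_similarity(original, decoded))
```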

The method developed by the researchers has several advantages. It does not require ultra-high-density electrode arrays, making it more practical for long-term use. Additionally, it can successfully decode speech from both hemispheres of the brain, which is especially important for patients with brain damage restricted to one side.

This study builds on previous research in neural speech decoding and brain-computer interfaces (BCIs). Earlier work in 2023 at the University of California enabled a paralyzed stroke survivor to generate sentences using a BCI that synthesized speech and facial expressions from her brain signals. Other studies have used AI to interpret brain activity, generating images, text, and even music from neural signals.

Despite these exciting advancements, challenges remain. Training the decoding models requires large amounts of high-quality brain data, and individual differences in brain activity can complicate generalization across people. However, the system developed by the NYU team is comparatively accessible and lightweight, relying on widely available and clinically viable electrodes.

The NYU team aims to refine their models for real-time speech decoding, a crucial step towards enabling fluent conversation for speech-impaired individuals. They also intend to adapt the system to fully implantable wireless devices for everyday use, signaling an important advance in speech synthesis technology that could improve the lives of many people with speech impairments.
