Researchers at the University of Washington have created an artificial intelligence (AI) system, Target Speech Hearing (TSH), that lets noise-canceling headphones single out and amplify one voice in a noisy crowd. The system addresses a drawback of noise-canceling headphones: they muffle all sounds, including those the wearer may want to hear.
“Noise-cancelation makes it challenging even for those without hearing loss issues to concentrate on specific human interaction, especially in noisy situations,” said Shyam Gollakota, the project lead and a professor at the University of Washington.
The TSH system combines noise-canceling headphones with AI to focus on an individual voice in a busy environment. The user starts the “enrollment” process by facing the intended speaker for a brief period. During that time, the headphones’ binaural microphones capture an audio sample that contains the speaker’s voice along with interfering speakers and noise. This binaural signal is fed into a neural network that uses directional information to isolate the target speaker’s vocal characteristics.
The target speaker’s vocal characteristics are encoded as an embedding vector, which a separate neural network uses to pick out that speaker’s speech from among many voices. After the enrollment phase, the user can keep listening to the target speaker regardless of head movement or the direction they are facing. The TSH system filters the incoming audio using the speaker embedding, amplifying the target speaker’s voice while suppressing other voices and background noise.
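The two-stage flow described above, enrollment followed by matching against a stored voice profile, can be illustrated with a toy sketch. It is not the researchers' method: TSH uses neural networks, whereas this example swaps in simple signal processing. The sketch assumes that a speaker directly in front of the wearer reaches both ear microphones at the same moment, so averaging the two channels reinforces that voice. It also uses a band-averaged spectrum as a stand-in for the learned embedding and pure tones as stand-ins for voices. The function names, the 16 kHz sample rate and the 6-sample delay are all hypothetical choices for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz; assumed for this sketch


def embed(signal, n_bands=32):
    """Toy 'voice profile': band-averaged magnitude spectrum, scaled to unit length.

    Stand-in for the learned embedding; the real system uses a neural network.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    bands = np.array_split(spectrum, n_bands)
    vector = np.array([band.mean() for band in bands])
    return vector / (np.linalg.norm(vector) + 1e-12)


def enroll(left, right):
    """Average the two ear signals, then embed the result.

    A talker straight ahead reaches both microphones at the same time and adds
    coherently; an off-axis talker arrives with a delay and is partly cancelled.
    """
    return embed(0.5 * (left + right))


def pick_target(embedding, candidates):
    """Return the index of the candidate most similar to the profile (cosine), plus all scores."""
    scores = [float(embedding @ embed(c)) for c in candidates]
    return int(np.argmax(scores)), scores


# Synthetic one-second "voices": pure tones stand in for real speech.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
target = np.sin(2 * np.pi * 220 * t)       # talker the wearer is facing
interferer = np.sin(2 * np.pi * 1000 * t)  # equally loud talker off to one side

# The target is identical in both ears; the interferer reaches the right ear
# 6 samples later (hypothetical delay).
left = target + interferer
right = target + np.roll(interferer, 6)

profile = enroll(left, right)
best, scores = pick_target(profile, [interferer, target])
print(best, [round(s, 3) for s in scores])
```

In this sketch the enrolled profile scores higher against the front-facing tone than against the off-axis one, even though both are equally loud at each ear. A real system would go further and reconstruct the target's audio from the live mixture rather than choosing among voices that are already separated.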
Currently, the TSH system can successfully identify a target speaker only when that voice is the loudest one coming from a given direction, but the researchers plan to extend the system to handle more complex situations with multiple, diverse audio sources.
Samuele Cornell of Carnegie Mellon University’s Language Technologies Institute commends the invention for its practicality and potential. “This is indeed a significant move. Quite refreshing,” he says.
While the TSH system is still at the conceptual stage, discussions are under way to build the technology into leading noise-canceling headphone brands and to make it available for hearing aids. Coupled with the progress in audio and speech analysis shown by GPT-4o, such technology could help people with visual and auditory impairments better understand and connect with their surroundings.