Last Updated on May 26, 2024 by SPN Editor
A team from the University of Washington has created an artificial intelligence system, named “Target Speech Hearing” (TSH), that allows a headphone user to “enroll” a speaker by looking at them for a few seconds. The Target Speech Hearing system then silences all other environmental sounds and amplifies only the voice of the enrolled speaker. This happens in real time, even if the listener is in a noisy place or not facing the speaker.
Noise-canceling headphones have become increasingly effective at creating a sound vacuum. The challenge lies in letting specific sounds from the user’s surroundings back through that silence. For example, Apple’s latest AirPods Pro can adjust sound levels automatically when they detect that the wearer is in a conversation, but the user has limited control over this feature.
The team unveiled their findings at the ACM CHI Conference on Human Factors in Computing Systems in Honolulu on May 14. The code for this prototype device is open-source, allowing others to contribute to its development. However, the system is not yet available for commercial use.
Shyam Gollakota, a senior author and professor at the Paul G. Allen School of Computer Science & Engineering, explained that this project uses AI to alter the auditory perception of headphone users based on their preferences. This means users can clearly hear a single speaker even in a noisy environment filled with multiple conversations.
To use the Target Speech Hearing (TSH) system, a user wearing standard headphones equipped with microphones presses a button while looking at the person speaking. Because the user is facing the speaker, the sound waves from the speaker’s voice should reach the microphones on both sides of the headset at nearly the same time, within a 16-degree margin of error.
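The link between a direction tolerance and "arriving at the same time" can be illustrated with a little geometry. The sketch below is not the team's code. It assumes a simple far-field model, a hypothetical microphone spacing of 0.18 m, a speed of sound of 343 m/s, and that the 16-degree margin is measured on either side of straight ahead. All function and constant names are made up for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
MIC_SPACING_M = 0.18        # assumed: distance between left and right microphones
MARGIN_DEG = 16.0           # margin of error quoted in the article


def arrival_delay_s(angle_deg, spacing_m=MIC_SPACING_M):
    """Delay between the two microphones for a distant source.

    angle_deg is measured from straight ahead (0 means the user is
    looking directly at the speaker). Far-field approximation only.
    """
    return spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S


def within_enrollment_margin(delay_s, margin_deg=MARGIN_DEG,
                             spacing_m=MIC_SPACING_M):
    """True if a measured delay is consistent with a source inside the margin."""
    max_delay_s = arrival_delay_s(margin_deg, spacing_m)
    return abs(delay_s) <= max_delay_s


if __name__ == "__main__":
    for angle in (0, 10, 30):
        delay = arrival_delay_s(angle)
        print(angle, within_enrollment_margin(delay))
```

Under these assumed values, a source 16 degrees off center produces a delay of roughly 0.14 milliseconds, which gives a sense of how small the timing differences involved are.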
The headphones then transmit this signal to an embedded computer, where the team’s machine learning software learns the speaker’s vocal patterns. The system then focuses on the speaker’s voice and continues to amplify it for the listener, even as they move around.
In tests conducted on 21 subjects, the clarity of the enrolled speaker’s voice was rated almost twice as high as the unfiltered audio. The Target Speech Hearing system currently only supports one speaker at a time and requires a clear voice from the target speaker for enrollment. If the sound quality is unsatisfactory, the user can re-enroll the speaker to improve clarity.
The team is now working on adapting the system for use with earbuds and hearing aids.
More about the AI system called Target Speech Hearing (TSH):
What is Target Speech Hearing? TSH is an artificial intelligence (AI) system integrated into headphones. Its primary function is to help users focus on a single speaker in a noisy environment.
How does it work? The system operates through a process known as “enrollment”. Users enroll a speaker by looking at them for three to five seconds. This triggers real-time voice isolation, in which the system separates the speaker’s voice from other sounds in the environment.
What happens after enrollment? Once a speaker is enrolled, the Target Speech Hearing system cancels out all other ambient sounds and plays back only the voice of the targeted speaker. This remains effective even if the listener moves around and no longer faces the speaker.
What are the benefits? The technology significantly enhances the clarity of the enrolled speaker’s voice over background noise. This is particularly useful in noisy environments where focusing on a single speaker is difficult.
Who can benefit from TSH? The Target Speech Hearing system has potential applications in assisting people with partial hearing loss. It can also make conversations in noisy areas less chaotic, benefiting anyone who struggles to focus on a single speaker amid background noise.