Promptzone

Promptzone - Community

AI Headphones Enable Selective Listening in Crowds

Noise-canceling headphones have become very good at creating an auditory blank slate. Letting users selectively hear specific sounds from their environment, however, remains a challenge. The latest Apple AirPods Pro can automatically adjust sound levels for the wearer, for example during conversations, but the wearer has little control over whom to listen to or when the adjustment happens.

A team at the University of Washington has developed an AI system called "Target Speech Hearing" (TSH). It lets headphone users focus on a single speaker in a noisy environment by looking at them for a few seconds to "enroll" them. Once a speaker is enrolled, the system cancels all other ambient sounds and plays only that speaker's voice in real time, even if the listener moves around and no longer faces the speaker.

The team presented TSH at the ACM CHI Conference on Human Factors in Computing Systems. The system is a proof-of-concept device and is not yet commercially available, but its code is available for others to build on.

According to Shyam Gollakota, a senior author and professor in the Paul G. Allen School of Computer Science & Engineering at UW, "We develop AI to modify the auditory perception of anyone wearing headphones, based on their preferences. With our devices, you can now hear a single speaker clearly, even in a noisy environment with many other people talking."

How It Works

To use TSH, the user wears off-the-shelf headphones fitted with microphones and taps a button while looking at the person speaking. The microphones capture the sound waves from the speaker’s voice and send the signal to an on-board embedded computer. Machine learning software on that computer learns the speaker’s vocal patterns, isolates their voice, and plays it back to the listener in real time. The system’s accuracy improves as the speaker keeps talking, because it receives more training data.
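
The article does not spell out how looking at a speaker helps the system pick them out. One plausible cue, offered here as an assumption rather than a description of the TSH code, is timing: sound from a source straight ahead reaches the microphones on both sides of the head at about the same moment, while sound from the side reaches one ear first. The minimal Python sketch below estimates that arrival-time difference with a brute-force cross-correlation. The function names (`make_signal`, `delay`, `best_lag`) and the synthetic noise signal are hypothetical and chosen only for illustration.

```python
import random


def make_signal(n=400, seed=0):
    """Synthetic stand-in for a voice: seeded Gaussian noise (not real speech)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def delay(signal, d):
    """Delay a signal by d >= 0 samples, zero-padding the start and keeping the length."""
    return [0.0] * d + signal[: len(signal) - d]


def best_lag(left, right, max_lag):
    """Return the lag (in samples) at which `right` best lines up with `left`.

    A lag of 0 means the sound reached both microphones at the same time,
    which is what we would expect from a source directly ahead (assumption).
    """
    n = len(left)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, skipping indices that fall off the end.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


voice = make_signal()
print(best_lag(voice, voice, max_lag=10))            # speaker straight ahead
print(best_lag(voice, delay(voice, 5), max_lag=10))  # speaker off to one side
```

With this synthetic signal, the first call reports a lag of 0 samples and the second a lag of 5. A real system would work on live microphone audio and would still need a learned model to separate the voice itself; this sketch only covers the directional cue.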

Key Features and Future Developments

  • Selective Listening: Isolate a single speaker’s voice in a noisy environment.
  • Real-Time Adaptation: Continues to track and isolate the speaker’s voice even as the user moves.
  • User Control: Activate by tapping a button while looking at the speaker for a few seconds.

The current system can enroll only one speaker at a time and works best when no other loud voices are coming from the same direction. If the sound quality is unsatisfactory, users can re-enroll the speaker for better clarity.
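
The one-speaker-at-a-time behavior described above can be pictured as a very small piece of state. The Python sketch below is a hypothetical model, not the team's implementation: the class name `ListeningSession`, its fields, and the choice to let every voice through before anyone is enrolled are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListeningSession:
    """Hypothetical model of single-speaker enrollment (not the TSH code)."""

    enrolled: Optional[str] = None  # label of the enrolled speaker, if any
    enroll_count: int = 0           # how many times enrollment has run

    def enroll(self, speaker: str) -> None:
        # Enrolling always replaces the previous target: one speaker at a time.
        # Calling this again with the same speaker models re-enrollment.
        self.enrolled = speaker
        self.enroll_count += 1

    def passes(self, speaker: str) -> bool:
        # Assumption: before enrollment nothing is isolated, so every voice passes.
        # After enrollment only the enrolled speaker's voice is played back.
        return self.enrolled is None or speaker == self.enrolled


session = ListeningSession()
session.enroll("alice")
print(session.passes("alice"), session.passes("bob"))
```

Re-enrolling the same speaker, as the article suggests when the sound quality is poor, would simply be a second `enroll` call with the same label.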

The team plans to expand this technology to earbuds and hearing aids, making selective listening more accessible in various forms.

Research and Testing

The team tested TSH on 21 subjects, who reported significantly better clarity for the enrolled speaker’s voice than for unfiltered audio. The work builds on the team's previous "semantic hearing" research, which let users select specific sound classes to hear while canceling others.

Conclusion

AI-powered selective listening is a notable step forward for personal audio technology. The Target Speech Hearing system offers a glimpse of a future in which users have greater control over their auditory environment and can more easily focus on the sounds they want amid noise.

For more details, visit the team’s website.
