Senior Design Projects

ECS193 A/B Winter & Spring 2023

Wearable, gaze-directed beamforming to improve speech comprehension

Email **********
Name Lee Miller
Affiliation Center for Mind & Brain

Project's details

Project title Wearable, gaze-directed beamforming to improve speech comprehension
Background Everyday auditory environments are cluttered, noisy, and distracting. This presents a complex perceptual and computational challenge known as the “cocktail party problem”: how to extract relevant acoustic information while filtering out the background. The cocktail party problem is a paradigmatic case of blind-source separation, a broad class of computational approaches with applications in fields as varied as acoustics, machine vision, medical imaging, stock prediction, and seismic monitoring. In sound perception, one of the most important mechanisms our brains use to separate a target sound from the background is selective attention, which typically correlates with eye gaze – we look where we’re listening.
Description Our lab is developing a system that boosts sounds from one direction while suppressing sounds from all others, using a beamformer algorithm steered by signals from an eye tracker. It will also apply state-of-the-art speech isolation and automatic speech recognition (ASR) to present text captions in the listener’s view using augmented reality (AR). Put simply, the system acts as a brain-guided filter for acoustic information: wherever the listener looks, they should hear sound from that direction best.
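
To make the idea of gaze-steered beamforming concrete, here is a minimal sketch of a delay-and-sum beamformer in Python. It is not the lab's implementation: it assumes a linear microphone array, a far-field source, and a gaze azimuth already converted to radians relative to the array's broadside. The names `delay_and_sum`, `mic_x`, and `azimuth_rad` are hypothetical, and the frequency-domain delays are circular within each frame, which a streaming system would need to handle with overlapping windows.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def delay_and_sum(frames, mic_x, azimuth_rad, fs):
    """Steer a linear array toward azimuth_rad and average the channels.

    frames      : array of shape (n_mics, n_samples), one row per microphone
    mic_x       : microphone positions along the array axis, in meters
    azimuth_rad : look direction (e.g. from an eye tracker), 0 = broadside
    fs          : sampling rate in Hz
    """
    frames = np.asarray(frames, dtype=float)
    mic_x = np.asarray(mic_x, dtype=float)
    n_samples = frames.shape[1]

    # Far-field arrival time of a plane wave at each microphone,
    # relative to the array origin.
    delays = -mic_x * np.sin(azimuth_rad) / SPEED_OF_SOUND

    # Undo each channel's delay with a linear phase shift, which allows
    # fractional-sample alignment (circular within the frame).
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(frames, axis=1)
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])

    # Sound from the look direction adds coherently; other directions do not.
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

Under these assumptions, updating `azimuth_rad` from the eye tracker on every frame re-steers the beam, so sound from the gaze direction is reinforced while sound from other directions is partially cancelled.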

The proposed project offers numerous instructive and industry-relevant computer science and engineering challenges, from programming to signal processing to device interfacing and communications, primarily using C++ and Python. It is structured so that team members can make progress on their own challenges while working closely with other team members to integrate the components. It has a clearly defined, feasible, and impactful deliverable.
Deliverable A near-real-time (<50 ms audio delay) gaze-directed beamformer with AR captioning that increases the target speech signal-to-noise ratio by 10 dB and significantly improves comprehension in background noise.
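
As a rough illustration of how the two numeric targets above could be checked, the sketch below computes an SNR improvement in dB and the buffering delay of one audio frame. It assumes the target speech and the noise can be passed through the system separately, which is only possible in a controlled test, and the function names are hypothetical. The frame delay is a lower bound on total latency, since processing and output buffering add to it.

```python
import numpy as np


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from separate signal and noise recordings."""
    p_signal = np.mean(np.square(signal))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10(p_signal / p_noise)


def snr_improvement_db(sig_in, noise_in, sig_out, noise_out):
    """Output SNR minus input SNR, in dB (the deliverable targets 10 dB)."""
    return snr_db(sig_out, noise_out) - snr_db(sig_in, noise_in)


def frame_latency_ms(frame_samples, fs):
    """Delay from buffering one frame of audio, in milliseconds."""
    return 1000.0 * frame_samples / fs
```

For example, under these assumptions a 512-sample frame at 16 kHz already consumes 32 ms of the 50 ms budget, which is one reason frame size matters for the latency target.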
Skill set desirable Programming in Python and/or C++ and/or Unity. A great attitude!
Phone number **********
Client time availability 30-60 min weekly or more
IP requirement Client wishes to keep IP of the project
Attachment N/A
Selected Yes
Team members Aman Ganapathy
Sarah Yuniar
Ashley Bilbrey
Karim Abou Najm
TA Rex Liu