DUET-ISR

A blind source separation algorithm that separates any number of sources with two (or more) microphones

Introduction


Humans are born with the ability to hear selectively. Algorithms that mimic this ability can be of great help in many applications in noisy environments, such as hearing aids, voice assistants, hands-free communication, teleconferencing, and humanoid robots with auditory systems.

Some application scenarios in which a speech separation algorithm can help in noisy, multi-speaker environments.

DUET-ISR is an algorithm we proposed to perform speech separation. It has the following advantages:

  • Versatility: can separate any number of speakers using just two microphones.
  • Simplicity: a simple processing pipeline of masking followed by spatial filtering, built on two basic spatial cues: time delay and amplitude attenuation (see the sketch after this list).
  • Robustness: good separation performance in reverberant environments.
  • Analyzability: a fully analytical, non-deep-learning framework.
  • Convenience: can be used directly, with no training required.
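
To make the masking step concrete, here is a minimal, illustrative Python sketch of the classic DUET-style masking that DUET-ISR builds on: it estimates per time-frequency-bin attenuation and delay cues from the two mixtures, clusters them into one group per speaker, and applies binary masks. The function name duet_masking and all parameters are hypothetical and are not the actual repo API (the released code is in MATLAB); the spatial-filtering refinement specific to DUET-ISR is omitted.

```python
import numpy as np
from scipy.signal import stft, istft
from scipy.cluster.vq import kmeans2


def duet_masking(x1, x2, n_sources, fs, nperseg=1024):
    """Rough DUET-style masking: cluster (attenuation, delay) cues per TF bin."""
    f, _, X1 = stft(x1, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)

    eps = 1e-12
    ratio = (X2 + eps) / (X1 + eps)              # inter-channel ratio per TF bin

    # Spatial cues used by DUET: symmetric attenuation and relative delay.
    alpha = np.abs(ratio)
    sym_atten = alpha - 1.0 / alpha
    omega = 2 * np.pi * np.maximum(f, f[1])      # avoid dividing by zero at DC
    delay = -np.angle(ratio) / omega[:, None]

    # Cluster the 2-D cue pairs into one group per speaker.
    feats = np.stack([sym_atten.ravel(), delay.ravel()], axis=1)
    _, labels = kmeans2(feats, n_sources, minit='++')
    labels = labels.reshape(X1.shape)

    # Binary masking: assign each TF bin to its dominant speaker, then invert.
    estimates = []
    for k in range(n_sources):
        _, s_k = istft((labels == k) * X1, fs=fs, nperseg=nperseg)
        estimates.append(s_k)
    return estimates
```

For two mixture signals sampled at the same rate, calling duet_masking(x1, x2, 4, fs) would return four estimated source signals. The full DUET-ISR additionally harvests partially-disjoint time-frequency information with spatial filtering, as described in the reference below.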

Demo


We show an example of separating four speakers' speech under 130 ms of reverberation. Only one of the two mixtures and one of the four demixed signals are shown here.

source

mixture

demixed signal

More demos and comparisons with other algorithms can be found here.

Code


MATLAB code is available here. The repository also contains implementations of other blind source separation algorithms.


Reference:

He, Yudong, He Wang, Qifeng Chen, and Richard HY So. “Harvesting Partially-Disjoint Time-frequency Information for Improving Degenerate Unmixing Estimation Technique.” In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 506-510. IEEE, 2022.