Eigenvector-based Speech Mask Estimation using a Logistic Regression for Multi-Channel Speech Enhancement

Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf

Publication: Contribution to conference proceedings (peer-reviewed)

Abstract

In this paper, we use a logistic regression to learn a speech mask from the dominant eigenvector of the Power Spectral Density (PSD) matrix of a multi-channel speech signal corrupted by ambient noise. We employ this speech mask to construct the Generalized Eigenvalue (GEV) beamformer and a Wiener postfilter. Further, we extend the beamformer to compensate for speech distortions. We do not make any assumptions about the array geometry or the characteristics of the speech and noise sources; these properties are learned from training data. Our only assumptions are that the speaker may move slowly in the near field of the array, and that the noise is in the far field. We compare our speech enhancement system against recent contributions using the CHiME4 corpus. We show that our approach yields superior results, both in terms of perceptual speech quality and speech mask estimation error.
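
The beamforming step described in the abstract can be illustrated with a minimal NumPy sketch, assuming a speech mask is already available. The logistic-regression mask estimator itself is not reproduced here: the mask is simply given as input. The sketch weights the noisy multi-channel STFT by the mask and its complement to estimate speech and noise PSD matrices per frequency bin, then takes the principal generalized eigenvector of the pair as the GEV beamformer. The function names, the (frequency, frame, microphone) array layout and the mask-weighted PSD estimator are assumptions for illustration, not the authors' implementation. The distortion compensation and the Wiener postfilter are omitted.

```python
import numpy as np


def masked_psd(Y, mask):
    """Mask-weighted PSD matrices (illustrative estimator, an assumption).

    Y:    noisy STFT, shape (F, T, M) = (frequencies, frames, microphones)
    mask: weights in [0, 1], shape (F, T)
    Returns Hermitian matrices of shape (F, M, M).
    """
    # Normalize the weights per frequency so they sum to one over frames.
    weights = mask / np.maximum(mask.sum(axis=1, keepdims=True), 1e-10)
    # phi[f] = sum_t w[f, t] * y[f, t] y[f, t]^H
    return np.einsum('ft,ftm,ftn->fmn', weights, Y, Y.conj())


def gev_weights(phi_xx, phi_nn):
    """GEV beamformer: per frequency, the eigenvector of phi_nn^{-1} phi_xx
    with the largest eigenvalue, which maximizes w^H phi_xx w / w^H phi_nn w."""
    F, M, _ = phi_xx.shape
    W = np.zeros((F, M), dtype=complex)
    for f in range(F):
        eigvals, eigvecs = np.linalg.eig(np.linalg.solve(phi_nn[f], phi_xx[f]))
        W[f] = eigvecs[:, np.argmax(eigvals.real)]
    return W


def beamform(Y, W):
    """Apply the beamformer: Z[f, t] = w[f]^H y[f, t]."""
    return np.einsum('fm,ftm->ft', W.conj(), Y)
```

Typical use is `phi_xx = masked_psd(Y, mask)`, `phi_nn = masked_psd(Y, 1.0 - mask)`, then `beamform(Y, gev_weights(phi_xx, phi_nn))`. The GEV weights are only defined up to an arbitrary complex scale per frequency, so the output can be spectrally distorted. The abstract's extension to compensate for speech distortions addresses this issue; its exact form is not shown in this sketch.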
Original language: English
Host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: ISCA, International Speech Communication Association
Pages: 2660-2664
Volume: 2017-August
Publication status: Published, 2017
Event: 18th Annual Conference of the International Speech Communication Association: INTERSPEECH 2017, Stockholm, Sweden
Duration: 20 Aug 2017 to 24 Aug 2017

