Eigenvector-based Speech Mask Estimation using a Logistic Regression for Multi-Channel Speech Enhancement

Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

In this paper, we use a logistic regression to learn a speech mask from the dominant eigenvector of the Power Spectral Density (PSD) matrix of a multi-channel speech signal corrupted by ambient noise. We employ this speech mask to construct the Generalized Eigenvalue (GEV) beamformer and a Wiener postfilter. Further, we extend the beamformer to compensate for speech distortions. We do not make any assumptions about the array geometry or the characteristics of the speech and noise sources. Those parameters are learned from training data. Our assumptions are that the speaker may move slowly in the near field of the array, and that the noise is in the far field. We compare our speech enhancement system against recent contributions using the CHiME4 corpus. We show that our approach yields superior results, both in terms of perceptual speech quality and speech mask estimation error.
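
To make the mask-to-beamformer step in the abstract concrete, the following is a minimal sketch of how a time-frequency speech mask can be turned into GEV beamformer weights. It is not the authors' implementation: the function names (`masked_psd`, `gev_weights`, `apply_beamformer`), the array layout (channels, frequency bins, frames), and the diagonal loading factor `reg` are assumptions made for illustration. The mask is taken as given, so the logistic regression on the dominant eigenvector, the distortion compensation, and the Wiener postfilter described in the paper are not covered.

```python
import numpy as np


def masked_psd(Y, mask, eps=1e-10):
    """Mask-weighted spatial PSD matrix per frequency bin.

    Y:    complex STFT, shape (M, F, T) = (channels, bins, frames)
    mask: real weights in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    weighted = Y * mask[None, :, :]
    # psd[f, m, n] = sum_t mask[f, t] * Y[m, f, t] * conj(Y[n, f, t])
    psd = np.einsum("mft,nft->fmn", weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=1), eps)
    return psd / norm[:, None, None]


def gev_weights(psd_speech, psd_noise, reg=1e-6):
    """Principal generalized eigenvector of (psd_speech, psd_noise) per bin.

    Solves psd_speech @ w = lambda * psd_noise @ w by whitening with the
    Cholesky factor of the (diagonally loaded) noise PSD matrix.
    Returns unit-norm weights of shape (F, M).
    """
    F, M, _ = psd_speech.shape
    W = np.empty((F, M), dtype=complex)
    for f in range(F):
        # Diagonal loading (an assumption of this sketch) keeps the
        # noise PSD matrix positive definite.
        load = reg * np.trace(psd_noise[f]).real / M + 1e-12
        L = np.linalg.cholesky(psd_noise[f] + load * np.eye(M))
        L_inv = np.linalg.inv(L)
        A = L_inv @ psd_speech[f] @ L_inv.conj().T
        A = 0.5 * (A + A.conj().T)  # enforce Hermitian symmetry
        _, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
        w = L_inv.conj().T @ vecs[:, -1]
        W[f] = w / np.linalg.norm(w)
    return W


def apply_beamformer(W, Y):
    """Beamformer output Z[f, t] = w(f)^H y(f, t), shape (F, T)."""
    return np.einsum("fm,mft->ft", W.conj(), Y)
```

Given a mask `p` of shape (F, T), the speech and noise PSD matrices would be estimated as `masked_psd(Y, p)` and `masked_psd(Y, 1.0 - p)`. The GEV solution is only defined up to a complex scale factor per frequency bin, which is one reason a distortion-compensation step such as the one mentioned in the abstract is needed in practice.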
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: ISCA, International Speech Communication Association
Pages: 2660 - 2664
Volume: 2017-August
Publication status: Published - 2017
Event: 18th Annual Conference of the International Speech Communication Association: INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 to 24 Aug 2017

Conference

Conference: 18th Annual Conference of the International Speech Communication Association
Country/Territory: Sweden
City: Stockholm
Period: 20/08/17 to 24/08/17
