Eigenvector-based Speech Mask Estimation for Multi-Channel Speech Enhancement

Lukas Pfeifenberger; Matthias Zöhrer; Franz Pernkopf

doi:10.1109/TASLP.2019.2941592

Research output: Contribution to journal › Article › peer-review

Abstract

We present the Eigennet architecture for estimating a gain mask from noisy, multi-channel microphone observations. While existing mask estimators use magnitude features, our system also exploits the spatial information embedded in the phase of the data. The mask is used to obtain the Minimum Variance Distortionless Response (MVDR) and Generalized Eigenvalue (GEV) beamformers. We also derive the Phase Aware Normalization (PAN) postfilter, which corrects both the magnitude and the phase distortions caused by the GEV beamformer. Further, we demonstrate the properties of our eigenvector features and compare their performance with three state-of-the-art reference systems. We report performance in terms of SNR improvement and Word Error Rate (WER), using the Google Speech-to-Text API and Kaldi. Experiments are performed on the WSJ0 and CHiME4 corpora, where competitive performance in both WER and SNR is achieved.
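As a rough illustration of how a gain mask can yield MVDR and GEV beamformers, the sketch below uses the common mask-weighted covariance formulation for a single frequency bin. It is not the Eigennet architecture or the PAN postfilter from the paper. It assumes the mask has already been estimated, and the function names (`masked_psd`, `gev_weights`, `mvdr_weights`) and the reference-channel choice are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def masked_psd(Y, mask):
    """Mask-weighted spatial covariance (PSD) matrix for one frequency bin.

    Y:    complex array, shape (channels, frames)
    mask: real array with values in [0, 1], shape (frames,)
    """
    weighted = Y * mask  # the mask broadcasts over the channel axis
    return weighted @ Y.conj().T / max(mask.sum(), 1e-10)


def gev_weights(phi_s, phi_n):
    """GEV beamformer: principal generalized eigenvector of (phi_s, phi_n).

    This vector maximizes the output SNR w^H phi_s w / w^H phi_n w.
    phi_n must be Hermitian positive definite.
    """
    # eigh solves phi_s w = lambda * phi_n w; eigenvalues come back ascending.
    _, vecs = eigh(phi_s, phi_n)
    return vecs[:, -1]


def mvdr_weights(phi_s, phi_n, ref=0):
    """MVDR weights in the form phi_n^{-1} phi_s u / trace(phi_n^{-1} phi_s).

    u selects the (assumed) reference microphone `ref`.
    """
    num = np.linalg.solve(phi_n, phi_s)
    return num[:, ref] / np.trace(num)
```

In use, one would compute `phi_s = masked_psd(Y, mask)` and `phi_n = masked_psd(Y, 1.0 - mask)` per frequency bin and apply the weights as `w.conj() @ Y`. The GEV solution is only defined up to a complex scale factor per bin, which is why a postfilter is typically applied afterwards.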

Original language	English
Pages (from-to)	2162 - 2172
Journal	IEEE/ACM Transactions on Audio Speech and Language Processing
Volume	27
Issue number	12
DOIs	https://doi.org/10.1109/TASLP.2019.2941592
Publication status	Published - 2019

Cite this

@article{f85755f9c20c45fdb07be8be9a87337c,
  title = "Eigenvector-based Speech Mask Estimation for Multi-Channel Speech Enhancement",
  abstract = "We present the Eigennet architecture for estimating a gain mask from noisy, multi-channel microphone observations. While existing mask estimators use magnitude features, our system also exploits the spatial information embedded in the phase of the data. The mask is used to obtain the Minimum Variance Distortionless Response (MVDR) and Generalized Eigenvalue (GEV) beamformers. We also derive the Phase Aware Normalization (PAN) postfilter, which corrects both magnitude and phase distortions caused by the GEV. Further, we demonstrate the properties of our eigenvector features, and compare their performance with three state-of-the-art reference systems. We report their performance in terms of SNR improvement and Word Error Rate (WER) using Google and Kaldi Speech-to-Text API. Experiments are performed on the WSJ0 and CHiME4 corpora, where competitive performance in both WER and SNR is achieved.",
  author = "Lukas Pfeifenberger and Matthias Z{\"o}hrer and Franz Pernkopf",
  year = "2019",
  doi = "10.1109/TASLP.2019.2941592",
  language = "English",
  volume = "27",
  pages = "2162--2172",
  journal = "IEEE/ACM Transactions on Audio Speech and Language Processing",
  issn = "2329-9290",
  publisher = "Institute of Electrical and Electronics Engineers",
  number = "12",
}

TY  - JOUR
T1  - Eigenvector-based Speech Mask Estimation for Multi-Channel Speech Enhancement
AU  - Pfeifenberger, Lukas
AU  - Zöhrer, Matthias
AU  - Pernkopf, Franz
PY  - 2019
Y1  - 2019
AB  - We present the Eigennet architecture for estimating a gain mask from noisy, multi-channel microphone observations. While existing mask estimators use magnitude features, our system also exploits the spatial information embedded in the phase of the data. The mask is used to obtain the Minimum Variance Distortionless Response (MVDR) and Generalized Eigenvalue (GEV) beamformers. We also derive the Phase Aware Normalization (PAN) postfilter, which corrects both magnitude and phase distortions caused by the GEV. Further, we demonstrate the properties of our eigenvector features, and compare their performance with three state-of-the-art reference systems. We report their performance in terms of SNR improvement and Word Error Rate (WER) using Google and Kaldi Speech-to-Text API. Experiments are performed on the WSJ0 and CHiME4 corpora, where competitive performance in both WER and SNR is achieved.
U2  - 10.1109/TASLP.2019.2941592
DO  - 10.1109/TASLP.2019.2941592
M3  - Article
SN  - 2329-9290
VL  - 27
SP  - 2162
EP  - 2172
JO  - IEEE/ACM Transactions on Audio Speech and Language Processing
JF  - IEEE/ACM Transactions on Audio Speech and Language Processing
IS  - 12
ER  -
