Overcoming Covariance Matrix Phase Sensitivity in Single-Channel Speech Enhancement with Correlated Spectral Components

Johannes Stahl, Sean Ulrich Niethe Wood, Pejman Mowlaee Beikzadehmahaleh

Publication: Contribution to book/report/conference proceedings › Conference contribution › Research › Peer-reviewed

Abstract

The single-channel speech enhancement problem in the short-time Fourier transform domain is addressed. Traditional approaches assume statistical independence between signal components from different frequency regions, resulting in estimators that are functions of diagonal covariance matrices. More recent approaches drop this assumption and explicitly model dependencies between discrete Fourier transform bins. Full covariance matrices of speech and noise are required in this case to obtain optimal estimates of the clean speech spectrum, where off-diagonal entries are complex-valued in general. We show that the performance of estimators resulting from such models is highly sensitive to the phase estimation accuracy of these off-diagonal entries. Since it is non-trivial to estimate the covariance phases from noisy speech data, we propose a linear multidimensional short-time spectral amplitude estimator that circumvents the need to estimate them. We evaluate the speech enhancement performance of this approach and compare it to relevant benchmarks that also take into account inter-channel dependencies.
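The phase sensitivity described in the abstract can be illustrated with a small numerical sketch. This is not the estimator proposed in the paper. It is a generic linear MMSE (Wiener) filter on a toy covariance model, where the dimension `K`, the correlation magnitude `rho`, the phase slope `phi`, the phase error `delta`, and the white-noise variance `sigma2` are all hypothetical values chosen for illustration. It compares the expected squared error of three filters: one built from the true full covariance, one built from its diagonal only (independence assumption), and one built from a full covariance whose off-diagonal phases are wrong.

```python
import numpy as np


def covariance(K, rho, phi):
    """Toy Hermitian covariance: entry (i, j) = rho^|i-j| * exp(1j*phi*(i-j)).

    This structure is an assumption for illustration, not the paper's model.
    """
    idx = np.arange(K)
    diff = idx[:, None] - idx[None, :]
    return (rho ** np.abs(diff)) * np.exp(1j * phi * diff)


def wiener_matrix(C_s_model, C_n):
    """Linear MMSE filter W = C_s (C_s + C_n)^{-1} for the assumed speech covariance."""
    return C_s_model @ np.linalg.inv(C_s_model + C_n)


def linear_mse(W, C_s, C_n):
    """Expected squared error of s_hat = W (s + n) under the TRUE covariances.

    Assumes speech s and noise n are zero-mean and mutually uncorrelated.
    """
    E = np.eye(C_s.shape[0]) - W
    err_cov = E @ C_s @ E.conj().T + W @ C_n @ W.conj().T
    return float(np.real(np.trace(err_cov)))


# Hypothetical toy parameters (not taken from the paper).
K, rho, phi, delta, sigma2 = 4, 0.6, 0.9, np.pi / 2, 1.0

C_true = covariance(K, rho, phi)           # true speech covariance
C_diag = np.diag(np.diag(C_true))          # independence assumption
C_wrong = covariance(K, rho, phi + delta)  # correct magnitudes, wrong phases
C_n = sigma2 * np.eye(K)                   # white noise covariance

mse_full = linear_mse(wiener_matrix(C_true, C_n), C_true, C_n)
mse_diag = linear_mse(wiener_matrix(C_diag, C_n), C_true, C_n)
mse_wrong = linear_mse(wiener_matrix(C_wrong, C_n), C_true, C_n)

print(f"full covariance, true phases : {mse_full:.4f}")
print(f"diagonal covariance          : {mse_diag:.4f}")
print(f"full covariance, wrong phases: {mse_wrong:.4f}")
```

Because the filter built from the true covariance is optimal among linear estimators, its error can never exceed that of the other two. How much the wrong-phase filter loses, and whether it falls below the diagonal baseline, depends on the chosen `delta` and the other toy parameters.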
Original language: English
Title: ITG-Fb. 282: Speech Communication
Publisher: VDE
Pages: 286-290
Number of pages: 5
Publication status: Published - 2018
Event: 13th ITG Conference on Speech Communication - Oldenburg, Germany
Duration: 10 Oct 2018 to 12 Oct 2018

Conference

Conference: 13th ITG Conference on Speech Communication
Country: Germany
City: Oldenburg
Period: 10/10/18 to 12/10/18

Fingerprint

Speech enhancement
Covariance matrix
Bins
Discrete Fourier transforms
Fourier transforms

Cite this

Overcoming Covariance Matrix Phase Sensitivity in Single-Channel Speech Enhancement with Correlated Spectral Components. / Stahl, Johannes; Wood, Sean Ulrich Niethe; Mowlaee Beikzadehmahaleh, Pejman.

ITG-Fb. 282: Speech Communication. VDE, 2018. pp. 286-290.

Stahl, J, Wood, SUN & Mowlaee Beikzadehmahaleh, P 2018, Overcoming Covariance Matrix Phase Sensitivity in Single-Channel Speech Enhancement with Correlated Spectral Components. in ITG-Fb. 282: Speech Communication. VDE, pp. 286-290, Oldenburg, Germany, 10/10/18.
Stahl, Johannes ; Wood, Sean Ulrich Niethe ; Mowlaee Beikzadehmahaleh, Pejman. / Overcoming Covariance Matrix Phase Sensitivity in Single-Channel Speech Enhancement with Correlated Spectral Components. ITG-Fb. 282: Speech Communication. VDE, 2018. pp. 286-290
@inproceedings{55ab3862ceaf4c99b5bc443812e62cf7,
title = "Overcoming Covariance Matrix Phase Sensitivity in Single-Channel Speech Enhancement with Correlated Spectral Components",
abstract = "The single-channel speech enhancement problem in the short-time Fourier transform domain is addressed. Traditional approaches assume statistical independence between signal components from different frequency regions, resulting in estimators that are functions of diagonal covariance matrices. More recent approaches drop this assumption and explicitly model dependencies between discrete Fourier transform bins. Full covariance matrices of speech and noise are required in this case to obtain optimal estimates of the clean speech spectrum, where off-diagonal entries are complex-valued in general. We show that the performance of estimators resulting from such models is highly sensitive to the phase estimation accuracy of these off-diagonal entries. Since it is non-trivial to estimate the covariance phases from noisy speech data, we propose a linear multidimensional short-time spectral amplitude estimator that circumvents the need to estimate them. We evaluate the speech enhancement performance of this approach and compare it to relevant benchmarks that also take into account inter-channel dependencies.",
author = "Johannes Stahl and Wood, {Sean Ulrich Niethe} and {Mowlaee Beikzadehmahaleh}, Pejman",
year = "2018",
language = "English",
pages = "286--290",
booktitle = "ITG-Fb. 282: Speech Communication",
publisher = "VDE",

}

TY - GEN

T1 - Overcoming Covariance Matrix Phase Sensitivity in Single-Channel Speech Enhancement with Correlated Spectral Components

AU - Stahl, Johannes

AU - Wood, Sean Ulrich Niethe

AU - Mowlaee Beikzadehmahaleh, Pejman

PY - 2018

Y1 - 2018

AB - The single-channel speech enhancement problem in the short-time Fourier transform domain is addressed. Traditional approaches assume statistical independence between signal components from different frequency regions, resulting in estimators that are functions of diagonal covariance matrices. More recent approaches drop this assumption and explicitly model dependencies between discrete Fourier transform bins. Full covariance matrices of speech and noise are required in this case to obtain optimal estimates of the clean speech spectrum, where off-diagonal entries are complex-valued in general. We show that the performance of estimators resulting from such models is highly sensitive to the phase estimation accuracy of these off-diagonal entries. Since it is non-trivial to estimate the covariance phases from noisy speech data, we propose a linear multidimensional short-time spectral amplitude estimator that circumvents the need to estimate them. We evaluate the speech enhancement performance of this approach and compare it to relevant benchmarks that also take into account inter-channel dependencies.

M3 - Conference contribution

SP - 286

EP - 290

BT - ITG-Fb. 282: Speech Communication

PB - VDE

ER -