A Pitch-Synchronous Simultaneous Detection-Estimation Framework for Speech Enhancement

Johannes Stahl; Pejman Mowlaee

doi:10.1109/TASLP.2017.2779405

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

Speech enhancement methods formulated in the short-time Fourier transform (STFT) domain vary in the statistical assumptions made on the STFT coefficients, in the optimization criteria applied, or in the models of the signal components. Recently, approaches relying on a stochastic-deterministic speech model have been proposed. The deterministic part of the signal corresponds to harmonically related sinusoids, often used to represent voiced speech. The stochastic part models signal components that are not captured by the deterministic components. In this paper, we consider this scenario from a new perspective, yielding three main contributions. First, a pitch-synchronous signal representation is considered and shown to be advantageous for the estimation of the harmonic model parameters. Second, we model the harmonic amplitudes in voiced speech as random variables with frequency-bin-dependent Gamma distributions. Finally, distinct estimators for the different models of voiced speech, unvoiced speech, and speech absence are derived. To select among the resulting estimates, we take into account the mutual impact of detection and estimation by proposing a binary decision framework that is derived from a Bayesian risk function. The resulting pitch-synchronous stochastic-deterministic estimator outperforms several benchmark methods in terms of speech intelligibility and perceived quality, as predicted by instrumental measures, for various noise types and signal-to-noise ratios.

Original language	English
Pages (from-to)	436-450
Number of pages	15
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	26
Issue number	2
DOIs	https://doi.org/10.1109/TASLP.2017.2779405
Publication status	Published - 4 Dec 2017

FWF - Phase - Phase-Aware Signal Processing for Speech Transmission
Mowlaee Beikzadehmahaleh, P.
1 Oct 2015 → 31 Jul 2019
Project: Research project

Cite this

@article{975fe2ba60824d64a897f3472e0ffd4b,

title = "A Pitch-Synchronous Simultaneous Detection-Estimation Framework for Speech Enhancement",

abstract = "Speech enhancement methods formulated in the short-time Fourier transform (STFT) domain vary in the statistical assumptions made on the STFT coefficients, in the optimization criteria applied or in the models of the signal components. Recently, approaches relying on a stochastic-deterministic speech model have been proposed. The deterministic part of the signal corresponds to harmonically related sinusoids, often used to represent voiced speech. The stochastic part models signal components that are not captured by the deterministic components. In this paper, we consider this scenario under a new perspective yielding three main contributions. First, a pitch-synchronous signal representation is considered and shown to be advantageous for the estimation of the harmonic model parameters. Second, we model the harmonic amplitudes in voiced speech as random variables with frequency bin dependent Gamma distributions. Finally, distinct estimators for the different models of voiced speech, unvoiced speech, and speech absence are derived. To select from the arising estimates, we take into account the mutual impact of detection and estimation by proposing a binary decision framework that is derived from a Bayesian risk function. The resulting pitch-synchronous stochastic-deterministic estimator outperforms several benchmark methods in terms of speech intelligibility and perceived quality predicted by instrumental measures for various noise types and different signal-to-noise ratios.",

author = "Johannes Stahl and Pejman Mowlaee",

year = "2017",

month = dec,

day = "4",

doi = "10.1109/TASLP.2017.2779405",

language = "English",

volume = "26",

pages = "436--450",

journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",

issn = "2329-9290",

publisher = "Institute of Electrical and Electronics Engineers",

number = "2",

}

TY - JOUR

T1 - A Pitch-Synchronous Simultaneous Detection-Estimation Framework for Speech Enhancement

AU - Stahl, Johannes

AU - Mowlaee, Pejman

PY - 2017/12/4

Y1 - 2017/12/4

N2 - Speech enhancement methods formulated in the short-time Fourier transform (STFT) domain vary in the statistical assumptions made on the STFT coefficients, in the optimization criteria applied or in the models of the signal components. Recently, approaches relying on a stochastic-deterministic speech model have been proposed. The deterministic part of the signal corresponds to harmonically related sinusoids, often used to represent voiced speech. The stochastic part models signal components that are not captured by the deterministic components. In this paper, we consider this scenario under a new perspective yielding three main contributions. First, a pitch-synchronous signal representation is considered and shown to be advantageous for the estimation of the harmonic model parameters. Second, we model the harmonic amplitudes in voiced speech as random variables with frequency bin dependent Gamma distributions. Finally, distinct estimators for the different models of voiced speech, unvoiced speech, and speech absence are derived. To select from the arising estimates, we take into account the mutual impact of detection and estimation by proposing a binary decision framework that is derived from a Bayesian risk function. The resulting pitch-synchronous stochastic-deterministic estimator outperforms several benchmark methods in terms of speech intelligibility and perceived quality predicted by instrumental measures for various noise types and different signal-to-noise ratios.

U2 - 10.1109/TASLP.2017.2779405

DO - 10.1109/TASLP.2017.2779405

M3 - Article

SN - 2329-9290

VL - 26

SP - 436

EP - 450

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

IS - 2

ER -
