Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement

Research output: Contribution to journal › Article › Research › peer-review

Abstract

The single-channel speech enhancement problem is addressed. We propose a pitch-adaptive short-time Fourier transform (PASTFT) framework to obtain a signal-dependent time-frequency representation of the input signal. We analyze the inter-frame correlation of successive speech DFT bins resulting from the PASTFT and harmonic signal modeling. This analysis reveals significant correlation if the phase progression introduced by the harmonic nature of the speech signal is taken into account. Hence, we model successive speech DFT bins as complex-valued autoregressive processes and propose to incorporate the harmonic phase progression into a state-transition model. We estimate the corresponding model parameters by exploiting circular statistics and assume that the additive noise DFT coefficients are uncorrelated w.r.t. time. Based on this propagation model, we propose a pitch-adaptive complex-valued Kalman filter for speech enhancement. The effectiveness of the proposed speech enhancement method is demonstrated in terms of instrumental speech quality and intelligibility predictors. The results indicate a good balance between speech distortions and preservation of speech intelligibility of the input signal compared to the benchmark methods.
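The abstract names several ingredients: per-bin complex-valued autoregressive modeling, a harmonic phase progression in the state transition, circular statistics for parameter estimation, and a Kalman filter under temporally white noise. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. It filters a single DFT bin with a scalar AR(1) model. It assumes the textbook harmonic phase advance 2π·h·f0·H/fs between frames H samples apart and treats the process and noise variances as known. It also uses the mean resultant vector of phase increments as a stand-in for the paper's circular-statistics estimator. All function names (`harmonic_phase_advance`, `circular_ar_coefficient`, `complex_kalman`, `demo`) are hypothetical.

```python
# Illustrative sketch only: scalar per-bin model, known variances, and a
# simplified circular-statistics estimate are assumptions, not the paper's method.
import cmath
import math
import random


def harmonic_phase_advance(harmonic, f0_hz, hop, fs_hz):
    """Phase advance (rad) of harmonic number `harmonic` of f0 over `hop` samples.

    Textbook harmonic model: a sinusoid at harmonic * f0 advances its phase by
    2*pi*harmonic*f0*hop/fs between frames that are `hop` samples apart.
    """
    return 2.0 * math.pi * harmonic * f0_hz * hop / fs_hz


def circular_ar_coefficient(phase_increments):
    """Mean resultant vector of observed inter-frame phase increments.

    Its angle is the circular mean of the increments; its length R in [0, 1]
    measures their concentration. Using R directly as the AR magnitude is an
    illustrative assumption, not the paper's estimator.
    """
    n = len(phase_increments)
    return sum(cmath.exp(1j * d) for d in phase_increments) / n


def complex_cn(var, rng):
    """Draw one circularly-symmetric complex Gaussian sample CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def complex_kalman(observations, a, q, r, x0=0j, p0=1.0):
    """Scalar complex Kalman filter for one DFT bin.

    State model:       x[n] = a * x[n-1] + w[n],  w ~ CN(0, q)
    Observation model: y[n] = x[n] + v[n],        v ~ CN(0, r), white in time
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: rotate by the harmonic phase advance and scale by |a|
        x_pred = a * x
        p_pred = abs(a) ** 2 * p + q
        # Update with the noisy DFT coefficient
        k = p_pred / (p_pred + r)
        x = x_pred + k * (y - x_pred)
        p = (1.0 - k) * p_pred
        estimates.append(x)
    return estimates


def demo(seed=0, n_frames=2000):
    """Simulate one harmonic bin at 0 dB SNR; return (noisy MSE, filtered MSE)."""
    rng = random.Random(seed)
    dphi = harmonic_phase_advance(harmonic=3, f0_hz=120.0, hop=128, fs_hz=16000.0)
    rho = 0.98
    a = rho * cmath.exp(1j * dphi)  # complex AR(1) coefficient
    q = 1.0 - rho ** 2              # keeps the clean-bin variance at 1
    r = 1.0                         # noise variance 1 -> 0 dB SNR
    clean, noisy = [], []
    x = complex_cn(1.0, rng)
    for _ in range(n_frames):
        x = a * x + complex_cn(q, rng)
        clean.append(x)
        noisy.append(x + complex_cn(r, rng))
    est = complex_kalman(noisy, a, q, r)

    def mse(seq):
        return sum(abs(s - c) ** 2 for s, c in zip(seq, clean)) / n_frames

    return mse(noisy), mse(est)


if __name__ == "__main__":
    noisy_mse, filtered_mse = demo()
    print(f"noisy MSE: {noisy_mse:.3f}, filtered MSE: {filtered_mse:.3f}")
```

Running the script simulates one harmonic bin at 0 dB SNR and prints the mean squared error before and after filtering. The exact numbers depend on the seed. The transition coefficient a = ρ·e^{jΔφ} rotates the state by the expected phase advance, so the prediction step stays aligned with the harmonic. This is the intuition behind incorporating the phase progression into the state-transition model.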
Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: Speech Communication
Volume: 111
DOI: 10.1016/j.specom.2019.05.001
Publication status: Published - 2019

Keywords

  • Circular statistics
  • Kalman filter
  • Pitch-adaptive
  • Speech enhancement

ASJC Scopus subject areas

  • Software
  • Communication
  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Modelling and Simulation
  • Linguistics and Language

Cite this

Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement. / Stahl, Johannes; Mowlaee Beikzadehmahaleh, Pejman.

In: Speech Communication, Vol. 111, 2019, p. 1-13.

@article{11a148684f864f989fe163b915ab57f7,
title = "Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement",
abstract = "The single-channel speech enhancement problem is addressed. We propose a pitch-adaptive short-time Fourier transform (PASTFT) framework to obtain a signal-dependent time-frequency representation of the input signal. We analyze the inter-frame correlation of successive speech DFT bins resulting from the PASTFT and harmonic signal modeling. This analysis reveals significant correlation if the phase progression introduced by the harmonic nature of the speech signal is taken into account. Hence, we model successive speech DFT bins as complex-valued autoregressive processes and propose to incorporate the harmonic phase progression into a state-transition model. We estimate the corresponding model parameters by exploiting circular statistics and assume that the additive noise DFT coefficients are uncorrelated w.r.t. time. Based on this propagation model, we propose a pitch-adaptive complex-valued Kalman filter for speech enhancement. The effectiveness of the proposed speech enhancement method is demonstrated in terms of instrumental speech quality and intelligibility predictors. The results indicate a good balance between speech distortions and preservation of speech intelligibility of the input signal compared to the benchmark methods.",
keywords = "Circular statistics, Kalman filter, Pitch-adaptive, Speech enhancement",
author = "Stahl, Johannes and {Mowlaee Beikzadehmahaleh}, Pejman",
year = "2019",
doi = "10.1016/j.specom.2019.05.001",
language = "English",
volume = "111",
pages = "1--13",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier B.V.",

}

TY  - JOUR
T1  - Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement
AU  - Stahl, Johannes
AU  - Mowlaee Beikzadehmahaleh, Pejman
PY  - 2019
Y1  - 2019
N2  - The single-channel speech enhancement problem is addressed. We propose a pitch-adaptive short-time Fourier transform (PASTFT) framework to obtain a signal-dependent time-frequency representation of the input signal. We analyze the inter-frame correlation of successive speech DFT bins resulting from the PASTFT and harmonic signal modeling. This analysis reveals significant correlation if the phase progression introduced by the harmonic nature of the speech signal is taken into account. Hence, we model successive speech DFT bins as complex-valued autoregressive processes and propose to incorporate the harmonic phase progression into a state-transition model. We estimate the corresponding model parameters by exploiting circular statistics and assume that the additive noise DFT coefficients are uncorrelated w.r.t. time. Based on this propagation model, we propose a pitch-adaptive complex-valued Kalman filter for speech enhancement. The effectiveness of the proposed speech enhancement method is demonstrated in terms of instrumental speech quality and intelligibility predictors. The results indicate a good balance between speech distortions and preservation of speech intelligibility of the input signal compared to the benchmark methods.
AB  - The single-channel speech enhancement problem is addressed. We propose a pitch-adaptive short-time Fourier transform (PASTFT) framework to obtain a signal-dependent time-frequency representation of the input signal. We analyze the inter-frame correlation of successive speech DFT bins resulting from the PASTFT and harmonic signal modeling. This analysis reveals significant correlation if the phase progression introduced by the harmonic nature of the speech signal is taken into account. Hence, we model successive speech DFT bins as complex-valued autoregressive processes and propose to incorporate the harmonic phase progression into a state-transition model. We estimate the corresponding model parameters by exploiting circular statistics and assume that the additive noise DFT coefficients are uncorrelated w.r.t. time. Based on this propagation model, we propose a pitch-adaptive complex-valued Kalman filter for speech enhancement. The effectiveness of the proposed speech enhancement method is demonstrated in terms of instrumental speech quality and intelligibility predictors. The results indicate a good balance between speech distortions and preservation of speech intelligibility of the input signal compared to the benchmark methods.
KW  - Circular statistics
KW  - Kalman filter
KW  - Pitch-adaptive
KW  - Speech enhancement
UR  - http://www.scopus.com/inward/record.url?scp=85065916623&partnerID=8YFLogxK
U2  - 10.1016/j.specom.2019.05.001
DO  - 10.1016/j.specom.2019.05.001
M3  - Article
VL  - 111
SP  - 1
EP  - 13
JO  - Speech Communication
JF  - Speech Communication
SN  - 0167-6393
ER  - 