Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement

Publication: Contribution to journal, Article, Research, Peer-reviewed

Abstract

The single-channel speech enhancement problem is addressed. We propose a pitch-adaptive short-time Fourier transform (PASTFT) framework to obtain a signal-dependent time-frequency representation of the input signal. We analyze the inter-frame correlation of successive speech DFT bins resulting from the PASTFT and harmonic signal modeling. This analysis reveals significant correlation if the phase progression introduced by the harmonic nature of the speech signal is taken into account. Hence, we model successive speech DFT bins as complex-valued autoregressive processes and propose to incorporate the harmonic phase progression into a state-transition model. We estimate the corresponding model parameters by exploiting circular statistics and assume that the additive noise DFT coefficients are uncorrelated w.r.t. time. Based on this propagation model, we propose a pitch-adaptive complex-valued Kalman filter for speech enhancement. The effectiveness of the proposed speech enhancement method is demonstrated in terms of instrumental speech quality and intelligibility predictors. The results indicate a good balance between speech distortions and preservation of speech intelligibility of the input signal compared to the benchmark methods.
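The propagation model described in the abstract can be illustrated with a minimal sketch for a single DFT bin. Everything specific here is an assumption made for illustration and is not taken from the paper: a first-order complex autoregressive state model whose transition coefficient is rotated by the harmonic phase advance per frame, known noise variances `q` and `r`, and a circular mean of inter-frame phase differences as a stand-in for the circular-statistics parameter estimation. The function names and the parameter values are hypothetical.

```python
import numpy as np


def harmonic_phase_advance(h, f0, hop, fs):
    """Phase advance (radians) of harmonic h of f0 over one hop of `hop` samples."""
    return 2.0 * np.pi * h * f0 * hop / fs


def circular_mean(angles):
    """Return (mean direction, mean resultant length) of angles in radians."""
    z = np.mean(np.exp(1j * np.asarray(angles, dtype=float)))
    return np.angle(z), np.abs(z)


def complex_noise(rng, n, var):
    """Draw n samples of circular complex Gaussian noise with variance var."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))


def simulate_bin(rng, n, a_mag, dphi, q, r):
    """Simulate one bin: s[l] = a_mag*exp(j*dphi)*s[l-1] + w[l], y[l] = s[l] + n[l]."""
    A = a_mag * np.exp(1j * dphi)
    w = complex_noise(rng, n, q)
    s = np.zeros(n, dtype=complex)
    s[0] = w[0]
    for l in range(1, n):
        s[l] = A * s[l - 1] + w[l]
    y = s + complex_noise(rng, n, r)  # additive noise, uncorrelated over time
    return s, y


def kalman_track_bin(y, a_mag, dphi, q, r):
    """Scalar complex Kalman filter over frames for one bin (assumed AR(1) model)."""
    A = a_mag * np.exp(1j * dphi)  # state transition with harmonic phase rotation
    s_est = np.zeros(len(y), dtype=complex)
    s, p = y[0], r  # initialise from the first observation
    s_est[0] = s
    for l in range(1, len(y)):
        s_pred = A * s                    # predict state
        p_pred = abs(A) ** 2 * p + q      # predict error variance
        k = p_pred / (p_pred + r)         # Kalman gain
        s = s_pred + k * (y[l] - s_pred)  # update with the noisy bin
        p = (1.0 - k) * p_pred
        s_est[l] = s
    return s_est


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative values only: 2nd harmonic of 120 Hz, hop of 32 samples at 8 kHz.
    dphi = harmonic_phase_advance(h=2, f0=120.0, hop=32, fs=8000.0)
    s, y = simulate_bin(rng, 400, a_mag=0.98, dphi=dphi, q=0.05, r=1.0)
    # Estimate the phase advance from the noisy inter-frame phase differences.
    dphi_hat, resultant = circular_mean(np.angle(y[1:] * np.conj(y[:-1])))
    s_hat = kalman_track_bin(y, a_mag=0.98, dphi=dphi_hat, q=0.05, r=1.0)
    print("noisy MSE:   ", np.mean(np.abs(y - s) ** 2))
    print("filtered MSE:", np.mean(np.abs(s_hat - s) ** 2))
```

The circular mean is used because phase differences wrap at ±π, so an arithmetic mean would be biased near the wrap point. The mean resultant length returned alongside it indicates how concentrated the phase differences are.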
Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: Speech Communication
Volume: 111
DOI: 10.1016/j.specom.2019.05.001
Publication status: Published - 2019

Fingerprint

Speech Enhancement
Temporal Correlation
Short-time Fourier Transform
Discrete Fourier transforms
Harmonic
Speech intelligibility
Progression
Bins
Fourier transforms
Transition Model
Speech Signal
Autoregressive Process
Additive Noise
State Transition
Preservation
Kalman Filter
Predictors

Keywords

    ASJC Scopus subject areas

    • Software
    • Communication
    • Language and Linguistics
    • Computer Vision and Pattern Recognition
    • Computer Science Applications
    • Modelling and Simulation
    • Linguistics and Language

    Cite this

    Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement. / Stahl, Johannes; Mowlaee Beikzadehmahaleh, Pejman.

    In: Speech Communication, Vol. 111, 2019, p. 1-13.

    @article{11a148684f864f989fe163b915ab57f7,
    title = "Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement",
    keywords = "Circular statistics, Kalman filter, Pitch-adaptive, Speech enhancement",
    author = "Johannes Stahl and {Mowlaee Beikzadehmahaleh}, Pejman",
    year = "2019",
    doi = "10.1016/j.specom.2019.05.001",
    language = "English",
    volume = "111",
    pages = "1--13",
    journal = "Speech Communication",
    issn = "0167-6393",
    publisher = "Elsevier B.V.",

    }

    TY - JOUR

    T1 - Exploiting Temporal Correlation in Pitch-Adaptive Speech Enhancement

    AU - Stahl, Johannes

    AU - Mowlaee Beikzadehmahaleh, Pejman

    PY - 2019

    Y1 - 2019

    KW - Circular statistics

    KW - Kalman filter

    KW - Pitch-adaptive

    KW - Speech enhancement

    UR - http://www.scopus.com/inward/record.url?scp=85065916623&partnerID=8YFLogxK

    U2 - 10.1016/j.specom.2019.05.001

    DO - 10.1016/j.specom.2019.05.001

    M3 - Article

    VL - 111

    SP - 1

    EP - 13

    JO - Speech Communication

    JF - Speech Communication

    SN - 0167-6393

    ER -