Binaural Codebook-Based Speech Enhancement with Atomic Speech Presence Probability

Sean U.N. Wood; Johannes K.W. Stahl; Pejman Mowlaee

doi:10.1109/TASLP.2019.2937174

Sean U.N. Wood^*, Johannes K.W. Stahl, Pejman Mowlaee

^*Corresponding author for this work

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

In this work, we present a universal codebook-based speech enhancement framework that relies on a single codebook to encode both speech and noise components. The atomic speech presence probability (ASPP) is defined as the probability that a given codebook atom encodes speech at a given point in time. We develop ASPP estimators based on binaural cues including the interaural phase and level difference (IPD and ILD), the interaural coherence magnitude (ICM), as well as a combined version leveraging the full interaural transfer function (ITF). We evaluate the performance of the resulting ASPP-based speech enhancement algorithms on binaural mixtures of reverberant speech and real-world noise. The proposed approach improves both objective speech quality and intelligibility over a wide range of input SNR, as measured with PESQ and binaural STOI metrics, outperforming two binaural speech enhancement benchmark methods. We show that the proposed ITF-based ASPP approach achieves a good balance of the trade-off between binaural noise reduction and binaural cue preservation.
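The binaural cues named in the abstract can be illustrated with a minimal sketch. It uses common textbook definitions of ILD, IPD, and coherence magnitude computed from left and right STFT coefficients, which may differ from the exact estimators in the paper. The function names, array layout (frequency bins by frames), and the `eps` regularizer are assumptions made for illustration, and the ASPP estimators themselves are not reproduced here.

```python
import numpy as np

def interaural_cues(X_left, X_right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for each time-frequency bin.

    X_left, X_right: complex STFT arrays of identical shape
    (assumed layout: frequency bins x time frames).
    """
    # Level difference: ratio of magnitudes on a dB scale.
    ild = 20.0 * np.log10((np.abs(X_left) + eps) / (np.abs(X_right) + eps))
    # Phase difference: angle of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(X_left * np.conj(X_right))
    return ild, ipd

def coherence_magnitude(X_left, X_right, eps=1e-12):
    """Interaural coherence magnitude per frequency bin.

    Expectations are approximated by averaging over all frames (axis 1),
    a simplification chosen for this sketch.
    """
    cross = np.mean(X_left * np.conj(X_right), axis=1)
    power_left = np.mean(np.abs(X_left) ** 2, axis=1)
    power_right = np.mean(np.abs(X_right) ** 2, axis=1)
    return np.abs(cross) / np.sqrt(power_left * power_right + eps)

# Toy input: the left channel is the right channel scaled by 2
# and phase-shifted by 0.5 rad, i.e. one fully coherent source.
rng = np.random.default_rng(0)
X_right = rng.standard_normal((4, 50)) + 1j * rng.standard_normal((4, 50))
X_left = 2.0 * np.exp(1j * 0.5) * X_right

ild, ipd = interaural_cues(X_left, X_right)
icm = coherence_magnitude(X_left, X_right)
```

For this toy input, the ILD is about 6.02 dB in every bin, the IPD is 0.5 rad, and the coherence magnitude is close to 1, as expected for a single coherent source. Diffuse noise would typically yield lower coherence, which is the kind of contrast a cue-based speech presence estimator can exploit.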

Original language	English
Article number	8811601
Pages (from-to)	2150-2161
Number of pages	12
Journal	IEEE/ACM Transactions on Audio Speech and Language Processing
Volume	27
Issue number	12
DOIs	https://doi.org/10.1109/TASLP.2019.2937174
Publication status	Published - 1 Dec 2019

Keywords

atomic speech presence probability
Binaural speech enhancement
interaural transfer function
nonnegative matrix factorization

ASJC Scopus subject areas

Computer Science (miscellaneous)
Acoustics and Ultrasonics
Computational Mathematics
Electrical and Electronic Engineering

Cite this

@article{4e20658fe926480d916358696e02c05a,

title = "Binaural Codebook-Based Speech Enhancement with Atomic Speech Presence Probability",

abstract = "In this work, we present a universal codebook-based speech enhancement framework that relies on a single codebook to encode both speech and noise components. The atomic speech presence probability (ASPP) is defined as the probability that a given codebook atom encodes speech at a given point in time. We develop ASPP estimators based on binaural cues including the interaural phase and level difference (IPD and ILD), the interaural coherence magnitude (ICM), as well as a combined version leveraging the full interaural transfer function (ITF). We evaluate the performance of the resulting ASPP-based speech enhancement algorithms on binaural mixtures of reverberant speech and real-world noise. The proposed approach improves both objective speech quality and intelligibility over a wide range of input SNR, as measured with PESQ and binaural STOI metrics, outperforming two binaural speech enhancement benchmark methods. We show that the proposed ITF-based ASPP approach achieves a good balance of the trade-off between binaural noise reduction and binaural cue preservation.",

keywords = "atomic speech presence probability, Binaural speech enhancement, interaural transfer function, nonnegative matrix factorization",

author = "Wood, {Sean U.N.} and Stahl, {Johannes K.W.} and Mowlaee, Pejman",

year = "2019",

month = dec,

day = "1",

doi = "10.1109/TASLP.2019.2937174",

language = "English",

volume = "27",

pages = "2150--2161",

journal = "IEEE/ACM Transactions on Audio Speech and Language Processing",

issn = "2329-9290",

publisher = "Institute of Electrical and Electronics Engineers",

number = "12",

}

TY - JOUR

T1 - Binaural Codebook-Based Speech Enhancement with Atomic Speech Presence Probability

AU - Wood, Sean U.N.

AU - Stahl, Johannes K.W.

AU - Mowlaee, Pejman

PY - 2019/12/1

Y1 - 2019/12/1

N2 - In this work, we present a universal codebook-based speech enhancement framework that relies on a single codebook to encode both speech and noise components. The atomic speech presence probability (ASPP) is defined as the probability that a given codebook atom encodes speech at a given point in time. We develop ASPP estimators based on binaural cues including the interaural phase and level difference (IPD and ILD), the interaural coherence magnitude (ICM), as well as a combined version leveraging the full interaural transfer function (ITF). We evaluate the performance of the resulting ASPP-based speech enhancement algorithms on binaural mixtures of reverberant speech and real-world noise. The proposed approach improves both objective speech quality and intelligibility over a wide range of input SNR, as measured with PESQ and binaural STOI metrics, outperforming two binaural speech enhancement benchmark methods. We show that the proposed ITF-based ASPP approach achieves a good balance of the trade-off between binaural noise reduction and binaural cue preservation.

AB - In this work, we present a universal codebook-based speech enhancement framework that relies on a single codebook to encode both speech and noise components. The atomic speech presence probability (ASPP) is defined as the probability that a given codebook atom encodes speech at a given point in time. We develop ASPP estimators based on binaural cues including the interaural phase and level difference (IPD and ILD), the interaural coherence magnitude (ICM), as well as a combined version leveraging the full interaural transfer function (ITF). We evaluate the performance of the resulting ASPP-based speech enhancement algorithms on binaural mixtures of reverberant speech and real-world noise. The proposed approach improves both objective speech quality and intelligibility over a wide range of input SNR, as measured with PESQ and binaural STOI metrics, outperforming two binaural speech enhancement benchmark methods. We show that the proposed ITF-based ASPP approach achieves a good balance of the trade-off between binaural noise reduction and binaural cue preservation.

KW - atomic speech presence probability

KW - Binaural speech enhancement

KW - interaural transfer function

KW - nonnegative matrix factorization

UR - http://www.scopus.com/inward/record.url?scp=85071672659&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2019.2937174

DO - 10.1109/TASLP.2019.2937174

M3 - Article

AN - SCOPUS:85071672659

SN - 2329-9290

VL - 27

SP - 2150

EP - 2161

JO - IEEE/ACM Transactions on Audio Speech and Language Processing

JF - IEEE/ACM Transactions on Audio Speech and Language Processing

IS - 12

M1 - 8811601

ER -
