Unsupervised single channel speech separation based on optimized subspace separation

Belhedi Wiem; Ben Messaoud Mohamed Anouar; Pejman Mowlaee; Bouzid Aicha

doi:10.1016/j.specom.2017.11.010

Belhedi Wiem^*, Ben Messaoud Mohamed Anouar, Pejman Mowlaee, Bouzid Aicha

^*Corresponding author for this work

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

Single channel speech separation (SCSS) is widely used in real-time applications, for example as a preprocessing stage for speech recognition in the control of humanoid robots and in hearing aids. The performance of the separation is crucial for these applications. In this paper, we propose a new approach for unsupervised SCSS. The separation relies on an optimized subspace separation that decomposes the mixed signal into three estimates: the sparse subspace, the sub-sparse subspace and the low-rank subspace. A soft mask is used at the core of the proposed approach for the final decision. The proposed system generates two separated signals of different qualities, provided in two different channels. The channel classification is done using fuzzy logic, which requires two parameters. The first parameter is the quality of the separated signal, which we determine using a nonintrusive metric for speech quality and intelligibility. The second parameter is the gender of the speaker, determined using a proposed F₀ tracking algorithm. The evaluation results of the proposed approach are reported and compared with other state-of-the-art approaches. On average, the proposed method achieves a 67.9% improvement in PESQ, a 59.5% improvement in signal-to-interference ratio (SIR) and a 10.5% improvement in the target-related perceptual score (TPS) versus the benchmark methods.
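
The abstract does not say how its soft mask is computed, so the following is only a minimal sketch of a generic ratio-style soft mask, not the authors' method. The function names, the `power` exponent and the toy magnitude values are all hypothetical. The sketch assumes two non-negative magnitude estimates are already available, one per source, and shares each bin of the mixture between them in proportion to their estimated power.

```python
def soft_masks(mag_a, mag_b, power=2.0, eps=1e-12):
    """Return ratio masks (mask_a, mask_b) for two magnitude estimates.

    Each input is a list of rows of non-negative floats with the same
    shape. This is a generic ratio mask, assumed for illustration only.
    """
    mask_a, mask_b = [], []
    for row_a, row_b in zip(mag_a, mag_b):
        out_a, out_b = [], []
        for a, b in zip(row_a, row_b):
            pa, pb = a ** power, b ** power
            total = pa + pb + eps  # eps avoids division by zero
            out_a.append(pa / total)
            out_b.append(pb / total)
        mask_a.append(out_a)
        mask_b.append(out_b)
    return mask_a, mask_b


def apply_mask(mask, mixture):
    """Multiply a mask element-wise with the mixture magnitudes."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]


# Toy 2x3 magnitude "spectrograms" (hypothetical values).
est_a = [[3.0, 0.0, 1.0], [1.0, 2.0, 0.0]]
est_b = [[1.0, 2.0, 1.0], [1.0, 0.0, 0.0]]
mixture = [[4.0, 2.0, 2.0], [2.0, 2.0, 0.0]]

mask_a, mask_b = soft_masks(est_a, est_b)
source_a = apply_mask(mask_a, mixture)
source_b = apply_mask(mask_b, mixture)
```

Unlike a binary mask, each bin here is shared between the two outputs rather than assigned wholly to one of them. Where both estimates are zero, both masks are zero as well.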

Original language	English
Pages (from-to)	93-101
Number of pages	9
Journal	Speech Communication
Volume	96
DOIs	https://doi.org/10.1016/j.specom.2017.11.010
Publication status	Published - 1 Feb 2018

Keywords

Fuzzy logic
Multi-scale product
Nonintrusive metric for speech quality and intelligibility
Optimized subspace decomposition
Soft mask
Unsupervised SCSS
Wavelet Transform

ASJC Scopus subject areas

Software
Modelling and Simulation
Communication
Language and Linguistics
Linguistics and Language
Computer Vision and Pattern Recognition
Computer Science Applications

Cite this

@article{c4b3635b2d084d92b4dc8f8eb21489a0,
  title = "Unsupervised single channel speech separation based on optimized subspace separation",
  abstract = "Single channel speech separation (SCSS) is widely used in many real-time applications such as preprocessing stage for speech recognition to control humanoid robots and in hearing aid. The performance of the separation is crucial for these applications. In this paper, we propose a new approach for unsupervised SCSS. The separation relies on an optimization of the subspace separation by decomposing the mixed signal into three estimates which are namely; the sparse subspace, the sub-sparse subspace and the low-rank subspace. Soft mask is used in the core of the proposed approach for the final decision. The proposed system generates two separated signals of different qualities and provided in two different channels. The channel classification is done using Fuzzy logic which requires two parameters. The first parameter is the quality of separated signal that we determine using a nonintrusive metric for speech quality and intelligibility. The second parameter is the gender of the speaker, determined using a proposed F0 tracking algorithm. The evaluation results of the proposed approach are reported and compared to other state-of-art approaches. The proposed method on average achieves 67.9% improvement in PESQ, 59.5% improvement in signal-to-interference ratio (SIR) and 10.5% improvement in the target-related perceptual score (TPS) versus the benchmark methods.",
  keywords = "Fuzzy logic, Multi-scale product, Nonintrusive metric for speech quality and intelligibility, Optimized subspace decomposition, Soft mask, Unsupervised SCSS, Wavelet Transform",
  author = "Belhedi Wiem and {Mohamed Anouar}, {Ben Messaoud} and Pejman Mowlaee and Bouzid Aicha",
  year = "2018",
  month = feb,
  day = "1",
  doi = "10.1016/j.specom.2017.11.010",
  language = "English",
  volume = "96",
  pages = "93--101",
  journal = "Speech Communication",
  issn = "0167-6393",
  publisher = "Elsevier B.V.",
}

TY - JOUR

T1 - Unsupervised single channel speech separation based on optimized subspace separation

AU - Wiem, Belhedi

AU - Mohamed Anouar, Ben Messaoud

AU - Mowlaee, Pejman

AU - Aicha, Bouzid

PY - 2018/2/1

Y1 - 2018/2/1

AB - Single channel speech separation (SCSS) is widely used in many real-time applications such as preprocessing stage for speech recognition to control humanoid robots and in hearing aid. The performance of the separation is crucial for these applications. In this paper, we propose a new approach for unsupervised SCSS. The separation relies on an optimization of the subspace separation by decomposing the mixed signal into three estimates which are namely; the sparse subspace, the sub-sparse subspace and the low-rank subspace. Soft mask is used in the core of the proposed approach for the final decision. The proposed system generates two separated signals of different qualities and provided in two different channels. The channel classification is done using Fuzzy logic which requires two parameters. The first parameter is the quality of separated signal that we determine using a nonintrusive metric for speech quality and intelligibility. The second parameter is the gender of the speaker, determined using a proposed F0 tracking algorithm. The evaluation results of the proposed approach are reported and compared to other state-of-art approaches. The proposed method on average achieves 67.9% improvement in PESQ, 59.5% improvement in signal-to-interference ratio (SIR) and 10.5% improvement in the target-related perceptual score (TPS) versus the benchmark methods.

KW - Fuzzy logic

KW - Multi-scale product

KW - Nonintrusive metric for speech quality and intelligibility

KW - Optimized subspace decomposition

KW - Soft mask

KW - Unsupervised SCSS

KW - Wavelet Transform

UR - http://www.scopus.com/inward/record.url?scp=85037045630&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2017.11.010

DO - 10.1016/j.specom.2017.11.010

M3 - Article

AN - SCOPUS:85037045630

SN - 0167-6393

VL - 96

SP - 93

EP - 101

JO - Speech Communication

JF - Speech Communication

ER -
