Unsupervised single channel speech separation based on optimized subspace separation

Belhedi Wiem, Ben Messaoud Mohamed anouar, Pejman Mowlaee, Bouzid Aicha

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Single channel speech separation (SCSS) is widely used in real-time applications, for example as a preprocessing stage for speech recognition in the control of humanoid robots, and in hearing aids. Separation performance is crucial for these applications. In this paper, we propose a new approach for unsupervised SCSS. The separation relies on an optimized subspace separation that decomposes the mixed signal into three estimates, namely the sparse subspace, the sub-sparse subspace and the low-rank subspace. A soft mask is used at the core of the proposed approach for the final decision. The proposed system generates two separated signals of different qualities, provided in two different channels. The channel classification is done using fuzzy logic, which requires two parameters. The first parameter is the quality of the separated signal, which we determine using a nonintrusive metric for speech quality and intelligibility. The second parameter is the gender of the speaker, determined using a proposed F0 tracking algorithm. The evaluation results of the proposed approach are reported and compared with other state-of-the-art approaches. On average, the proposed method achieves a 67.9% improvement in PESQ, a 59.5% improvement in signal-to-interference ratio (SIR) and a 10.5% improvement in the target-related perceptual score (TPS) versus the benchmark methods.
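The abstract does not give the exact form of the soft mask, so the following is only a minimal sketch of a generic ratio-type soft mask, assuming two magnitude estimates per time-frequency bin are already available (for example from a subspace decomposition). The function names `soft_mask` and `apply_masks`, the `power` exponent and the `eps` constant are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def soft_mask(est_a: Sequence[float], est_b: Sequence[float],
              power: float = 2.0, eps: float = 1e-12) -> List[float]:
    """Ratio-type soft mask for source A (hypothetical helper).

    Each mask value lies in [0, 1] and is the share of the combined
    estimated energy in that bin that is attributed to source A.
    """
    mask = []
    for a, b in zip(est_a, est_b):
        pa = abs(a) ** power
        pb = abs(b) ** power
        # eps avoids division by zero when both estimates are silent.
        mask.append(pa / (pa + pb + eps))
    return mask


def apply_masks(mixture: Sequence[float], est_a: Sequence[float],
                est_b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split mixture magnitudes into two outputs with complementary masks."""
    mask_a = soft_mask(est_a, est_b)
    out_a = [m * x for m, x in zip(mask_a, mixture)]
    out_b = [(1.0 - m) * x for m, x in zip(mask_a, mixture)]
    return out_a, out_b


if __name__ == "__main__":
    # Toy magnitudes for three bins; values are made up for illustration.
    mixture = [5.0, 1.0, 2.0]
    est_a = [3.0, 0.0, 1.0]
    est_b = [4.0, 1.0, 1.0]
    print(apply_masks(mixture, est_a, est_b))
```

Because the two masks are complementary, the two outputs always sum back to the mixture in every bin. This is a property of the sketch, not a claim about the proposed system.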

Original language: English
Pages (from-to): 93-101
Number of pages: 9
Journal: Speech Communication
Volume: 96
DOI: 10.1016/j.specom.2017.11.010
Publication status: Published - 1 Feb 2018

Fingerprint

Subspace
Speech intelligibility
Hearing aids
Logic
Robots
Humanoid robot
Interference
Speech recognition
Fuzzy logic
Masks
Preprocessing
Two parameters
Speech
Gender
Benchmark

Keywords

  • Fuzzy logic
  • Multi-scale product
  • Nonintrusive metric for speech quality and intelligibility
  • Optimized subspace decomposition
  • Soft mask
  • Unsupervised SCSS
  • Wavelet Transform

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Unsupervised single channel speech separation based on optimized subspace separation. / Wiem, Belhedi; Mohamed anouar, Ben Messaoud; Mowlaee, Pejman; Aicha, Bouzid.

In: Speech Communication, Vol. 96, 01.02.2018, p. 93-101.

Wiem, Belhedi ; Mohamed anouar, Ben Messaoud ; Mowlaee, Pejman ; Aicha, Bouzid. / Unsupervised single channel speech separation based on optimized subspace separation. In: Speech Communication. 2018 ; Vol. 96. pp. 93-101.
@article{c4b3635b2d084d92b4dc8f8eb21489a0,
title = "Unsupervised single channel speech separation based on optimized subspace separation",
abstract = "Single channel speech separation (SCSS) is widely used in many real-time applications such as preprocessing stage for speech recognition to control humanoid robots and in hearing aid. The performance of the separation is crucial for these applications. In this paper, we propose a new approach for unsupervised SCSS. The separation relies on an optimization of the subspace separation by decomposing the mixed signal into three estimates which are namely; the sparse subspace, the sub-sparse subspace and the low-rank subspace. Soft mask is used in the core of the proposed approach for the final decision. The proposed system generates two separated signals of different qualities and provided in two different channels. The channel classification is done using Fuzzy logic which requires two parameters. The first parameter is the quality of separated signal that we determine using a nonintrusive metric for speech quality and intelligibility. The second parameter is the gender of the speaker, determined using a proposed F0 tracking algorithm. The evaluation results of the proposed approach are reported and compared to other state-of-art approaches. The proposed method on average achieves 67.9{\%} improvement in PESQ, 59.5{\%} improvement in signal-to-interference ratio (SIR) and 10.5{\%} improvement in the target-related perceptual score (TPS) versus the benchmark methods.",
keywords = "Fuzzy logic, Multi-scale product, Nonintrusive metric for speech quality and intelligibility, Optimized subspace decomposition, Soft mask, Unsupervised SCSS, Wavelet Transform",
author = "Belhedi Wiem and {Mohamed anouar}, {Ben Messaoud} and Pejman Mowlaee and Bouzid Aicha",
year = "2018",
month = "2",
day = "1",
doi = "10.1016/j.specom.2017.11.010",
language = "English",
volume = "96",
pages = "93--101",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Unsupervised single channel speech separation based on optimized subspace separation

AU - Wiem, Belhedi

AU - Mohamed anouar, Ben Messaoud

AU - Mowlaee, Pejman

AU - Aicha, Bouzid

PY - 2018/2/1

Y1 - 2018/2/1

AB - Single channel speech separation (SCSS) is widely used in many real-time applications such as preprocessing stage for speech recognition to control humanoid robots and in hearing aid. The performance of the separation is crucial for these applications. In this paper, we propose a new approach for unsupervised SCSS. The separation relies on an optimization of the subspace separation by decomposing the mixed signal into three estimates which are namely; the sparse subspace, the sub-sparse subspace and the low-rank subspace. Soft mask is used in the core of the proposed approach for the final decision. The proposed system generates two separated signals of different qualities and provided in two different channels. The channel classification is done using Fuzzy logic which requires two parameters. The first parameter is the quality of separated signal that we determine using a nonintrusive metric for speech quality and intelligibility. The second parameter is the gender of the speaker, determined using a proposed F0 tracking algorithm. The evaluation results of the proposed approach are reported and compared to other state-of-art approaches. The proposed method on average achieves 67.9% improvement in PESQ, 59.5% improvement in signal-to-interference ratio (SIR) and 10.5% improvement in the target-related perceptual score (TPS) versus the benchmark methods.

KW - Fuzzy logic

KW - Multi-scale product

KW - Nonintrusive metric for speech quality and intelligibility

KW - Optimized subspace decomposition

KW - Soft mask

KW - Unsupervised SCSS

KW - Wavelet Transform

UR - http://www.scopus.com/inward/record.url?scp=85037045630&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2017.11.010

DO - 10.1016/j.specom.2017.11.010

M3 - Article

VL - 96

SP - 93

EP - 101

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

ER -