Unsupervised single channel speech separation based on optimized subspace separation

Belhedi Wiem*, Ben Messaoud Mohamed Anouar, Pejman Mowlaee, Bouzid Aicha

*Corresponding author for this work

Publication: Contribution to journal, Article, peer-reviewed


Single-channel speech separation (SCSS) is widely used in real-time applications, for example as a preprocessing stage for speech recognition in the control of humanoid robots and in hearing aids. Separation performance is crucial for these applications. In this paper, we propose a new approach to unsupervised SCSS. The separation relies on an optimized subspace separation that decomposes the mixed signal into three estimates: the sparse subspace, the sub-sparse subspace and the low-rank subspace. A soft mask at the core of the proposed approach makes the final decision. The proposed system generates two separated signals of different quality, delivered on two different channels. Channel classification is performed with fuzzy logic, which requires two parameters. The first is the quality of the separated signal, determined using a non-intrusive metric for speech quality and intelligibility. The second is the gender of the speaker, determined using a proposed F0 tracking algorithm. The evaluation results of the proposed approach are reported and compared with other state-of-the-art approaches. On average, the proposed method achieves a 67.9% improvement in PESQ, a 59.5% improvement in signal-to-interference ratio (SIR) and a 10.5% improvement in the target-related perceptual score (TPS) relative to the benchmark methods.
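The abstract does not specify how the soft mask is computed, so the following is only a minimal sketch of a generic ratio-style soft mask, not the authors' method. It assumes two non-negative magnitude estimates of the same shape (for instance, two of the subspace estimates) and a mixture magnitude spectrogram. The function names `soft_masks` and `apply_masks`, the `power` exponent and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_masks(est_a, est_b, power=2.0, eps=1e-12):
    """Return two ratio-style soft masks with values in [0, 1].

    est_a, est_b: magnitude estimates of the two components (assumed inputs).
    power: exponent applied before forming the ratio (hypothetical parameter).
    eps: small constant that avoids division by zero.
    """
    a = np.abs(est_a) ** power
    b = np.abs(est_b) ** power
    total = a + b + eps
    return a / total, b / total


def apply_masks(mixture, est_a, est_b, power=2.0):
    """Weight each time-frequency bin of the mixture by the two soft masks."""
    mask_a, mask_b = soft_masks(est_a, est_b, power)
    return mask_a * mixture, mask_b * mixture


# Toy example: random "spectrograms" stand in for real estimates.
rng = np.random.default_rng(0)
est_a = rng.random((4, 5))
est_b = rng.random((4, 5))
mixture = est_a + est_b

mask_a, mask_b = soft_masks(est_a, est_b)
out_a, out_b = apply_masks(mixture, est_a, est_b)
```

Unlike a binary mask, a soft mask assigns each time-frequency bin partly to each output, so in this sketch the two masked signals add back up to the mixture (up to the small `eps` term).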

Pages (from-to): 93-101
Journal: Speech Communication
Publication status: Published - 1 Feb 2018

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

