Blind Speech Separation and Dereverberation using Neural Beamforming

Lukas Pfeifenberger, Franz Pernkopf

Publication: Contribution to journal, Article, peer-reviewed

Abstract

In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed by using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model which uses complex-valued neural networks, and a time-domain variant which performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, as they occur in meeting scenarios. We evaluate our system in terms of Scale Independent Signal to Distortion Ratio (SI-SDR), Word Error Rate (WER) and Equal Error Rate (EER).
Original language: English
Pages (from - to): 29-41
Number of pages: 13
Journal: Speech Communication
Volume: 140
Publication status: Published - May 2022

ASJC Scopus subject areas

  • Software
  • Communication
  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Modelling and Simulation
  • Linguistics and Language

Fields of Expertise

  • Information, Communication & Computing
  • Intelligent Systems

    Pernkopf, F.

    1/01/02 → …

    Project: Research field
