Blind Speech Separation and Dereverberation using Neural Beamforming

Lukas Pfeifenberger, Franz Pernkopf

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation, and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model which uses complex-valued neural networks, and a time-domain variant which performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, as they occur in meeting scenarios. We evaluate our system in terms of Scale-Independent Signal-to-Distortion Ratio (SI-SDR), Word Error Rate (WER) and Equal Error Rate (EER).
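The SI-SDR metric named in the abstract can be sketched as follows. This is a minimal NumPy implementation for illustration, assuming single-channel time-domain signals of equal length; it is not the authors' evaluation code.

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-Independent Signal-to-Distortion Ratio in dB.

    The estimate is projected onto the reference to find the optimal
    scaling factor, so the metric is invariant to gain differences.
    """
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference          # scaled target component
    noise = estimate - target           # everything not explained by the target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))
```

Because of the projection step, rescaling the estimate by any constant leaves the score unchanged, which is why SI-SDR is preferred over plain SDR for separation systems whose output gain is arbitrary.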
Original language: English
Pages (from-to): 29-41
Number of pages: 13
Journal: Speech Communication
Volume: 140
DOIs
Publication status: Published - May 2022

Keywords

  • Beamforming
  • Dereverberation
  • Multi-channel speaker separation
  • Speaker identification
  • Triplet mining
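The "triplet mining" keyword refers to selecting informative (anchor, positive, negative) embedding triples during training. A minimal NumPy sketch, assuming fixed-dimensional speaker embeddings and squared-Euclidean distance; the function names and margin value are illustrative, not the paper's:

```python
import numpy as np

def mine_hardest_negative(anchor, negatives):
    """Pick the negative embedding closest to the anchor (hardest case)."""
    d = np.sum((negatives - anchor) ** 2, axis=1)
    return negatives[np.argmin(d)]

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss pulling the positive closer than the negative by a margin."""
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, d_ap - d_an + margin)
```

Mining the hardest negatives keeps the loss non-trivial as training progresses, since randomly chosen triples quickly satisfy the margin and contribute zero gradient.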

ASJC Scopus subject areas

  • Software
  • Communication
  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Modelling and Simulation
  • Linguistics and Language
