Abstract
In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model, which uses complex-valued neural networks, and a time-domain variant, which performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, such as those that occur in meeting scenarios. We evaluate our system in terms of Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), Word Error Rate (WER) and Equal Error Rate (EER).
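Separation quality above is reported in SI-SDR. The sketch below is an illustration only, not the paper's evaluation code. It assumes the commonly used zero-mean definition, in which the reference is optimally rescaled toward the estimate before the target-to-residual energy ratio is taken. The function name `si_sdr` and the `eps` guard are hypothetical choices made for this example.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB (hypothetical helper, common zero-mean definition)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove DC offsets so constant shifts do not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Optimal scale factor projecting the estimate onto the reference.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference  # component of the estimate explained by the reference
    residual = estimate - target  # everything else counts as distortion

    # Energy ratio in decibels; eps avoids division by zero for perfect estimates.
    return float(10.0 * np.log10((np.dot(target, target) + eps) /
                                 (np.dot(residual, residual) + eps)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)
    noisy = clean + 0.1 * rng.standard_normal(16000)
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
    print(f"SI-SDR of rescaled estimate: {si_sdr(3.0 * noisy, clean):.2f} dB")
```

Because the reference is rescaled by `alpha`, multiplying the estimate by any nonzero constant leaves the score unchanged. This is the property that makes the metric insensitive to the arbitrary output gain of a separation network.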
Original language | English |
---|---|
Pages | 29-41 |
Number of pages | 13 |
Journal | Speech Communication |
Volume | 140 |
DOIs | |
Publication status | Published - May 2022 |
Keywords
- Beamforming
- Dereverberation
- Multi-channel speaker separation
- Speaker identification
- Triplet mining
ASJC Scopus subject areas
- Software
- Communication
- Language and Linguistics
- Computer Vision and Pattern Recognition
- Computer Science Applications
- Modelling and Simulation
- Linguistics and Language
Fields of Expertise
- Information, Communication & Computing