Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets

Markus Huber; Günther Schindler; Wolfgang Roth; Holger Fröning; Christian Schörkhuber; Franz Pernkopf

doi:10.1109/ICASSP40776.2020.9053542

Institut für Signalverarbeitung und Sprachkommunikation (4420)

Publication: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Modern musical source separation systems based on deep neural networks reach unprecedented levels of separation quality. However, harnessing the power of these large-scale models in typical audio production environments, which frequently offer only limited computing resources while demanding real-time processing, remains challenging. We extend the multi-scaled DenseNet in several aspects to facilitate real-time source separation scenarios. Specifically, we reduce the computational requirements by inferring Mel-scaled masks and decrease the model size via effective use of bottleneck layers, while improving performance using a deep clustering objective. In addition, we are able to further increase the model efficiency by applying parameterized structured pruning of convolutional weights without any significant impact on the separation performance. We significantly reduce the model size and increase the computational efficiency by a factor of 1.6 and 4.3, respectively, while maintaining the separation performance.
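The abstract gives no implementation details, so the following is only a minimal sketch of what structured pruning of convolutional weights can look like, under stated assumptions. It assumes one learned scale per output channel (here called `alphas`), a fixed `threshold`, and a toy layer stored as nested lists; these names, the thresholding rule, and the toy values are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

# Each output channel of a convolution is stored as a flat list of weights.
Filters = List[List[float]]


def prune_channels(filters: Filters, alphas: List[float],
                   threshold: float) -> Tuple[Filters, List[int]]:
    """Remove whole output channels whose scale |alpha| is below `threshold`.

    The surviving scales are folded into the weights, so the pruned layer
    needs no extra parameters at inference time.
    """
    if len(filters) != len(alphas):
        raise ValueError("need exactly one scale per output channel")
    # Assumption: a channel is kept when its scale magnitude reaches the threshold.
    kept = [i for i, a in enumerate(alphas) if abs(a) >= threshold]
    pruned = [[alphas[i] * w for w in filters[i]] for i in kept]
    return pruned, kept


def count_params(filters: Filters) -> int:
    """Total number of weights across all channels."""
    return sum(len(channel) for channel in filters)


if __name__ == "__main__":
    # Hypothetical toy layer: 4 output channels with 3 weights each.
    filters = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0],
               [2.0, 4.0, 6.0], [7.0, 8.0, 9.0]]
    alphas = [1.0, 0.01, -0.5, 0.0]
    pruned, kept = prune_channels(filters, alphas, threshold=0.1)
    # Print the surviving channel indices and the size-reduction factor.
    print(kept, count_params(filters) / count_params(pruned))
```

Because entire channels are removed rather than individual weights, the remaining layer stays dense and can run on standard convolution kernels. In a real network, dropping an output channel also removes the matching input channel of the following layer, which this toy example does not model.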

Original language	English
Title of host publication	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Pages	806-810
Number of pages	5
ISBN (Electronic)	9781509066315
DOIs	https://doi.org/10.1109/ICASSP40776.2020.9053542
Publication status	Published - May 2020
Event	2020 IEEE International Conference on Acoustics, Speech and Signal Processing: ICASSP 2020 - Virtual, Barcelona, Spain. Duration: 4 May 2020 → 8 May 2020

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2020-May
ISSN (Print)	1520-6149

Conference

Conference	2020 IEEE International Conference on Acoustics, Speech and Signal Processing
Short title	ICASSP 2020
Country/Territory	Spain
City	Virtual, Barcelona
Period	4/05/20 → 8/05/20

ASJC Scopus subject areas

Software
Signal Processing
Electrical and Electronic Engineering

Access to Document

10.1109/ICASSP40776.2020.9053542

Other files and links

http://www.scopus.com/inward/record.url?scp=85089211719&partnerID=8YFLogxK

Cite this

Huber, M., Schindler, G., Roth, W., Fröning, H., Schörkhuber, C., & Pernkopf, F. (2020). Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings (pp. 806-810). Article 9053542 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May). https://doi.org/10.1109/ICASSP40776.2020.9053542

Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets. / Huber, Markus; Schindler, Günther; Roth, Wolfgang et al.
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. 2020. pp. 806-810 9053542 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May).

Huber, M, Schindler, G, Roth, W, Fröning, H, Schörkhuber, C & Pernkopf, F 2020, Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings., 9053542, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2020-May, pp. 806-810, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Virtual, Barcelona, Spain, 4/05/20. https://doi.org/10.1109/ICASSP40776.2020.9053542

Huber M, Schindler G, Roth W, Fröning H, Schörkhuber C, Pernkopf F. Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. 2020. p. 806-810. 9053542. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP40776.2020.9053542

Huber, Markus ; Schindler, Günther ; Roth, Wolfgang et al. / Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. 2020. pp. 806-810 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{79b2f4d47b3f4dac98e403b1d0cdae4f,
  title = "Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets",
  abstract = "Modern musical source separation systems based on deep neural networks reach unprecedented levels of separation quality. However, harnessing the power of these large-scale models in typical audio production environments, which frequently offer only limited computing resources while demanding real-time processing, remains challenging. We extend the multi-scaled DenseNet in several aspects to facilitate real-time source separation scenarios. Specifically, we reduce the computational requirements by inferring Mel-scaled masks and decrease the model size via effective use of bottleneck layers, while improving performance using a deep clustering objective. In addition, we are able to further increase the model efficiency by applying parameterized structured pruning of convolutional weights without any significant impact on the separation performance. We significantly reduce the model size and increase the computational efficiency by a factor of 1.6 and 4.3, respectively, while maintaining the separation performance.",
  keywords = "Multi-scaled DenseNet, Musical Source Separation, Parameterized Structured Pruning, Real-time",
  author = "Markus Huber and G{\"u}nther Schindler and Wolfgang Roth and Holger Fr{\"o}ning and Christian Sch{\"o}rkhuber and Franz Pernkopf",
  year = "2020",
  month = may,
  doi = "10.1109/ICASSP40776.2020.9053542",
  language = "English",
  series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  pages = "806--810",
  booktitle = "2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings",
  note = "2020 IEEE International Conference on Acoustics, Speech and Signal Processing : ICASSP 2020, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020",
}

TY  - GEN
T1  - Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets
AU  - Huber, Markus
AU  - Schindler, Günther
AU  - Roth, Wolfgang
AU  - Fröning, Holger
AU  - Schörkhuber, Christian
AU  - Pernkopf, Franz
PY  - 2020/5
Y1  - 2020/5
N2  - Modern musical source separation systems based on deep neural networks reach unprecedented levels of separation quality. However, harnessing the power of these large-scale models in typical audio production environments, which frequently offer only limited computing resources while demanding real-time processing, remains challenging. We extend the multi-scaled DenseNet in several aspects to facilitate real-time source separation scenarios. Specifically, we reduce the computational requirements by inferring Mel-scaled masks and decrease the model size via effective use of bottleneck layers, while improving performance using a deep clustering objective. In addition, we are able to further increase the model efficiency by applying parameterized structured pruning of convolutional weights without any significant impact on the separation performance. We significantly reduce the model size and increase the computational efficiency by a factor of 1.6 and 4.3, respectively, while maintaining the separation performance.
AB  - Modern musical source separation systems based on deep neural networks reach unprecedented levels of separation quality. However, harnessing the power of these large-scale models in typical audio production environments, which frequently offer only limited computing resources while demanding real-time processing, remains challenging. We extend the multi-scaled DenseNet in several aspects to facilitate real-time source separation scenarios. Specifically, we reduce the computational requirements by inferring Mel-scaled masks and decrease the model size via effective use of bottleneck layers, while improving performance using a deep clustering objective. In addition, we are able to further increase the model efficiency by applying parameterized structured pruning of convolutional weights without any significant impact on the separation performance. We significantly reduce the model size and increase the computational efficiency by a factor of 1.6 and 4.3, respectively, while maintaining the separation performance.
KW  - Multi-scaled DenseNet
KW  - Musical Source Separation
KW  - Parameterized Structured Pruning
KW  - Real-time
UR  - http://www.scopus.com/inward/record.url?scp=85089211719&partnerID=8YFLogxK
U2  - 10.1109/ICASSP40776.2020.9053542
DO  - 10.1109/ICASSP40776.2020.9053542
M3  - Conference paper
T3  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP  - 806
EP  - 810
BT  - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
T2  - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing
Y2  - 4 May 2020 through 8 May 2020
ER  -
