TY - CONF
T1 - Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets
AU - Huber, Markus
AU - Schindler, Günther
AU - Roth, Wolfgang
AU - Fröning, Holger
AU - Schörkhuber, Christian
AU - Pernkopf, Franz
PY - 2020/5
Y1 - 2020/5
AB - Modern musical source separation systems based on deep neural networks reach unprecedented levels of separation quality. However, harnessing the power of these large-scale models in typical audio production environments, which frequently offer only limited computing resources while demanding real-time processing, remains challenging. We extend the multi-scaled DenseNet in several aspects to facilitate real-time source separation scenarios. Specifically, we reduce the computational requirements by inferring Mel-scaled masks and decrease the model size via effective use of bottleneck layers, while improving performance using a deep clustering objective. In addition, we are able to further increase the model efficiency by applying parameterized structured pruning of convolutional weights without any significant impact on the separation performance. We significantly reduce the model size and increase the computational efficiency by a factor of 1.6 and 4.3, respectively, while maintaining the separation performance.
KW - Multi-scaled DenseNet
KW - Musical Source Separation
KW - Parameterized Structured Pruning
KW - Real-time
UR - http://www.scopus.com/inward/record.url?scp=85089211719&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053542
DO - 10.1109/ICASSP40776.2020.9053542
M3 - Conference paper
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 806
EP - 810
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
T2 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing
Y2 - 4 May 2020 through 8 May 2020
ER -