MBTFNET: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement

Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

A typical neural speech enhancement (SE) approach mainly handles mixtures of speech and noise. This is not optimal for singing voice enhancement, where singing is often mixed with vocal-correlated accompaniment and where singing differs substantially from speaking. Music source separation (MSS) models treat vocals and the various accompaniment components equally, which may reduce performance compared with a model that focuses only on vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which removes background music, noise and even backing vocals from singing recordings. MBTFNet combines inter-band and intra-band modeling to better process full-band signals. Dual-path modeling along the temporal and frequency axes, together with temporal dilation blocks, is introduced to expand the receptive field of the model. For removing backing vocals in particular, we propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet. Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.
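Two quantities in the abstract can be made concrete with standard formulas: how stacked temporal dilations widen the receptive field, and how SNR is measured. The sketch below is illustrative only. The abstract gives no kernel sizes, dilation rates, or details of how the IPE stage estimates SNR, so the function names (`dilated_receptive_field`, `snr_db`) and the example parameters are hypothetical assumptions, not MBTFNet's actual implementation. The receptive-field formula assumes stride-1 dilated 1-D convolutions, and `snr_db` is the conventional SNR definition, not the paper's estimator.

```python
import math
from typing import Sequence


def dilated_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in frames) of stacked stride-1 dilated 1-D convolutions.

    Hypothetical helper: each layer with kernel size k and dilation d
    adds (k - 1) * d frames of temporal context to the stack.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def snr_db(target: Sequence[float], estimate: Sequence[float]) -> float:
    """Conventional SNR in dB, treating (estimate - target) as the noise.

    Hypothetical helper for illustration; not the IPE stage's SNR estimator.
    """
    signal_power = sum(t * t for t in target)
    noise_power = sum((e - t) ** 2 for t, e in zip(target, estimate))
    if noise_power == 0.0:
        return math.inf  # perfect reconstruction
    if signal_power == 0.0:
        return -math.inf  # silent target, nonzero error
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Assumed example: 4 layers, kernel size 3, exponentially growing dilations.
    print(dilated_receptive_field(3, [1, 2, 4, 8]))  # 31 frames
    print(dilated_receptive_field(3, [1, 1, 1, 1]))  # 9 frames without dilation
    print(snr_db([1.0, -1.0, 1.0, -1.0], [1.1, -0.9, 0.9, -1.1]))  # about 20 dB
```

With the same number of layers and the same kernel size, doubling the dilation at each layer grows the context from 9 to 31 frames. This is why dilation is a common, inexpensive way to enlarge a model's temporal receptive field.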

Original language: English
Title of host publication: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350306897
State: Published - 2023
Event: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 - Taipei, Taiwan, Province of China
Duration: 16 Dec 2023 to 20 Dec 2023

Publication series

Name: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Conference

Conference: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 16/12/23 to 20/12/23

Keywords

  • implicit personalized enhancement
  • MBTFNet
  • singing-voice enhancement
