Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning

  • Jixun Yao
  • Yi Lei
  • Qing Wang
  • Pengcheng Guo
  • Ziqian Ning
  • Lei Xie
  • Hai Li
  • Junhui Liu
  • Danming Xie

Research output: Chapter in Book/Report/Conference proceeding · Conference contribution · peer-review

9 Scopus citations

Abstract

Background sound is an informative form of art that helps provide a more immersive experience in real-world voice conversion (VC) applications. However, prior VC research has focused mainly on clean speech and paid little attention to VC with background sound. The critical problems in preserving background sound in VC are the inevitable speech distortion introduced by the neural separation model and the cascade mismatch between the source separation model and the VC model. In this paper, we propose an end-to-end framework based on multi-task learning that sequentially cascades a source separation (SS) module, a bottleneck feature extraction module, and a VC module. Specifically, the source separation task explicitly considers critical phase information and limits the distortion caused by the imperfect separation process. The source separation task, the typical VC task, and the unified task share a uniform reconstruction loss constrained by joint training to reduce the mismatch between the SS and VC modules. Experimental results demonstrate that our proposed framework significantly outperforms the baseline systems while achieving quality and speaker similarity comparable to those of VC models trained with clean data.
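
As a rough illustration of the joint training idea described in the abstract, the sketch below sums one shared reconstruction loss over the three tasks (source separation, typical VC, and the unified task). It is not the authors' implementation. The names `l1_loss`, `joint_loss`, `separate`, and `convert` are hypothetical, the choice of mean absolute error as the reconstruction loss is an assumption, the equal task weights are an assumption, re-adding the estimated background after conversion is an assumption, and the bottleneck feature extraction module is folded into `convert` for brevity.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]  # toy stand-in for a waveform or feature sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error; assumed here as the shared reconstruction loss."""
    if len(pred) != len(target):
        raise ValueError("length mismatch")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    mixture: Signal,
    clean_speech: Signal,
    background: Signal,
    target_speech: Signal,
    separate: Callable[[Signal], Tuple[Signal, Signal]],  # hypothetical SS module
    convert: Callable[[Signal], Signal],  # hypothetical bottleneck + VC modules
) -> float:
    # Task 1 (source separation): split the mixture into speech and background.
    est_speech, est_background = separate(mixture)
    ss = l1_loss(est_speech, clean_speech) + l1_loss(est_background, background)

    # Task 2 (typical VC): convert clean speech directly.
    vc = l1_loss(convert(clean_speech), target_speech)

    # Task 3 (unified): convert the separated speech, then add the estimated
    # background back (assumed way of preserving the background sound).
    converted = convert(est_speech)
    remixed = [c + b for c, b in zip(converted, est_background)]
    target_mix = [t + b for t, b in zip(target_speech, background)]
    unified = l1_loss(remixed, target_mix)

    # Equal task weights are an assumption made for this sketch.
    return ss + vc + unified
```

Because all three terms pass through the same `separate` and `convert` callables, a gradient-based trainer minimizing this sum would update both modules together, which is the intuition behind reducing the mismatch between a separately trained separator and converter.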

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
State: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23

Keywords

  • Voice conversion
  • background sound
  • end-to-end
  • multi-task learning
