
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers

  • Liumeng Xue
  • Shan Yang
  • Na Hu
  • Dan Su
  • Lei Xie
  • Northwestern Polytechnical University, Xi'an
  • Tencent

Research output: Contribution to journal › Conference article › Peer-reviewed

3 Citations (Scopus)

Abstract

Building a voice conversion system for noisy target speakers, such as users providing noisy samples or Internet-found data, is a challenging task, since using contaminated speech in model training clearly degrades conversion performance. In this paper, we leverage our recently proposed Glow-WaveGAN [1] and propose a noise-independent speech representation learning approach for high-quality voice conversion for noisy target speakers. Specifically, we learn a latent feature space in which the target distribution modeled by the conversion model is guaranteed to match the distribution modeled by the waveform generator. On this basis, we further make the latent feature noise-invariant: we introduce a noise-controllable WaveGAN, whose encoder learns the noise-independent acoustic representation directly from the waveform and whose decoder conducts noise control in the hidden space through a FiLM [2] module. For the conversion model, we use a flow-based model to learn the distribution of noise-independent but speaker-related latent features from phoneme posteriorgrams. Experimental results demonstrate that the proposed model achieves high speech quality and speaker similarity in voice conversion for noisy target speakers.
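The FiLM [2] module mentioned in the abstract applies feature-wise linear modulation: a conditioning vector (here, a noise embedding) predicts a per-channel scale and shift that modulate the decoder's hidden features. The sketch below illustrates only this generic FiLM operation, not the paper's actual architecture; all dimensions, weight names, and the random toy inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, cond, W_gamma, b_gamma, W_beta, b_beta):
    """Feature-wise Linear Modulation: a conditioning vector predicts a
    scale (gamma) and shift (beta) for each feature channel, which are
    broadcast over the time axis of the feature map."""
    gamma = cond @ W_gamma + b_gamma   # shape: (channels,)
    beta = cond @ W_beta + b_beta      # shape: (channels,)
    return gamma * features + beta     # broadcast over frames

# Toy dimensions (hypothetical): 8 latent channels over 100 frames,
# conditioned on a 4-dim noise embedding.
frames, channels, cond_dim = 100, 8, 4
features = rng.standard_normal((frames, channels))  # decoder hidden features
cond = rng.standard_normal(cond_dim)                # e.g. a noise embedding

# Linear projections from the conditioning vector to gamma/beta;
# initialize scale near 1 and shift near 0 so FiLM starts close to identity.
W_gamma = 0.1 * rng.standard_normal((cond_dim, channels))
b_gamma = np.ones(channels)
W_beta = 0.1 * rng.standard_normal((cond_dim, channels))
b_beta = np.zeros(channels)

out = film(features, cond, W_gamma, b_gamma, W_beta, b_beta)
print(out.shape)  # (100, 8)
```

In the paper's setting, such conditioning lets the decoder add or suppress noise characteristics in the hidden space while the encoder's representation stays noise-independent.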

Original language: English
Pages (from-to): 2548-2552
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
DOI
Publication status: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea
Duration: 18 Sep 2022 – 22 Sep 2022
