
Factorized WaveNet for voice conversion with limited data

  • Hongqiang Du
  • Xiaohai Tian
  • Lei Xie
  • Haizhou Li

Research output: Contribution to journal, Article, peer-reviewed

7 citations (Scopus)

Abstract

WaveNet was introduced for waveform generation. It produces high-quality output in text-to-speech synthesis, music generation, and voice conversion. However, it generally requires a large amount of training data, which limits its scope of application, e.g., in voice conversion. In this paper, we propose a factorized WaveNet for limited-data tasks. Specifically, we apply singular value decomposition (SVD) to the dilated convolution layers of WaveNet to reduce the number of parameters. By doing so, we reduce the data requirement for WaveNet training while maintaining similar network performance. We use voice conversion as a case study to validate the proposed idea. Two sets of experiments are conducted, in which WaveNet is used as a vocoder and as an integrated converter–vocoder, respectively. Experiments on the CMU-ARCTIC and CSTR-VCTK corpora show that factorized WaveNet consistently outperforms the original WaveNet when both are trained on the same amount of data. We also apply SVD in the same way to the real-time neural vocoder Parallel WaveGAN for voice conversion and observe a similar improvement.
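The parameter saving from applying SVD to a convolution layer can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state how the kernel is reshaped or which rank is used, so the weight layout `(out_channels, in_channels, kernel_size)`, the helper names `factorize_conv_weight` and `param_count`, and the example sizes are all assumptions made for illustration.

```python
import numpy as np


def factorize_conv_weight(weight, rank):
    """Truncated-SVD factorization of a 1-D convolution kernel.

    Assumed layout (hypothetical, not taken from the paper):
    weight has shape (out_channels, in_channels, kernel_size).
    Returns a (out_channels, rank) mixing matrix and a
    (rank, in_channels, kernel_size) kernel whose product
    approximates the original weight.
    """
    out_ch, in_ch, k = weight.shape
    # Flatten the kernel into a 2-D matrix so SVD can be applied.
    mat = weight.reshape(out_ch, in_ch * k)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    # Keep the `rank` largest singular values; fold them into the left factor.
    a = u[:, :rank] * s[:rank]
    b = vt[:rank, :].reshape(rank, in_ch, k)
    return a, b


def param_count(out_ch, in_ch, k, rank=None):
    """Weight count of the full layer, or of its rank-`rank` factorization."""
    if rank is None:
        return out_ch * in_ch * k
    return rank * (out_ch + in_ch * k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64, 2))  # illustrative layer size only
    a, b = factorize_conv_weight(w, rank=16)
    approx = (a @ b.reshape(16, -1)).reshape(w.shape)
    rel_err = np.linalg.norm(w - approx) / np.linalg.norm(w)
    print("full parameters:      ", param_count(64, 64, 2))
    print("factorized parameters:", param_count(64, 64, 2, rank=16))
    print("relative error:       ", rel_err)
```

Under these assumptions the factorized layer corresponds to a convolution with `rank` output channels followed by a 1×1 mixing layer, and it has fewer weights whenever `rank` is small relative to the channel counts. A random matrix, as used here, is not low-rank, so the printed reconstruction error says nothing about the results reported in the paper.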

Original language: English
Pages (from-to): 45-54
Number of pages: 10
Journal: Speech Communication
Volume: 130
DOI
Publication status: Published - Jun 2021
