
WaveNet Factorization with Singular Value Decomposition for Voice Conversion

  • Hongqiang Du
  • Xiaohai Tian
  • Lei Xie
  • Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

The WaveNet vocoder has shown a clear advantage over traditional vocoders in voice quality. However, training a speaker-dependent WaveNet vocoder usually requires a relatively large amount of speech data. It therefore remains a challenge to build a high-quality WaveNet vocoder for low-resource tasks such as voice conversion, where speech samples are limited in real applications. We propose to use singular value decomposition (SVD) to reduce the number of WaveNet parameters while maintaining output voice quality. Specifically, we apply SVD to the dilated convolution layers and impose a semi-orthogonal constraint to improve performance. Experiments on the CMU-ARCTIC database show that, compared with the original WaveNet vocoder, the proposed method maintains similar performance in terms of both quality and similarity while using much less training data.
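To make the parameter saving concrete, the following is a minimal sketch of the arithmetic behind low-rank factorization of one dilated convolution layer, together with a check of the semi-orthogonality condition M Mᵀ = I. The layer sizes, the rank, and the function names are hypothetical illustrations and are not taken from the paper; the abstract does not state which factor is constrained or how the constraint is enforced during training.

```python
def full_params(out_ch, in_ch, kernel):
    """Weights in an unfactored 1-D convolution layer (bias ignored)."""
    return out_ch * in_ch * kernel


def factored_params(out_ch, in_ch, kernel, rank):
    """Weights after replacing W (out_ch x in_ch*kernel) with A @ B,
    where A is (out_ch x rank) and B is (rank x in_ch*kernel)."""
    return rank * (out_ch + in_ch * kernel)


def semi_orthogonality_error(m):
    """Largest absolute deviation of M @ M^T from the identity.

    `m` is a list of rows; a value of 0.0 means the rows are orthonormal,
    i.e. M is semi-orthogonal."""
    err = 0.0
    for i, row_i in enumerate(m):
        for j, row_j in enumerate(m):
            dot = sum(a * b for a, b in zip(row_i, row_j))
            target = 1.0 if i == j else 0.0
            err = max(err, abs(dot - target))
    return err


# Hypothetical layer: 512 output channels, 256 input channels, kernel size 2,
# factored through a rank-64 bottleneck (assumed values, for illustration only).
print(full_params(512, 256, 2))           # 262144
print(factored_params(512, 256, 2, 64))   # 65536

# Rows of this 2 x 3 matrix are orthonormal, so the error is zero.
print(semi_orthogonality_error([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))  # 0.0
```

With these assumed sizes the factored layer holds a quarter of the original weights. The saving grows as the rank shrinks, and a smaller model is one plausible reason less training data could suffice.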

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 152-159
Number of pages: 8
ISBN (Electronic): 9781728103068
State: Published - Dec 2019
Event: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore
Duration: 15 Dec 2019 → 18 Dec 2019

Publication series

Name: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country/Territory: Singapore
City: Singapore
Period: 15/12/19 → 18/12/19

Keywords

  • Singular Value Decomposition (SVD)
  • Voice Conversion (VC)
  • WaveNet
