Denoising recurrent neural network for deep bidirectional LSTM based voice conversion

Jie Wu, Dongyan Huang, Lei Xie, Haizhou Li

Research output: Contribution to journal › Conference article › peer-review

8 Scopus citations

Abstract

This paper studies post-processing in deep bidirectional Long Short-Term Memory (DBLSTM) based voice conversion, where the statistical parameters are optimized to generate speech with properties similar to the target speech. However, residual error always remains between the converted speech and the target speech. We reformulate the residual error problem as speech restoration, which aims to recover the target speech samples from the converted ones. Specifically, we propose a denoising recurrent neural network (DeRNN) that introduces regularization during training to shape the distribution of the converted data in latent space. We compare the proposed approach with global variance (GV), modulation spectrum (MS) and recurrent neural network (RNN) based postfilters, which serve a similar purpose. Subjective test results show that the proposed approach significantly outperforms these conventional approaches in terms of quality and similarity.
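The abstract frames postfiltering as a denoising task: a recurrent network learns to recover target speech features from the converted ones produced by the DBLSTM converter. The paper's own architecture and regularization are not reproduced here, so the following is only a minimal sketch of that general idea in PyTorch; the GRU model, the Gaussian-noise corruption of the inputs, and the weight-decay term standing in for the latent-space regularization are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a denoising RNN postfilter for voice conversion,
# assuming paired (converted, target) acoustic feature sequences.
# Hyperparameters, noise level, and the weight-decay regularizer are
# assumptions for illustration, not the configuration from the paper.
import torch
import torch.nn as nn

class DeRNNPostfilter(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):
        h, _ = self.rnn(x)      # (batch, frames, hidden_dim)
        return self.out(h)      # predicted target-speaker features

def train_step(model, optimizer, converted, target, noise_std=0.05):
    """One training step: corrupt the converted features with Gaussian
    noise and regress toward the target features (denoising objective)."""
    model.train()
    noisy = converted + noise_std * torch.randn_like(converted)
    pred = model(noisy)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy example with random tensors standing in for acoustic features.
    model = DeRNNPostfilter()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    converted = torch.randn(8, 200, 40)   # output of the DBLSTM converter
    target = torch.randn(8, 200, 40)      # natural target-speaker features
    print(train_step(model, opt, converted, target))
```

At inference time such a postfilter would simply be applied to the converted feature trajectories before waveform synthesis, which is the role the GV, MS and RNN postfilters mentioned above also play.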

Original language: English
Pages (from-to): 3379-3383
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
State: Published - 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 - 24 Aug 2017

Keywords

  • Denoising
  • Gaussian noise
  • Recurrent neural network
  • Residual error
  • Voice conversion
