Denoising recurrent neural network for deep bidirectional LSTM based voice conversion

Jie Wu, Dongyan Huang, Lei Xie, Haizhou Li

Research output: Contribution to journal, Conference article, Peer-reviewed

8 citations (Scopus)

Abstract

This paper studies post-processing in deep bidirectional Long Short-Term Memory (DBLSTM) based voice conversion, where the statistical parameters are optimized to generate speech that exhibits properties similar to those of the target speech. However, a residual error always remains between the converted speech and the target speech. We reformulate this residual error problem as speech restoration, which aims to recover the target speech samples from the converted ones. Specifically, we propose a denoising recurrent neural network (DeRNN) that introduces regularization during training to shape the distribution of the converted data in the latent space. We compare the proposed approach with postfilters based on global variance (GV), modulation spectrum (MS) and recurrent neural networks (RNN), which serve a similar purpose. Subjective test results show that the proposed approach significantly outperforms these conventional approaches in terms of both quality and speaker similarity.
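For context on the baselines named in the abstract, a simple variance-scaling form of the global variance (GV) postfilter can be sketched as below. This is not the paper's DeRNN; it is a minimal illustration under stated assumptions: features arrive as a T x D list of frames (e.g. mel-cepstral coefficients), and `target_gv` holds per-dimension variances assumed to be precomputed from target-speaker training data. The function name `gv_postfilter` is hypothetical. Each dimension's trajectory is rescaled around its utterance mean so that its variance matches the target GV, counteracting the over-smoothing typical of statistically converted parameters.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def gv_postfilter(converted: Sequence[Sequence[float]],
                  target_gv: Sequence[float]) -> List[List[float]]:
    """Hypothetical variance-scaling GV postfilter (illustrative sketch only).

    converted: T frames x D dims of converted features.
    target_gv: per-dimension global variance, ASSUMED precomputed
               from target-speaker training data.
    Returns a new T x D list whose per-dimension variance matches target_gv.
    """
    out = [list(frame) for frame in converted]  # copy; input is left untouched
    for d in range(len(target_gv)):
        track = [frame[d] for frame in converted]  # trajectory of dimension d
        mu = fmean(track)                          # utterance-level mean
        var = pvariance(track, mu)                 # utterance-level (population) variance
        if var == 0.0:
            continue  # flat trajectory: no variance to rescale
        scale = (target_gv[d] / var) ** 0.5
        for t, x in enumerate(track):
            # Stretch deviations from the mean so the variance becomes target_gv[d].
            out[t][d] = mu + scale * (x - mu)
    return out
```

For example, a dimension with values 0, 2, 4 (variance 8/3) and a target GV of 32/3 is stretched by a factor of 2 around its mean, giving -2, 2, 6. Because GV scaling only matches a second-order statistic per utterance, it cannot correct frame-level residual errors, which is the gap the abstract's restoration formulation targets.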

Original language: English
Pages (from-to): 3379-3383
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOI
Publication status: Published - 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 - 24 Aug 2017
