Abstract
This paper studies post-processing in deep bidirectional Long Short-Term Memory (DBLSTM) based voice conversion, where the statistical parameters are optimized to generate speech that exhibits properties similar to those of the target speech. However, a residual error always remains between the converted speech and the target speech. We reformulate the residual error problem as speech restoration, which aims to recover the target speech samples from the converted ones. Specifically, we propose a denoising recurrent neural network (DeRNN) that introduces regularization during training to shape the distribution of the converted data in latent space. We compare the proposed approach with global variance (GV), modulation spectrum (MS), and recurrent neural network (RNN) based postfilters, which serve a similar purpose. Subjective test results show that the proposed approach significantly outperforms these conventional approaches in terms of quality and similarity.
| Original language | English |
|---|---|
| Pages (from-to) | 3379-3383 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2017-August |
| DOI | |
| Publication status | Published - 2017 |
| Event | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden. Duration: 20 Aug 2017 → 24 Aug 2017 |
Paper title: Denoising recurrent neural network for deep bidirectional LSTM based voice conversion