TY - JOUR
T1 - Denoising recurrent neural network for deep bidirectional LSTM based voice conversion
AU - Wu, Jie
AU - Huang, Dongyan
AU - Xie, Lei
AU - Li, Haizhou
N1 - Publisher Copyright:
Copyright © 2017 ISCA.
PY - 2017
Y1 - 2017
N2 - The paper studies post-processing in deep bidirectional Long Short-Term Memory (DBLSTM) based voice conversion, where the statistical parameters are optimized to generate speech that exhibits properties similar to the target speech. However, there always exists a residual error between the converted speech and the target speech. We reformulate the residual error problem as speech restoration, which aims to recover the target speech samples from the converted ones. Specifically, we propose a denoising recurrent neural network (DeRNN) that introduces regularization during training to shape the distribution of the converted data in latent space. We compare the proposed approach with global variance (GV), modulation spectrum (MS) and recurrent neural network (RNN) based postfilters, which serve a similar purpose. The subjective test results show that the proposed approach significantly outperforms these conventional approaches in terms of quality and similarity.
AB - The paper studies post-processing in deep bidirectional Long Short-Term Memory (DBLSTM) based voice conversion, where the statistical parameters are optimized to generate speech that exhibits properties similar to the target speech. However, there always exists a residual error between the converted speech and the target speech. We reformulate the residual error problem as speech restoration, which aims to recover the target speech samples from the converted ones. Specifically, we propose a denoising recurrent neural network (DeRNN) that introduces regularization during training to shape the distribution of the converted data in latent space. We compare the proposed approach with global variance (GV), modulation spectrum (MS) and recurrent neural network (RNN) based postfilters, which serve a similar purpose. The subjective test results show that the proposed approach significantly outperforms these conventional approaches in terms of quality and similarity.
KW - Denoising
KW - Gaussian noise
KW - Recurrent neural network
KW - Residual error
KW - Voice conversion
UR - http://www.scopus.com/inward/record.url?scp=85039166058&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-694
DO - 10.21437/Interspeech.2017-694
M3 - Conference article
AN - SCOPUS:85039166058
SN - 2308-457X
VL - 2017-August
SP - 3379
EP - 3383
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Y2 - 20 August 2017 through 24 August 2017
ER -