TY - GEN
T1 - Controlling Expressivity using Input Codes in Neural Network based TTS
AU - Zhu, Xiaolian
AU - Xie, Lei
AU - Chen, Xiao
AU - Lou, Xiaoyan
AU - Zhu, Xuan
AU - Tan, Xingjun
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/9/21
Y1 - 2018/9/21
AB - This paper presents a study on the use of input codes in neural network acoustic modeling for expressive TTS. Specifically, we use different kinds of input codes, augmented with the linguistic features, as the input of a BLSTM-based acoustic model, to control the expressivity of the synthesized speech. The input codes, in one-hot representation, include a dialogue code, a sentiment code and a sentence position code. The dialogue code indicates whether the text is dialogue or narration in an audiobook story. The sentiment code is obtained from a sentiment analysis tool, which labels each sentence as positive, negative or neutral. The sentence position code indicates the position of the sentence in the paragraph. We believe these codes are highly related to the expressiveness of the audiobook speech. Experiments on the data from the Blizzard Challenge 2017 demonstrate the effectiveness of input codes in the neural network approach for expressive TTS.
KW - BLSTM
KW - Neural network
KW - Speech synthesis
KW - Text-to-speech
UR - http://www.scopus.com/inward/record.url?scp=85055466825&partnerID=8YFLogxK
U2 - 10.1109/ACIIAsia.2018.8470327
DO - 10.1109/ACIIAsia.2018.8470327
M3 - Conference contribution
AN - SCOPUS:85055466825
T3 - 2018 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018
BT - 2018 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018
Y2 - 20 May 2018 through 22 May 2018
ER -