Conversational End-to-End TTS for Voice Agents

Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

51 Citations (Scopus)

Abstract

End-to-end neural TTS has achieved excellent performance on reading-style speech synthesis. However, building high-quality conversational TTS remains challenging because of limitations in both corpora and modeling capability. This study aims to build a conversational TTS system for a voice agent under a sequence-to-sequence modeling framework. First, we construct a spontaneous conversational speech corpus designed specifically for the voice agent, using a new recording scheme that ensures both recording quality and a conversational speaking style. Second, we propose a conversation context-aware end-to-end TTS approach that employs an auxiliary encoder and a conversational context encoder to reinforce information about both the current utterance and its context within the conversation. Experimental results show that the proposed approach produces more natural prosody that is consistent with the conversational context, with significant preference gains at both the utterance level and the conversation level. Moreover, we find that the model can express some spontaneous behaviors, such as fillers and repeated words, which make the conversational speaking style more realistic.

Original language: English
Host publication title: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 403-409
Number of pages: 7
ISBN (electronic): 9781728170664
DOI
Publication status: Published - 19 Jan 2021
Event: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Virtual, Shenzhen, China
Duration: 19 Jan 2021 → 22 Jan 2021

Publication series

Name: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference: 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Country/Territory: China
Virtual, Shenzhen
Period: 19/01/21 → 22/01/21
