TY - JOUR
T1 - Building a mixed-lingual neural TTS system with only monolingual data
AU - Xue, Liumeng
AU - Song, Wei
AU - Xu, Guanghui
AU - Xie, Lei
AU - Wu, Zhizheng
N1 - Publisher Copyright:
Copyright © 2019 ISCA
PY - 2019
Y1 - 2019
N2 - When deploying a Chinese neural Text-to-Speech (TTS) system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded. This paper looks into the problem in the encoder-decoder framework when only monolingual data from a target speaker is available. Specifically, we view the problem from two aspects: speaker consistency within an utterance and naturalness. We start the investigation with an average voice model which is built from multispeaker monolingual data, i.e., Mandarin and English data. On the basis of that, we look into speaker embedding for speaker consistency within an utterance and phoneme embedding for naturalness and intelligibility, and study the choice of data for model training. We report the findings and discuss the challenges to build a mixed-lingual TTS system with only monolingual data.
AB - When deploying a Chinese neural Text-to-Speech (TTS) system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded. This paper looks into the problem in the encoder-decoder framework when only monolingual data from a target speaker is available. Specifically, we view the problem from two aspects: speaker consistency within an utterance and naturalness. We start the investigation with an average voice model which is built from multispeaker monolingual data, i.e., Mandarin and English data. On the basis of that, we look into speaker embedding for speaker consistency within an utterance and phoneme embedding for naturalness and intelligibility, and study the choice of data for model training. We report the findings and discuss the challenges to build a mixed-lingual TTS system with only monolingual data.
KW - Encoder-decoder
KW - Mixed-lingual
KW - Speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=85094765018&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-3191
DO - 10.21437/Interspeech.2019-3191
M3 - 会议文章
AN - SCOPUS:85094765018
SN - 2308-457X
VL - 2019-September
SP - 2060
EP - 2064
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Y2 - 15 September 2019 through 19 September 2019
ER -