
Building a mixed-lingual neural TTS system with only monolingual data

  • Liumeng Xue
  • Wei Song
  • Guanghui Xu
  • Lei Xie
  • Zhizheng Wu
  • Northwestern Polytechnical University, Xi'an
  • JD.com, Inc.

Research output: Contribution to journal › Conference article › Peer-reviewed

15 citations (Scopus)

Abstract

When deploying a Chinese neural Text-to-Speech (TTS) system, one challenge is synthesizing Chinese utterances in which English phrases or words are embedded. This paper investigates the problem within the encoder-decoder framework when only monolingual data from a target speaker is available. Specifically, we view the problem from two aspects: speaker consistency within an utterance and naturalness. We begin with an average voice model built from multi-speaker monolingual data, i.e., Mandarin and English data. Building on that model, we investigate speaker embedding for speaker consistency within an utterance and phoneme embedding for naturalness and intelligibility, and we study the choice of training data. We report our findings and discuss the challenges of building a mixed-lingual TTS system with only monolingual data.
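To make the two embeddings named in the abstract concrete, the sketch below shows one common way an encoder-decoder TTS model can be conditioned on them. It is a generic illustration, not the authors' method: the merged phoneme vocabulary with `zh:`/`en:` prefixes, the tiny embedding sizes, the random tables, and the helper names (`build_vocab`, `make_table`, `encoder_inputs`) are all hypothetical. The point it shows is that a single speaker vector is attached to every token of a mixed-lingual utterance, including English tokens the target speaker never recorded, which is what ties speaker consistency to the speaker embedding.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical embedding sizes, kept tiny for illustration.
EMB_DIM = 4
SPK_DIM = 2


def build_vocab(mandarin: Sequence[str], english: Sequence[str]) -> Dict[str, int]:
    """Merge two monolingual phoneme sets into one shared index space.

    Language prefixes keep look-alike symbols from the two inventories
    distinct; sharing them would be a separate design choice.
    """
    symbols = [f"zh:{p}" for p in mandarin] + [f"en:{p}" for p in english]
    return {s: i for i, s in enumerate(symbols)}


def make_table(rows: int, dim: int, seed: int) -> List[List[float]]:
    """Stand-in for a learned embedding table (random values here)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def encoder_inputs(
    tokens: Sequence[str],
    vocab: Dict[str, int],
    phone_table: List[List[float]],
    spk_table: List[List[float]],
    speaker_id: int,
) -> List[List[float]]:
    """Concatenate each phoneme embedding with one utterance-level speaker vector."""
    spk = spk_table[speaker_id]  # the same vector for every token, either language
    return [phone_table[vocab[t]] + spk for t in tokens]


# Usage: a toy code-switched utterance mixing Mandarin and English phonemes.
vocab = build_vocab(["n", "i3", "h", "ao3"], ["HH", "AH0", "L", "OW1"])
phone_table = make_table(len(vocab), EMB_DIM, seed=0)
spk_table = make_table(3, SPK_DIM, seed=1)
utt = ["zh:n", "zh:i3", "en:HH", "en:AH0", "en:L", "en:OW1"]
frames = encoder_inputs(utt, vocab, phone_table, spk_table, speaker_id=2)
```

Concatenation is only one option; adding a projected speaker vector to the encoder outputs, or feeding it to the decoder instead, are common alternatives, and which one works best for cross-lingual speaker consistency is an empirical question of the kind the abstract describes.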
