Controllable context-aware conversational speech synthesis

  • Jian Cong
  • Shan Yang
  • Na Hu
  • Guangzhi Li
  • Lei Xie
  • Dan Su

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

In spoken conversations, spontaneous behaviors such as filled pauses and prolongations occur frequently. Conversational partners also tend to align features of their speech with those of their interlocutor, a phenomenon known as entrainment. To produce human-like conversations, we propose a unified controllable spontaneous conversational speech synthesis framework that models these two phenomena. Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pause and prolongation, in the acoustic model, and develop a neural-network-based predictor to predict the occurrences of the two behaviors from text. We then develop an algorithm based on the predictor to control the occurrence frequency of the behaviors, so that the synthesized speech can vary from less disfluent to more disfluent. To model speech entrainment at the acoustic level, we use a context acoustic encoder to extract a global style embedding from the previous speech and condition the synthesis of the current speech on it. Furthermore, since the current and previous utterances belong to different speakers in a conversation, we add a domain adversarial training module to eliminate speaker-related information in the acoustic encoder while retaining style-related information. Experiments show that the proposed approach can synthesize realistic conversations and control the occurrences of the spontaneous behaviors naturally.
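
The abstract does not say how the frequency-control algorithm works. As a rough illustration only, the sketch below assumes the predictor outputs one probability per token and that control is done by labeling the top-scoring fraction of tokens. The function names, the `rate` parameter, and the `<fp>` label are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def select_behavior_positions(probs: Sequence[float], rate: float) -> List[int]:
    """Pick token indices to mark with a spontaneous behavior.

    probs: hypothetical per-token probabilities from a text-based predictor.
    rate:  desired fraction of tokens (0.0 to 1.0) that receive the label.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    # Number of tokens to label, rounded half up.
    k = math.floor(rate * len(probs) + 0.5)
    # Rank tokens by predicted probability, highest first (ties keep text order).
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    return sorted(ranked[:k])


def insert_labels(tokens: Sequence[str], positions: Sequence[int],
                  label: str = "<fp>") -> List[str]:
    """Insert an explicit behavior label after each selected token."""
    chosen = set(positions)
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i in chosen:
            out.append(label)
    return out


tokens = ["so", "I", "think", "we", "should"]
probs = [0.1, 0.8, 0.3, 0.9, 0.2]  # made-up predictor scores

less_disfluent = insert_labels(tokens, select_behavior_positions(probs, 0.0))
more_disfluent = insert_labels(tokens, select_behavior_positions(probs, 0.4))
```

Raising `rate` labels more tokens, which is one simple way to move the labeled text from less disfluent to more disfluent before it is passed to an acoustic model.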

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 3426-3430
Number of pages: 5
ISBN (Electronic): 9781713836902
State: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 → 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 5
ISSN (Print): 2308-457X
ISSN (Electronic): 2958-1796

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 → 3/09/21

Keywords

  • Conversational speech
  • Speech synthesis
  • Spontaneous speech
