Controlling Emotion Strength with Relative Attribute for End-To-End Speech Synthesis

Xiaolian Zhu, Shan Yang, Geng Yang, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

46 Scopus citations

Abstract

Recently, attention-based end-to-end speech synthesis has achieved superior performance compared to traditional speech synthesis models, and several approaches, such as global style tokens, have been proposed to explore the style controllability of the end-to-end model. Although the existing methods perform well in style disentanglement and transfer, they are still unable to control the explicit emotion of generated speech. In this paper, we mainly focus on the subtle control of expressive speech synthesis, where the emotion category and strength can be easily controlled with a discrete emotional vector and a simple continuous scalar, respectively. The continuous strength controller is learned by a ranking function according to the relative attribute measured on an emotion dataset. Our method automatically learns the relationship between low-level acoustic features and high-level subtle emotion strength. Experiments show that our method can effectively improve the controllability of an expressive end-to-end model.
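The abstract does not give the ranking formulation, so the following is only a minimal sketch of the general idea of a pairwise ranking function that yields a continuous strength scalar. It assumes a linear scorer trained with a hinge loss on (emotional, neutral) pairs and min-max normalisation of the scores to [0, 1]. The function names, the three-dimensional synthetic features, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def learn_ranking_weights(strong, weak, epochs=100, lr=0.05, reg=1e-3):
    """Fit w so that dot(w, s) exceeds dot(w, k) by a margin for each
    (strong, weak) pair, using subgradient descent on a pairwise hinge loss.
    Assumption: a linear ranker; the paper's exact objective may differ."""
    dim = len(strong[0])
    # Every strong/weak combination forms one ordered pair of feature vectors.
    diffs = [[s[d] - k[d] for d in range(dim)] for s in strong for k in weak]
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [reg * wi for wi in w]          # L2 regularisation term
        for diff in diffs:
            if dot(w, diff) < 1.0:             # pair violates the margin
                for d in range(dim):
                    grad[d] -= diff[d] / len(diffs)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def strength_scores(features, w):
    """Map raw ranking scores to [0, 1] so they can act as a strength scalar."""
    raw = [dot(w, f) for f in features]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0 for _ in raw]
    return [(r - lo) / (hi - lo) for r in raw]

random.seed(0)

def fake_utterance(shift):
    # Hypothetical acoustic feature vector; dimension 0 carries the emotion cue.
    return [random.gauss(shift, 1.0), random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]

neutral = [fake_utterance(0.0) for _ in range(20)]
happy = [fake_utterance(3.0) for _ in range(20)]

w = learn_ranking_weights(happy, neutral)
scores = strength_scores(neutral + happy, w)
```

In a setup like this, each training utterance would receive a score in [0, 1] that could be fed to the synthesis model alongside a discrete emotion-category vector, and varied at inference time to adjust strength.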

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 192-199
Number of pages: 8
ISBN (Electronic): 9781728103068
State: Published - Dec 2019
Event: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore
Duration: 15 Dec 2019 → 18 Dec 2019

Publication series

Name: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country/Territory: Singapore
City: Singapore
Period: 15/12/19 → 18/12/19

Keywords

  • Emotion strength
  • end-to-end
  • relative attributes
  • speech synthesis
  • text-to-speech
