
A comparison of expressive speech synthesis approaches based on neural network

  • Liumeng Xue
  • Xiaolian Zhu
  • Xiaochun An
  • Lei Xie
  • Northwestern Polytechnical University Xian
  • Hebei University of Economics and Business

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

6 Scopus citations

Abstract

Adaptability and controllability in changing speaking styles and speaker characteristics are advantages of deep neural network (DNN) based statistical parametric speech synthesis (SPSS). This paper presents a comprehensive study on the use of DNNs for expressive speech synthesis with a small set of emotional speech data. Specifically, we study three typical model adaptation approaches: (1) retraining a neural model on emotion-specific data (retrain), (2) augmenting the network input with emotion-specific codes (code), and (3) using emotion-dependent output layers with shared hidden layers (multi-head). Long short-term memory (LSTM) networks are used as the acoustic models. Objective and subjective evaluations demonstrate that the multi-head approach consistently outperforms the other two approaches, delivering more natural emotion in the synthesized speech.
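The structural difference between the code and multi-head approaches can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single bias-free tanh hidden layer in place of the LSTM layers named in the abstract, a hypothetical three-emotion set, toy dimensions, and random untrained weights. All function and variable names are invented for illustration. The retrain approach is a training procedure (continuing to train an existing model on emotion-specific data) rather than an architectural change, so it is not shown.

```python
import math
import random

# Hypothetical emotion set; the abstract does not list the emotions used.
EMOTIONS = ["neutral", "happy", "sad"]


def one_hot(emotion):
    """Emotion code for the 'code' approach, assumed here to be one-hot."""
    return [1.0 if e == emotion else 0.0 for e in EMOTIONS]


def dense(x, weights, activation=None):
    """Bias-free fully connected layer: one output per weight row."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [activation(o) for o in out] if activation else out


def init(rows, cols, rng):
    """Small random weight matrix (untrained, for shape illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def code_forward(linguistic, emotion, hidden_w, output_w):
    """'code': a single network whose input is augmented with the emotion code."""
    x = linguistic + one_hot(emotion)
    return dense(dense(x, hidden_w, math.tanh), output_w)


def multi_head_forward(linguistic, emotion, shared_w, heads):
    """'multi-head': shared hidden layer, one output layer per emotion."""
    hidden = dense(linguistic, shared_w, math.tanh)
    return dense(hidden, heads[emotion])


rng = random.Random(0)
IN_DIM, HID_DIM, OUT_DIM = 4, 8, 3  # toy sizes, not taken from the paper
linguistic = [0.2, -0.1, 0.5, 0.0]  # stand-in for one frame of linguistic features

# 'code' network: the input layer is widened by the emotion code length.
code_hidden = init(HID_DIM, IN_DIM + len(EMOTIONS), rng)
code_output = init(OUT_DIM, HID_DIM, rng)

# 'multi-head' network: shared hidden weights plus one output matrix per emotion.
shared = init(HID_DIM, IN_DIM, rng)
heads = {e: init(OUT_DIM, HID_DIM, rng) for e in EMOTIONS}

code_out = code_forward(linguistic, "happy", code_hidden, code_output)
head_out = multi_head_forward(linguistic, "happy", shared, heads)
```

In the code variant every parameter is shared and the emotion enters only through the input; in the multi-head variant the hidden layer is shared while each emotion has its own output parameters.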

Original language: English
Title of host publication: ASMMC-MMAC 2018 - Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data, Co-located with MM 2018
Publisher: Association for Computing Machinery, Inc
Pages: 15-20
Number of pages: 6
ISBN (Electronic): 9781450359856
State: Published - 19 Oct 2018
Event: Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop, ASMMC-MMAC 2018 - Seoul, Korea, Republic of
Duration: 26 Oct 2018 → …

Publication series

Name: ASMMC-MMAC 2018 - Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data, Co-located with MM 2018

Conference

Conference: Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop, ASMMC-MMAC 2018
Country/Territory: Korea, Republic of
City: Seoul
Period: 26/10/18 → …

Keywords

  • Code
  • Expressive speech synthesis
  • Multi-head network
  • Neural networks
  • Retrain
  • Statistical parametric speech synthesis
  • Text-to-speech
