Multi-granularity Semantic and Acoustic Stress Prediction for Expressive TTS

Wenjiang Chi, Xiaoqin Feng, Liumeng Xue, Yunlin Chen, Lei Xie, Zhifei Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Stress, as the perceptual prominence within sentences, plays a key role in expressive text-to-speech (TTS). It can be either the semantic focus in text or the acoustic prominence in speech. However, stress labels are always annotated by listening to the speech, lacking semantic information in the corresponding text, which may degrade the accuracy of stress prediction and the expressivity of TTS. This paper proposes a multi-granularity stress prediction method for expressive TTS. Specifically, we first build Chinese Mandarin datasets with both coarse-grained semantic stress and fine-grained acoustic stress. Then, the proposed model progressively predicts semantic stress and acoustic stress. Finally, a TTS model is adopted to synthesize speech with the predicted stress. Experimental results on the proposed model and synthesized speech show that our proposed model achieves good accuracy in stress prediction and improves the expressiveness and naturalness of the synthesized speech.

Original language: English
Title of host publication: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2409-2415
Number of pages: 7
ISBN (Electronic): 9798350300673
State: Published - 2023
Externally published: Yes
Event: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 - Taipei, Taiwan, Province of China
Duration: 31 Oct 2023 to 3 Nov 2023

Publication series

Name: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Conference

Conference: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 31/10/23 to 3/11/23
