Multi-granularity Semantic and Acoustic Stress Prediction for Expressive TTS

Wenjiang Chi, Xiaoqin Feng, Liumeng Xue, Yunlin Chen, Lei Xie, Zhifei Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Stress, as the perceptual prominence within sentences, plays a key role in expressive text-to-speech (TTS). It can be either the semantic focus in text or the acoustic prominence in speech. However, stress labels are typically annotated only by listening to the speech, without the semantic information in the corresponding text, which may degrade the accuracy of stress prediction and the expressiveness of TTS. This paper proposes a multi-granularity stress prediction method for expressive TTS. Specifically, we first build Mandarin Chinese datasets with both coarse-grained semantic stress and fine-grained acoustic stress. Then, the proposed model progressively predicts semantic stress and acoustic stress. Finally, a TTS model is adopted to synthesize speech with the predicted stress. Experimental results show that the proposed model achieves good accuracy in stress prediction and improves the expressiveness and naturalness of the synthesized speech.
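The abstract describes a progressive, coarse-to-fine scheme in which coarse semantic stress is predicted first and then used when predicting fine-grained acoustic stress. The sketch below is a minimal, hypothetical PyTorch illustration of that idea only; the module names, dimensions, shared encoder, and the way the coarse posterior conditions the fine predictor are assumptions for illustration, not the authors' actual architecture.

import torch
import torch.nn as nn

class ProgressiveStressPredictor(nn.Module):
    """Hypothetical coarse-to-fine stress predictor (illustrative only).

    Stage 1 predicts coarse semantic stress from text features; stage 2
    predicts fine-grained acoustic stress conditioned on the stage-1
    posterior. Granularities are simplified to a single token sequence.
    """

    def __init__(self, d_text=256, hidden=128, n_stress=2):
        super().__init__()
        # Shared bidirectional encoder over precomputed text embeddings.
        self.encoder = nn.GRU(d_text, hidden, batch_first=True,
                              bidirectional=True)
        # Stage 1: coarse (semantic) stress classifier.
        self.semantic_head = nn.Linear(2 * hidden, n_stress)
        # Stage 2: fine (acoustic) stress classifier, conditioned on the
        # semantic stress posterior from stage 1.
        self.acoustic_head = nn.Sequential(
            nn.Linear(2 * hidden + n_stress, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_stress),
        )

    def forward(self, text_emb):
        # text_emb: (batch, seq_len, d_text), e.g. features from a
        # pretrained language model (assumption).
        enc, _ = self.encoder(text_emb)
        sem_logits = self.semantic_head(enc)
        sem_post = sem_logits.softmax(dim=-1)
        # Progressive step: feed the coarse posterior into the fine head.
        aco_logits = self.acoustic_head(torch.cat([enc, sem_post], dim=-1))
        return sem_logits, aco_logits

if __name__ == "__main__":
    model = ProgressiveStressPredictor()
    dummy = torch.randn(2, 10, 256)  # 2 sentences, 10 tokens each
    sem, aco = model(dummy)
    print(sem.shape, aco.shape)  # both: torch.Size([2, 10, 2])

In such a setup the predicted stress labels would then be passed to a TTS model as additional conditioning; how that conditioning is implemented in the paper is not specified here.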

Original language: English
Title of host publication: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2409-2415
Number of pages: 7
ISBN (electronic): 9798350300673
DOI
Publication status: Published - 2023
Externally published: Yes
Event: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 - Taipei, Taiwan, China
Duration: 31 Oct 2023 → 3 Nov 2023

Publication series

Name: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Conference

Conference: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Country/Territory: Taiwan, China
City: Taipei
Period: 31/10/23 → 3/11/23
