
XEmoRAG: Cross-Lingual Emotion Transfer with Controllable Intensity Using Retrieval-Augmented Generation

  • Tianlun Zuo
  • Jingbin Hu
  • Yuke Li
  • Xinfa Zhu
  • Hai Li
  • Ying Yan
  • Junhui Liu
  • Danming Xie
  • Lei Xie
  • Northwestern Polytechnical University Xian
  • IQIYI Inc

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Zero-shot emotion transfer in cross-lingual speech synthesis refers to generating speech in a target language, where the emotion is expressed based on reference speech from a different source language. However, this task remains challenging due to the scarcity of parallel multilingual emotional corpora, the presence of foreign accent artifacts, and the difficulty of separating emotion from language-specific prosodic features. In this paper, we propose XEmoRAG, a novel framework to enable zero-shot emotion transfer from Chinese to Thai using a large language model (LLM)-based model, without relying on parallel emotional data. XEmoRAG extracts language-agnostic emotional embeddings from Chinese speech and retrieves emotionally matched Thai utterances from a curated emotional database, enabling controllable emotion transfer without explicit emotion labels. Additionally, a flow-matching alignment module minimizes pitch and duration mismatches, ensuring natural prosody. It also blends Chinese timbre into the Thai synthesis, enhancing rhythmic accuracy and emotional expression, while preserving speaker characteristics and emotional consistency. Experimental results show that XEmoRAG synthesizes expressive and natural Thai speech using only Chinese reference audio, without requiring explicit emotion labels. These results highlight XEmoRAG's capability to achieve flexible and low-resource emotional transfer across languages. Our demo is available at https://tlzuo-lesley.github.io/Demo-page/.
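The retrieval step described in the abstract (matching an emotion embedding from Chinese speech against a database of Thai utterances) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of cosine similarity, the toy 3-dimensional vectors, the utterance IDs, and the function names `cosine_similarity` and `retrieve_matches` are all assumptions made for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_matches(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the top_k (utterance_id, similarity) pairs, most similar first."""
    scored = [(utt_id, cosine_similarity(query, emb)) for utt_id, emb in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical data: real emotion embeddings would come from a speech encoder
# and have far more dimensions; these values are made up for the example.
thai_db = {
    "thai_happy_01": [0.9, 0.1, 0.0],
    "thai_sad_01": [0.0, 0.2, 0.9],
    "thai_angry_01": [0.1, 0.9, 0.1],
}
chinese_query = [0.8, 0.2, 0.1]  # stand-in for an embedding of Chinese reference speech

best = retrieve_matches(chinese_query, thai_db, top_k=2)
for utt_id, score in best:
    print(f"{utt_id}: {score:.3f}")
```

In this toy setup no emotion label is consulted at any point; the ranking depends only on how close the embeddings are, which mirrors the label-free retrieval idea stated in the abstract.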

Original language: English
Title of host publication: ASRU 2025 - 2025 IEEE Automatic Speech Recognition and Understanding Workshop
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331544263
Publication status: Published - 2025
Event: 2025 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2025 - Honolulu, United States
Duration: 6 Dec 2025 to 10 Dec 2025

Publication series

Name: ASRU 2025 - 2025 IEEE Automatic Speech Recognition and Understanding Workshop

Conference

Conference: 2025 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2025
Country/Territory: United States
City: Honolulu
Period: 6/12/25 to 10/12/25
