TY - GEN
T1 - E-chat
T2 - 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
AU - Xue, Hongfei
AU - Liang, Yuhao
AU - Mu, Bingshen
AU - Zhang, Shiliang
AU - Chen, Mengzhe
AU - Chen, Qian
AU - Xie, Lei
N1 - Publisher Copyright:
©2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals through the integration of various audio events. However, they are unable to generate appropriate responses based on emotional speech. To address this, we introduce the Emotional chat Model (E-chat), a novel spoken dialogue system capable of comprehending and responding to emotions conveyed in speech. This model leverages an emotion embedding extracted by a speech encoder, combined with LLMs, enabling it to respond according to different emotional contexts. Additionally, we introduce the E-chat200 dataset, designed explicitly for emotion-sensitive spoken dialogue. Across various evaluation metrics, E-chat consistently outperforms the baseline model, demonstrating its potential in emotional comprehension and human-machine interaction.
AB - This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals through the integration of various audio events. However, they are unable to generate appropriate responses based on emotional speech. To address this, we introduce the Emotional chat Model (E-chat), a novel spoken dialogue system capable of comprehending and responding to emotions conveyed in speech. This model leverages an emotion embedding extracted by a speech encoder, combined with LLMs, enabling it to respond according to different emotional contexts. Additionally, we introduce the E-chat200 dataset, designed explicitly for emotion-sensitive spoken dialogue. Across various evaluation metrics, E-chat consistently outperforms the baseline model, demonstrating its potential in emotional comprehension and human-machine interaction.
KW - emotional speech comprehension
KW - large language model
KW - spoken dialogue systems
UR - http://www.scopus.com/inward/record.url?scp=85216396903&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP63861.2024.10800447
DO - 10.1109/ISCSLP63861.2024.10800447
M3 - Conference contribution
AN - SCOPUS:85216396903
T3 - 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
SP - 586
EP - 590
BT - 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
A2 - Qian, Yanmin
A2 - Jin, Qin
A2 - Ou, Zhijian
A2 - Ling, Zhenhua
A2 - Wu, Zhiyong
A2 - Li, Ya
A2 - Xie, Lei
A2 - Tao, Jianhua
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 November 2024 through 10 November 2024
ER -