TY - GEN
T1 - E-chat
T2 - 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
AU - Xue, Hongfei
AU - Liang, Yuhao
AU - Mu, Bingshen
AU - Zhang, Shiliang
AU - Chen, Mengzhe
AU - Chen, Qian
AU - Xie, Lei
N1 - Publisher Copyright:
©2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals through the integration of various audio events. However, they are unable to generate appropriate responses based on emotional speech. To address this, we introduce the Emotional chat Model (E-chat), a novel spoken dialogue system capable of comprehending and responding to emotions conveyed in speech. This model leverages an emotion embedding extracted by a speech encoder, combined with LLMs, enabling it to respond according to different emotional contexts. Additionally, we introduce the E-chat200 dataset, designed explicitly for emotion-sensitive spoken dialogue. Across various evaluation metrics, E-chat consistently outperforms the baseline model, demonstrating its potential in emotional comprehension and human-machine interaction.
AB - This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals through the integration of various audio events. However, they are unable to generate appropriate responses based on emotional speech. To address this, we introduce the Emotional chat Model (E-chat), a novel spoken dialogue system capable of comprehending and responding to emotions conveyed in speech. This model leverages an emotion embedding extracted by a speech encoder, combined with LLMs, enabling it to respond according to different emotional contexts. Additionally, we introduce the E-chat200 dataset, designed explicitly for emotion-sensitive spoken dialogue. Across various evaluation metrics, E-chat consistently outperforms the baseline model, demonstrating its potential in emotional comprehension and human-machine interaction.
KW - emotional speech comprehension
KW - large language model
KW - spoken dialogue systems
UR - http://www.scopus.com/inward/record.url?scp=85216396903&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP63861.2024.10800447
DO - 10.1109/ISCSLP63861.2024.10800447
M3 - Conference contribution
AN - SCOPUS:85216396903
T3 - 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
SP - 586
EP - 590
BT - 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
A2 - Qian, Yanmin
A2 - Jin, Qin
A2 - Ou, Zhijian
A2 - Ling, Zhenhua
A2 - Wu, Zhiyong
A2 - Li, Ya
A2 - Xie, Lei
A2 - Tao, Jianhua
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 November 2024 through 10 November 2024
ER -