CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS

Kun Wei; Yike Zhang; Sining Sun; Lei Xie; Long Ma

doi:10.1109/ICASSP43922.2022.9746884

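School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

10 citations (Scopus)

Abstract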

Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantage of specific characteristics of conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model that explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is adopted to bias the outputs of the decoder toward words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a relative character error rate (CER) reduction of up to 12%.

Original language	English
Title of host publication	2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	6752-6756
Number of pages	5
ISBN (Electronic)	9781665405409
DOI	https://doi.org/10.1109/ICASSP43922.2022.9746884
Publication status	Published - 2022
Event	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Hybrid, Singapore. Duration: 22 May 2022 → 27 May 2022

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2022-May
ISSN (Print)	1520-6149

Conference

Conference	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
Country/Territory	Singapore
City	Hybrid
Period	22/05/22 → 27/05/22

Cite this

Wei, K., Zhang, Y., Sun, S., Xie, L., & Ma, L. (2022). CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings (pp. 6752-6756). (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP43922.2022.9746884

Wei, Kun ; Zhang, Yike ; Sun, Sining et al. / CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 6752-6756 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{9c2c5e1e46fc49b7930fbd95b0477810,

title = "CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS",

abstract = "Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantage of specific characteristics of conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model that explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is adopted to bias the outputs of the decoder toward words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a relative character error rate (CER) reduction of up to 12\%.",

keywords = "Conversational ASR, end-to-end ASR, latent variational module, topic-related rescoring",

author = "Kun Wei and Yike Zhang and Sining Sun and Lei Xie and Long Ma",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE; 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 ; Conference date: 22-05-2022 Through 27-05-2022",

year = "2022",

doi = "10.1109/ICASSP43922.2022.9746884",

language = "English",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "6752--6756",

booktitle = "2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings",

}

Wei, K, Zhang, Y, Sun, S, Xie, L & Ma, L 2022, CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS. in 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2022-May, Institute of Electrical and Electronics Engineers Inc., pp. 6752-6756, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Hybrid, Singapore, 22/05/22. https://doi.org/10.1109/ICASSP43922.2022.9746884

CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS. / Wei, Kun; Zhang, Yike; Sun, Sining et al.
2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 6752-6756 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May).

TY - GEN

T1 - CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS

AU - Wei, Kun

AU - Zhang, Yike

AU - Sun, Sining

AU - Xie, Lei

AU - Ma, Long

PY - 2022

Y1 - 2022

N2 - Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantage of specific characteristics of conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model that explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is adopted to bias the outputs of the decoder toward words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a relative character error rate (CER) reduction of up to 12%.

AB - Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantage of specific characteristics of conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model that explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is adopted to bias the outputs of the decoder toward words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a relative character error rate (CER) reduction of up to 12%.

KW - Conversational ASR

KW - end-to-end ASR

KW - latent variational module

KW - topic-related rescoring

UR - http://www.scopus.com/inward/record.url?scp=85131230303&partnerID=8YFLogxK

U2 - 10.1109/ICASSP43922.2022.9746884

DO - 10.1109/ICASSP43922.2022.9746884

M3 - Conference contribution

AN - SCOPUS:85131230303

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 6752

EP - 6756

BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022

Y2 - 22 May 2022 through 27 May 2022

ER -

Wei K, Zhang Y, Sun S, Xie L, Ma L. CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2022. p. 6752-6756. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP43922.2022.9746884
