TY - GEN
T1 - CONVERSATIONAL SPEECH RECOGNITION BY LEARNING CONVERSATION-LEVEL CHARACTERISTICS
AU - Wei, Kun
AU - Zhang, Yike
AU - Sun, Sining
AU - Xie, Lei
AU - Ma, Long
N1 - Publisher Copyright:
© 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Conversational automatic speech recognition (ASR) is a task to recognize conversational speech including multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantages from specific characteristics of conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is specifically adopted to bias the outputs of the decoder to words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a maximum 12% relative character error rate (CER) reduction.
AB - Conversational automatic speech recognition (ASR) is a task to recognize conversational speech including multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantages from specific characteristics of conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is specifically adopted to bias the outputs of the decoder to words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a maximum 12% relative character error rate (CER) reduction.
KW - Conversational ASR
KW - end-to-end ASR
KW - latent variational module
KW - topic-realted rescoring
UR - http://www.scopus.com/inward/record.url?scp=85131230303&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746884
DO - 10.1109/ICASSP43922.2022.9746884
M3 - 会议稿件
AN - SCOPUS:85131230303
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6752
EP - 6756
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
Y2 - 22 May 2022 through 27 May 2022
ER -