Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie

Research output: Contribution to journal, conference article, peer-reviewed

15 citations (Scopus)

Abstract

Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context phrases in utterances using contextual embeddings and calculates bias loss to assist in the training of the contextualized model. Our method achieved a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER of the context phrases decreases relatively by 40.5%. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation when using a larger biasing list.

Original language: English
Pages (from-to): 4933-4937
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOI
Publication status: Published - 2023
Event: 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 to 24 Aug 2023
