Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that simultaneously supports both explicit and implicit bias. Moreover, both bias approaches exhibit significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER reduction compared to the baseline model, with an impressive 68.6% relative CER reduction on contextual phrases.

Original language: English
Title of host publication: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350306897
DOI
Publication status: Published - 2023
Event: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 - Taipei, Taiwan, China
Duration: 16 Dec 2023 to 20 Dec 2023

Publication series

Name: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Conference

Conference: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Country/Territory: Taiwan, China
City: Taipei
Period: 16/12/23 to 20/12/23
