Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

Kaixun Huang; Ao Zhang; Binbin Zhang; Tianyi Xu; Xingchen Song; Lei Xie

doi:10.1109/ASRU57964.2023.10389631

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Attention-based deep contextual biasing has been shown to improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods, which directly bias the posterior of the ASR model, deep biasing methods integrate contextual information implicitly, making it difficult to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that supports both explicit and implicit bias simultaneously. Both bias approaches yield significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER reduction compared to the baseline model, with an impressive 68.6% relative CER reduction on contextual phrases.
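The results above are stated as relative reductions in character error rate (CER). As a minimal sketch of how such a figure is usually computed, the snippet below uses the standard definitions: CER is the character-level edit distance divided by the reference length, and the relative reduction is (baseline − new) / baseline. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction in percent, measured against the baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - new_cer) / baseline_cer * 100.0


if __name__ == "__main__":
    # Illustrative transcript pair with one substituted character (made up).
    print(cer("今天天气很好", "今天天汽很好"))
    # Illustrative CER values (made up, not reported in the paper).
    print(round(relative_cer_reduction(10.0, 6.8), 1))
```

Because the reduction is relative, the same percentage can correspond to very different absolute CER changes depending on the baseline.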

Original language	English
Title of host publication	2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9798350306897
DOIs	https://doi.org/10.1109/ASRU57964.2023.10389631
State	Published - 2023
Event	2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 - Taipei, Taiwan, Province of China; Duration: 16 Dec 2023 → 20 Dec 2023

Publication series

Name	2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Conference

Conference	2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Country/Territory	Taiwan, Province of China
City	Taipei
Period	16/12/23 → 20/12/23

Keywords

attention-based encoder-decoder
contextual biasing
end-to-end

Access to Document

10.1109/ASRU57964.2023.10389631

Cite this

Huang, K., Zhang, A., Zhang, B., Xu, T., Song, X., & Xie, L. (2023). Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 (2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU57964.2023.10389631

@inproceedings{55d6508d2d574ce9a068d810650a0df6,

title = "Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition",

abstract = "The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that simultaneously supports both explicit and implicit bias. Moreover, both bias approaches exhibit significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER reduction compared to the baseline model, with an impressive 68.6% relative CER reduction on contextual phrases.",

keywords = "attention-based encoder-decoder, contextual biasing, end-to-end",

author = "Kaixun Huang and Ao Zhang and Binbin Zhang and Tianyi Xu and Xingchen Song and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 ; Conference date: 16-12-2023 Through 20-12-2023",

year = "2023",

doi = "10.1109/ASRU57964.2023.10389631",

language = "English",

series = "2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023",

}

Huang, K, Zhang, A, Zhang, B, Xu, T, Song, X & Xie, L 2023, Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. in 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023. 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Institute of Electrical and Electronics Engineers Inc., 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, Province of China, 16/12/23. https://doi.org/10.1109/ASRU57964.2023.10389631

Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. / Huang, Kaixun; Zhang, Ao; Zhang, Binbin et al.
2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023. Institute of Electrical and Electronics Engineers Inc., 2023. (2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023).

TY - GEN

T1 - Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

AU - Huang, Kaixun

AU - Zhang, Ao

AU - Zhang, Binbin

AU - Xu, Tianyi

AU - Song, Xingchen

AU - Xie, Lei

PY - 2023

Y1 - 2023

N2 - The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that simultaneously supports both explicit and implicit bias. Moreover, both bias approaches exhibit significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER reduction compared to the baseline model, with an impressive 68.6% relative CER reduction on contextual phrases.

KW - attention-based encoder-decoder

KW - contextual biasing

KW - end-to-end

UR - http://www.scopus.com/inward/record.url?scp=85184658965&partnerID=8YFLogxK

U2 - 10.1109/ASRU57964.2023.10389631

DO - 10.1109/ASRU57964.2023.10389631

M3 - Conference contribution

AN - SCOPUS:85184658965

T3 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

BT - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Y2 - 16 December 2023 through 20 December 2023

ER -

Huang K, Zhang A, Zhang B, Xu T, Song X, Xie L. Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023. Institute of Electrical and Electronics Engineers Inc. 2023. (2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023). doi: 10.1109/ASRU57964.2023.10389631
