Attention-based end-to-end speech recognition on voice search

Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

61 Citations (Scopus)

Abstract

Recently, there has been growing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. In this paper, we explore the use of an attention-based encoder-decoder model for Mandarin speech recognition on a voice search task. Previous attempts have shown that applying attention-based encoder-decoder models to Mandarin speech recognition is quite difficult due to the logographic orthography of Mandarin, the large vocabulary and the conditional dependency of the attention model. We use character embedding to deal with the large vocabulary. Several tricks are used for effective model training, including L2 regularization, Gaussian weight noise and frame skipping. We compare two attention mechanisms and use attention smoothing to cover long context in the attention model. Taken together, these tricks allow us to achieve a character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43% on the MiTV voice search dataset. Combined with a trigram language model, CER and SER reach 2.81% and 5.77%, respectively.
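The CER and SER figures quoted above are conventionally derived from the edit distance between reference and hypothesis transcripts. The following is a minimal sketch under that assumption; the abstract does not describe its scoring procedure, and the function names and sample utterances here are illustrative, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer_and_ser(references, hypotheses):
    """Return (CER, SER) as fractions, assuming the standard definitions.

    CER = total character edit distance / total reference characters.
    SER = utterances with at least one error / number of utterances.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    char_errors = sum(edit_distance(r, h)
                      for r, h in zip(references, hypotheses))
    wrong_sentences = sum(r != h for r, h in zip(references, hypotheses))
    return char_errors / total_chars, wrong_sentences / len(references)


# Hypothetical voice-search utterances (not from the MiTV dataset).
refs = ["打开电视", "播放音乐"]
hyps = ["打开电视", "播放音月"]
cer, ser = cer_and_ser(refs, hyps)
print(f"CER = {cer:.2%}, SER = {ser:.2%}")
```

Scoring at the character level avoids the need for word segmentation, which is one reason CER rather than word error rate is commonly reported for Mandarin.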

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4764-4768
Number of pages: 5
ISBN (Print): 9781538646588
DOI
Publication status: Published - 10 Sep 2018
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2018-April
ISSN (Print): 1520-6149

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country/Territory: Canada
City: Calgary
Period: 15/04/18 to 20/04/18
