TY - GEN
T1 - Component Fusion
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
AU - Shan, Changhao
AU - Weng, Chao
AU - Wang, Guangsen
AU - Su, Dan
AU - Luo, Min
AU - Yu, Dong
AU - Xie, Lei
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - Recently, attention-based end-to-end automatic speech recognition system (ASR) has shown promising results. One of the limitations of an attention-based ASR system is that its language model (LM) component has to be implicitly learned from transcribed speech data which prevents one from uti-lizing plenty of text corpora to improve language modeling. In this work, the Component Fusion method is proposed to incorporate externally trained neural network (NN) LM into an attention-based ASR system. During training stage we equip the attention-based system with an additional LM component which is replaced by an externally trained NN LM at decoding stage. Experimental results show that the proposed Component Fusion outperforms two prior LM fusion approaches, i.e., Shallow Fusion and Cold Fusion, in both out-of-domain and in-domain scenarios. Further improvements can be achieved when combining Component and Shallow Fusion.
AB - Recently, attention-based end-to-end automatic speech recognition system (ASR) has shown promising results. One of the limitations of an attention-based ASR system is that its language model (LM) component has to be implicitly learned from transcribed speech data which prevents one from uti-lizing plenty of text corpora to improve language modeling. In this work, the Component Fusion method is proposed to incorporate externally trained neural network (NN) LM into an attention-based ASR system. During training stage we equip the attention-based system with an additional LM component which is replaced by an externally trained NN LM at decoding stage. Experimental results show that the proposed Component Fusion outperforms two prior LM fusion approaches, i.e., Shallow Fusion and Cold Fusion, in both out-of-domain and in-domain scenarios. Further improvements can be achieved when combining Component and Shallow Fusion.
KW - attention-based model
KW - automatic speech recognition
KW - end-to-end speech recognition
KW - language model
UR - http://www.scopus.com/inward/record.url?scp=85068975791&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8682490
DO - 10.1109/ICASSP.2019.8682490
M3 - 会议稿件
AN - SCOPUS:85068975791
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5631
EP - 5635
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 May 2019 through 17 May 2019
ER -