Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System

Changhao Shan; Chao Weng; Guangsen Wang; Dan Su; Min Luo; Dong Yu; Lei Xie

doi:10.1109/ICASSP.2019.8682490

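School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

78 Citations (Scopus)

Abstract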

Recently, attention-based end-to-end automatic speech recognition (ASR) systems have shown promising results. One limitation of an attention-based ASR system is that its language model (LM) component has to be learned implicitly from transcribed speech data, which prevents one from utilizing plentiful text corpora to improve language modeling. In this work, the Component Fusion method is proposed to incorporate an externally trained neural network (NN) LM into an attention-based ASR system. During the training stage, we equip the attention-based system with an additional LM component, which is replaced by an externally trained NN LM at the decoding stage. Experimental results show that the proposed Component Fusion outperforms two prior LM fusion approaches, i.e., Shallow Fusion and Cold Fusion, in both out-of-domain and in-domain scenarios. Further improvements can be achieved when combining Component Fusion and Shallow Fusion.
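
For orientation, Shallow Fusion, one of the two baselines named in the abstract, is commonly described as a log-linear interpolation of the ASR decoder score and an external LM score at decoding time. The sketch below illustrates only that baseline for a single decoding step. It is not taken from the paper, and it does not implement Component Fusion, whose internals the abstract does not specify. The function names, the `lm_weight` value, and the toy probabilities are assumptions made for illustration.

```python
import math


def shallow_fusion_scores(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Combine per-token log-probabilities from an ASR decoder and an external LM.

    Each fused score is: log p_asr(token) + lm_weight * log p_lm(token).
    lm_weight is a hypothetical tuning parameter, not a value from the paper.
    """
    if len(asr_log_probs) != len(lm_log_probs):
        raise ValueError("ASR and LM distributions must cover the same vocabulary")
    return [a + lm_weight * l for a, l in zip(asr_log_probs, lm_log_probs)]


def best_token(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Return the index of the token with the highest fused score."""
    scores = shallow_fusion_scores(asr_log_probs, lm_log_probs, lm_weight)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy three-token vocabulary (made-up probabilities).
asr = [math.log(0.5), math.log(0.4), math.log(0.1)]
lm = [math.log(0.1), math.log(0.8), math.log(0.1)]

print(best_token(asr, lm, lm_weight=0.0))  # ASR score only
print(best_token(asr, lm, lm_weight=0.5))  # ASR score plus weighted LM score
```

With the LM weight set to zero the choice follows the ASR scores alone; a positive weight lets the external LM shift the decision toward tokens it finds more likely.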

Original language	English
Title of host publication	2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	5631-5635
Number of pages	5
ISBN (Electronic)	9781479981311
DOI	https://doi.org/10.1109/ICASSP.2019.8682490
Publication status	Published - May 2019
Event	44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom. Duration: 12 May 2019 → 17 May 2019

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2019-May
ISSN (Print)	1520-6149

Conference

Conference	44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/Territory	United Kingdom
City	Brighton
Period	12/05/19 → 17/05/19

Access to Document

10.1109/ICASSP.2019.8682490

Other files and links

Link to publication in Scopus

Cite this

Shan, C., Weng, C., Wang, G., Su, D., Luo, M., Yu, D., & Xie, L. (2019). Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings (pp. 5631-5635). Article 8682490 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2019.8682490

Shan, Changhao; Weng, Chao; Wang, Guangsen et al. / Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System. 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 5631-5635 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{d706eabbf39a41aa90014b1efbd8a7a3,
  title = "Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System",
  abstract = "Recently, attention-based end-to-end automatic speech recognition (ASR) systems have shown promising results. One limitation of an attention-based ASR system is that its language model (LM) component has to be learned implicitly from transcribed speech data, which prevents one from utilizing plentiful text corpora to improve language modeling. In this work, the Component Fusion method is proposed to incorporate an externally trained neural network (NN) LM into an attention-based ASR system. During the training stage, we equip the attention-based system with an additional LM component, which is replaced by an externally trained NN LM at the decoding stage. Experimental results show that the proposed Component Fusion outperforms two prior LM fusion approaches, i.e., Shallow Fusion and Cold Fusion, in both out-of-domain and in-domain scenarios. Further improvements can be achieved when combining Component Fusion and Shallow Fusion.",
  keywords = "attention-based model, automatic speech recognition, end-to-end speech recognition, language model",
  author = "Changhao Shan and Chao Weng and Guangsen Wang and Dan Su and Min Luo and Dong Yu and Lei Xie",
  note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 ; Conference date: 12-05-2019 Through 17-05-2019",
  year = "2019",
  month = may,
  doi = "10.1109/ICASSP.2019.8682490",
  language = "English",
  series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "5631--5635",
  booktitle = "2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings",
}

Shan, C, Weng, C, Wang, G, Su, D, Luo, M, Yu, D & Xie, L 2019, Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System. in 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings, 8682490, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2019-May, Institute of Electrical and Electronics Engineers Inc., pp. 5631-5635, 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, Brighton, United Kingdom, 12/05/19. https://doi.org/10.1109/ICASSP.2019.8682490

Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System. / Shan, Changhao; Weng, Chao; Wang, Guangsen et al.
2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 5631-5635 8682490 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May).

TY - GEN

T1 - Component Fusion

T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019

AU - Shan, Changhao

AU - Weng, Chao

AU - Wang, Guangsen

AU - Su, Dan

AU - Luo, Min

AU - Yu, Dong

AU - Xie, Lei

PY - 2019/5

Y1 - 2019/5

AB - Recently, attention-based end-to-end automatic speech recognition (ASR) systems have shown promising results. One limitation of an attention-based ASR system is that its language model (LM) component has to be learned implicitly from transcribed speech data, which prevents one from utilizing plentiful text corpora to improve language modeling. In this work, the Component Fusion method is proposed to incorporate an externally trained neural network (NN) LM into an attention-based ASR system. During the training stage, we equip the attention-based system with an additional LM component, which is replaced by an externally trained NN LM at the decoding stage. Experimental results show that the proposed Component Fusion outperforms two prior LM fusion approaches, i.e., Shallow Fusion and Cold Fusion, in both out-of-domain and in-domain scenarios. Further improvements can be achieved when combining Component Fusion and Shallow Fusion.

KW - attention-based model

KW - automatic speech recognition

KW - end-to-end speech recognition

KW - language model

UR - http://www.scopus.com/inward/record.url?scp=85068975791&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2019.8682490

DO - 10.1109/ICASSP.2019.8682490

M3 - Conference contribution

AN - SCOPUS:85068975791

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 5631

EP - 5635

BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 12 May 2019 through 17 May 2019

ER -

Shan C, Weng C, Wang G, Su D, Luo M, Yu D et al. Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2019. p. 5631-5635. 8682490. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2019.8682490
