Investigating End-to-end Speech Recognition for Mandarin-English Code-switching

Changhao Shan; Chao Weng; Guangsen Wang; Dan Su; Min Luo; Dong Yu; Lei Xie

doi:10.1109/ICASSP.2019.8682850

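School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

63 Citations (Scopus)

Abstract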

Code-switching is a common phenomenon in many multilingual communities and presents a challenge to automatic speech recognition (ASR). In this paper, three approaches are investigated to improve end-to-end speech recognition on Mandarin-English code-switching task. First, multi-task learning (MTL) is introduced which enables the language identity information to facilitate Mandarin-English code-switching ASR. Second, we explore wordpieces, as opposed to graphemes, as English modeling units to reduce the modeling unit gap between Mandarin and English. Third, we employ transfer learning to utilize larger amount of monolingual Mandarin and English data to compensate the data sparsity issue of a code-switching task. Significant improvements are observed from all three approaches. With all three approaches combined, the final system achieves a character error rate (CER) of 6.49% on a real Mandarin-English code-switching task.

Original language	English
Title of host publication	2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	6056-6060
Number of pages	5
ISBN (Electronic)	9781479981311
DOI	https://doi.org/10.1109/ICASSP.2019.8682850
Publication status	Published - May 2019
Event	44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom. Duration: 12 May 2019 → 17 May 2019

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2019-May
ISSN (Print)	1520-6149

Conference

Conference	44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/Territory	United Kingdom
City	Brighton
Period	12/05/19 → 17/05/19

Cite this

Shan, C., Weng, C., Wang, G., Su, D., Luo, M., Yu, D., & Xie, L. (2019). Investigating End-to-end Speech Recognition for Mandarin-English Code-switching. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings (pp. 6056-6060). Article 8682850 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2019.8682850

Shan, Changhao ; Weng, Chao ; Wang, Guangsen et al. / Investigating End-to-end Speech Recognition for Mandarin-English Code-switching. 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 6056-6060 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{17f8fec6b713468380cf00732bee3620,
title = "Investigating End-to-end Speech Recognition for Mandarin-English Code-switching",
abstract = "Code-switching is a common phenomenon in many multilingual communities and presents a challenge to automatic speech recognition (ASR). In this paper, three approaches are investigated to improve end-to-end speech recognition on Mandarin-English code-switching task. First, multi-task learning (MTL) is introduced which enables the language identity information to facilitate Mandarin-English code-switching ASR. Second, we explore wordpieces, as opposed to graphemes, as English modeling units to reduce the modeling unit gap between Mandarin and English. Third, we employ transfer learning to utilize larger amount of monolingual Mandarin and English data to compensate the data sparsity issue of a code-switching task. Significant improvements are observed from all three approaches. With all three approaches combined, the final system achieves a character error rate (CER) of 6.49% on a real Mandarin-English code-switching task.",
keywords = "attention-based model, automatic speech recognition, code-switching, end-to-end speech recognition",
author = "Changhao Shan and Chao Weng and Guangsen Wang and Dan Su and Min Luo and Dong Yu and Lei Xie",
note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 ; Conference date: 12-05-2019 Through 17-05-2019",
year = "2019",
month = may,
doi = "10.1109/ICASSP.2019.8682850",
language = "English",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "6056--6060",
booktitle = "2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings",
}

Shan, C, Weng, C, Wang, G, Su, D, Luo, M, Yu, D & Xie, L 2019, Investigating End-to-end Speech Recognition for Mandarin-English Code-switching. in 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings., 8682850, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2019-May, Institute of Electrical and Electronics Engineers Inc., pp. 6056-6060, 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, Brighton, United Kingdom, 12/05/19. https://doi.org/10.1109/ICASSP.2019.8682850

Investigating End-to-end Speech Recognition for Mandarin-English Code-switching. / Shan, Changhao; Weng, Chao; Wang, Guangsen et al.
2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. p. 6056-6060 8682850 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May).

TY - GEN
T1 - Investigating End-to-end Speech Recognition for Mandarin-English Code-switching
AU - Shan, Changhao
AU - Weng, Chao
AU - Wang, Guangsen
AU - Su, Dan
AU - Luo, Min
AU - Yu, Dong
AU - Xie, Lei
PY - 2019/5
Y1 - 2019/5
N2 - Code-switching is a common phenomenon in many multilingual communities and presents a challenge to automatic speech recognition (ASR). In this paper, three approaches are investigated to improve end-to-end speech recognition on Mandarin-English code-switching task. First, multi-task learning (MTL) is introduced which enables the language identity information to facilitate Mandarin-English code-switching ASR. Second, we explore wordpieces, as opposed to graphemes, as English modeling units to reduce the modeling unit gap between Mandarin and English. Third, we employ transfer learning to utilize larger amount of monolingual Mandarin and English data to compensate the data sparsity issue of a code-switching task. Significant improvements are observed from all three approaches. With all three approaches combined, the final system achieves a character error rate (CER) of 6.49% on a real Mandarin-English code-switching task.
AB - Code-switching is a common phenomenon in many multilingual communities and presents a challenge to automatic speech recognition (ASR). In this paper, three approaches are investigated to improve end-to-end speech recognition on Mandarin-English code-switching task. First, multi-task learning (MTL) is introduced which enables the language identity information to facilitate Mandarin-English code-switching ASR. Second, we explore wordpieces, as opposed to graphemes, as English modeling units to reduce the modeling unit gap between Mandarin and English. Third, we employ transfer learning to utilize larger amount of monolingual Mandarin and English data to compensate the data sparsity issue of a code-switching task. Significant improvements are observed from all three approaches. With all three approaches combined, the final system achieves a character error rate (CER) of 6.49% on a real Mandarin-English code-switching task.
KW - attention-based model
KW - automatic speech recognition
KW - code-switching
KW - end-to-end speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85068986115&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8682850
DO - 10.1109/ICASSP.2019.8682850
M3 - Conference contribution
AN - SCOPUS:85068986115
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6056
EP - 6060
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -

Shan C, Weng C, Wang G, Su D, Luo M, Yu D et al. Investigating End-to-end Speech Recognition for Mandarin-English Code-switching. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2019. p. 6056-6060. 8682850. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2019.8682850
