TY - JOUR
T1 - Towards language-universal Mandarin-English speech recognition
AU - Zhang, Shiliang
AU - Liu, Yuan
AU - Lei, Ming
AU - Ma, Bin
AU - Xie, Lei
N1 - Publisher Copyright:
© 2019 ISCA
PY - 2019
Y1 - 2019
N2 - Multilingual and code-switching speech recognition are two challenging tasks that are studied separately in many previous works. In this work, we jointly study multilingual and code-switching problems, and present a language-universal bilingual system for Mandarin-English speech recognition. Specifically, we propose a novel bilingual acoustic model, which consists of two monolingual system initialized subnets and a shared output layer corresponding to the Character-Subword acoustic modeling units. The bilingual acoustic model is trained using a large Mandarin-English corpus with CTC and sMBR criteria. We find that this model, which is not given any information about language identity, can achieve comparable performance in monolingual Mandarin and English test sets compared to the well-trained language-specific Mandarin and English ASR systems, respectively. More importantly, the proposed bilingual model can automatically learn the language switching. Experimental results on a Mandarin-English code-switching test set show that it can achieve 11.8% and 17.9% relative error reduction on Mandarin and English parts, respectively.
AB - Multilingual and code-switching speech recognition are two challenging tasks that are studied separately in many previous works. In this work, we jointly study multilingual and code-switching problems, and present a language-universal bilingual system for Mandarin-English speech recognition. Specifically, we propose a novel bilingual acoustic model, which consists of two monolingual system initialized subnets and a shared output layer corresponding to the Character-Subword acoustic modeling units. The bilingual acoustic model is trained using a large Mandarin-English corpus with CTC and sMBR criteria. We find that this model, which is not given any information about language identity, can achieve comparable performance in monolingual Mandarin and English test sets compared to the well-trained language-specific Mandarin and English ASR systems, respectively. More importantly, the proposed bilingual model can automatically learn the language switching. Experimental results on a Mandarin-English code-switching test set show that it can achieve 11.8% and 17.9% relative error reduction on Mandarin and English parts, respectively.
KW - Bilingual
KW - Code-switching
KW - DFSMN-CTC-sMBR
KW - Mandarin-English
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85074696095&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-1365
DO - 10.21437/Interspeech.2019-1365
M3 - Conference article
AN - SCOPUS:85074696095
SN - 2308-457X
VL - 2019-September
SP - 2170
EP - 2174
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Y2 - 15 September 2019 through 19 September 2019
ER -