Towards language-universal Mandarin-English speech recognition

Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie

Research output: Contribution to journal › Conference article › peer-review

17 Scopus citations

Abstract

Multilingual and code-switching speech recognition are two challenging tasks that have mostly been studied separately in previous work. In this work, we study the multilingual and code-switching problems jointly and present a language-universal bilingual system for Mandarin-English speech recognition. Specifically, we propose a novel bilingual acoustic model that consists of two subnets initialized from monolingual systems and a shared output layer corresponding to the Character-Subword acoustic modeling units. The bilingual acoustic model is trained on a large Mandarin-English corpus with the CTC and sMBR criteria. We find that this model, which is given no information about language identity, achieves performance on monolingual Mandarin and English test sets comparable to that of well-trained language-specific Mandarin and English ASR systems. More importantly, the proposed bilingual model automatically learns language switching. Experimental results on a Mandarin-English code-switching test set show that it achieves 11.8% and 17.9% relative error reduction on the Mandarin and English parts, respectively.
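The sketch below illustrates the overall structure described in the abstract: two language-specific subnets feeding one shared output layer over a joint Mandarin-character plus English-subword unit inventory, trained with CTC. It is not the authors' implementation; the DFSMN layers are stood in for by plain feedforward blocks, sMBR training is omitted, and all dimensions, vocabulary sizes, and names (`MonolingualSubnet`, `BilingualAcousticModel`) are illustrative assumptions.

```python
# Minimal sketch of a bilingual acoustic model with two subnets (assumed to be
# initialized from monolingual systems) and a shared output layer over a combined
# Mandarin-character + English-subword unit set, trained with CTC.
import torch
import torch.nn as nn

NUM_UNITS = 7000 + 3000 + 1  # assumed: Mandarin chars + English subwords + CTC blank


class MonolingualSubnet(nn.Module):
    """Stand-in for a monolingual DFSMN encoder stack (simplified to feedforward)."""
    def __init__(self, feat_dim=80, hidden_dim=512, num_layers=4):
        super().__init__()
        layers, in_dim = [], feat_dim
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        self.encoder = nn.Sequential(*layers)

    def forward(self, feats):           # feats: (batch, time, feat_dim)
        return self.encoder(feats)      # -> (batch, time, hidden_dim)


class BilingualAcousticModel(nn.Module):
    """Two language subnets whose outputs are merged before a shared output layer."""
    def __init__(self, feat_dim=80, hidden_dim=512, num_units=NUM_UNITS):
        super().__init__()
        self.mandarin_subnet = MonolingualSubnet(feat_dim, hidden_dim)
        self.english_subnet = MonolingualSubnet(feat_dim, hidden_dim)
        # Shared output layer over the joint Character-Subword unit inventory.
        self.output_layer = nn.Linear(2 * hidden_dim, num_units)

    def forward(self, feats):
        # No language identity is provided: both subnets process every frame and
        # the shared output layer learns to emit units from either language.
        merged = torch.cat([self.mandarin_subnet(feats),
                            self.english_subnet(feats)], dim=-1)
        return self.output_layer(merged).log_softmax(dim=-1)


if __name__ == "__main__":
    model = BilingualAcousticModel()
    ctc_loss = nn.CTCLoss(blank=NUM_UNITS - 1)
    feats = torch.randn(2, 100, 80)                  # (batch, time, feat_dim)
    log_probs = model(feats).transpose(0, 1)         # CTCLoss expects (time, batch, units)
    targets = torch.randint(0, NUM_UNITS - 1, (2, 20))
    loss = ctc_loss(log_probs, targets,
                    torch.full((2,), 100), torch.full((2,), 20))
    loss.backward()
    print(loss.item())
```

In this sketch the two subnets are merged by concatenation; the paper's exact merging scheme and the sMBR sequence-training stage are not reproduced here.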

Keywords

  • Bilingual
  • Code-switching
  • DFSMN-CTC-sMBR
  • Mandarin-English
  • Speech recognition
