Towards language-universal Mandarin-English speech recognition

Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie

Research output: Contribution to journal › Conference article › peer-review

17 Scopus citations

Abstract

Multilingual and code-switching speech recognition are two challenging tasks that have mostly been studied separately in previous work. In this work, we study the multilingual and code-switching problems jointly and present a language-universal bilingual system for Mandarin-English speech recognition. Specifically, we propose a novel bilingual acoustic model that consists of two subnets initialized from monolingual systems and a shared output layer corresponding to the Character-Subword acoustic modeling units. The bilingual acoustic model is trained on a large Mandarin-English corpus with the CTC and sMBR criteria. We find that this model, which is given no information about language identity, achieves performance on monolingual Mandarin and English test sets comparable to that of well-trained language-specific Mandarin and English ASR systems. More importantly, the proposed bilingual model automatically learns language switching. Experimental results on a Mandarin-English code-switching test set show that it achieves 11.8% and 17.9% relative error reduction on the Mandarin and English parts, respectively.
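The sketch below illustrates the overall structure described in the abstract: two language-specific subnets feeding one shared output layer over a joint Mandarin-character plus English-subword unit inventory, trained with CTC. It is not the authors' implementation; the DFSMN layers are stood in for by plain feedforward blocks, sMBR training is omitted, and all dimensions, vocabulary sizes, and names (`MonolingualSubnet`, `BilingualAcousticModel`) are illustrative assumptions.

```python
# Minimal sketch of a bilingual acoustic model with two subnets (assumed to be
# initialized from monolingual systems) and a shared output layer over a combined
# Mandarin-character + English-subword unit set, trained with CTC.
import torch
import torch.nn as nn

NUM_UNITS = 7000 + 3000 + 1  # assumed: Mandarin chars + English subwords + CTC blank


class MonolingualSubnet(nn.Module):
    """Stand-in for a monolingual DFSMN encoder stack (simplified to feedforward)."""
    def __init__(self, feat_dim=80, hidden_dim=512, num_layers=4):
        super().__init__()
        layers, in_dim = [], feat_dim
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        self.encoder = nn.Sequential(*layers)

    def forward(self, feats):           # feats: (batch, time, feat_dim)
        return self.encoder(feats)      # -> (batch, time, hidden_dim)


class BilingualAcousticModel(nn.Module):
    """Two language subnets whose outputs are merged before a shared output layer."""
    def __init__(self, feat_dim=80, hidden_dim=512, num_units=NUM_UNITS):
        super().__init__()
        self.mandarin_subnet = MonolingualSubnet(feat_dim, hidden_dim)
        self.english_subnet = MonolingualSubnet(feat_dim, hidden_dim)
        # Shared output layer over the joint Character-Subword unit inventory.
        self.output_layer = nn.Linear(2 * hidden_dim, num_units)

    def forward(self, feats):
        # No language identity is provided: both subnets process every frame and
        # the shared output layer learns to emit units from either language.
        merged = torch.cat([self.mandarin_subnet(feats),
                            self.english_subnet(feats)], dim=-1)
        return self.output_layer(merged).log_softmax(dim=-1)


if __name__ == "__main__":
    model = BilingualAcousticModel()
    ctc_loss = nn.CTCLoss(blank=NUM_UNITS - 1)
    feats = torch.randn(2, 100, 80)                  # (batch, time, feat_dim)
    log_probs = model(feats).transpose(0, 1)         # CTCLoss expects (time, batch, units)
    targets = torch.randint(0, NUM_UNITS - 1, (2, 20))
    loss = ctc_loss(log_probs, targets,
                    torch.full((2,), 100), torch.full((2,), 20))
    loss.backward()
    print(loss.item())
```

In this sketch the two subnets are merged by concatenation; the paper's exact merging scheme and the sMBR sequence-training stage are not reproduced here.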

Keywords

  • Bilingual
  • Code-switching
  • DFSMN-CTC-sMBR
  • Mandarin-English
  • Speech recognition
