TY - JOUR
T1 - Study of semi-supervised approaches to improving English-Mandarin code-switching speech recognition
AU - Guo, Pengcheng
AU - Xu, Haihua
AU - Xie, Lei
AU - Chng, Eng Siong
N1 - Publisher Copyright:
© 2018 International Speech Communication Association. All rights reserved.
PY - 2018
Y1 - 2018
N2 - In this paper, we present our efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods, from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate a semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciation issue within the code-switching conversations of the local area. As a result, the learned lexicon yields improved performance. Furthermore, we attempt to use semi-supervised training to deal with those transcriptions that are highly mismatched between the human transcribers and the ASR system. Specifically, we conduct semi-supervised training by treating the poorly transcribed data as unsupervised data. We find that semi-supervised acoustic modeling leads to improved results. Finally, to make up for the limitations of conventional n-gram language models due to data sparsity, we perform lattice rescoring using neural network language models, obtaining significant WER reduction.
AB - In this paper, we present our efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods, from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate a semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciation issue within the code-switching conversations of the local area. As a result, the learned lexicon yields improved performance. Furthermore, we attempt to use semi-supervised training to deal with those transcriptions that are highly mismatched between the human transcribers and the ASR system. Specifically, we conduct semi-supervised training by treating the poorly transcribed data as unsupervised data. We find that semi-supervised acoustic modeling leads to improved results. Finally, to make up for the limitations of conventional n-gram language models due to data sparsity, we perform lattice rescoring using neural network language models, obtaining significant WER reduction.
KW - Code-switching
KW - Lattice rescoring
KW - Lexicon learning
KW - Semi-supervised training
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85054957048&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2018-1974
DO - 10.21437/Interspeech.2018-1974
M3 - Conference article
AN - SCOPUS:85054957048
SN - 2308-457X
VL - 2018-September
SP - 1928
EP - 1932
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Y2 - 2 September 2018 through 6 September 2018
ER -