Study of semi-supervised approaches to improving English-Mandarin code-switching speech recognition

Pengcheng Guo; Haihua Xu; Lei Xie; Eng Siong Chng

doi:10.21437/Interspeech.2018-1974

School of Computer Science

Research output: Contribution to journal › Conference article › Peer-reviewed

30 citations (Scopus)

Abstract

In this paper, we present our efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods, from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate a semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciation found in the code-switching conversations of the local area. The learned lexicon yields improved performance. Furthermore, we use semi-supervised training to deal with transcriptions that are highly mismatched between the human transcribers and the ASR system. Specifically, we conduct semi-supervised training that treats the poorly transcribed data as unsupervised data, and find that semi-supervised acoustic modeling can lead to improved results. Finally, to make up for the limitations of conventional n-gram language models caused by data sparsity, we perform lattice rescoring using neural network language models and obtain a significant WER reduction.

Original language	English
Pages (from-to)	1928-1932
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2018-September
DOI	https://doi.org/10.21437/Interspeech.2018-1974
Publication status	Published - 2018
Event	19th Annual Conference of the International Speech Communication, INTERSPEECH 2018 - Hyderabad, India. Duration: 2 Sep 2018 → 6 Sep 2018

Cite this

@article{b455dca95812475c8567dbce1a1dbcfc,

title = "Study of semi-supervised approaches to improving English-Mandarin code-switching speech recognition",

abstract = "In this paper, we present our efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciation issue within the code-switching conversation of the local area. As a result, the learned lexicon yields improved performance. Furthermore, we attempt to use semi-supervised training to deal with those transcriptions that are highly mismatched between the human transcribers and the ASR system. Specifically, we conduct semi-supervised training assuming those poorly transcribed data as unsupervised data. We found the semi-supervised acoustic modeling can lead to improved results. Finally, to make up for the limitation of the conventional n-gram language models due to the data sparsity issue, we perform lattice rescoring using neural network language models, and significant WER reduction is obtained.",

keywords = "Code-switching, Lattice rescoring, Lexicon learning, Semi-supervised training, Speech recognition",

author = "Pengcheng Guo and Haihua Xu and Lei Xie and Chng, {Eng Siong}",

note = "Publisher Copyright: {\textcopyright} 2018 International Speech Communication Association. All rights reserved.; 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018 ; Conference date: 02-09-2018 Through 06-09-2018",

year = "2018",

doi = "10.21437/Interspeech.2018-1974",

language = "English",

volume = "2018-September",

pages = "1928--1932",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Study of semi-supervised approaches to improving English-Mandarin code-switching speech recognition

AU - Guo, Pengcheng

AU - Xu, Haihua

AU - Xie, Lei

AU - Chng, Eng Siong

PY - 2018

Y1 - 2018

N2 - In this paper, we present our efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciation issue within the code-switching conversation of the local area. As a result, the learned lexicon yields improved performance. Furthermore, we attempt to use semi-supervised training to deal with those transcriptions that are highly mismatched between the human transcribers and the ASR system. Specifically, we conduct semi-supervised training assuming those poorly transcribed data as unsupervised data. We found the semi-supervised acoustic modeling can lead to improved results. Finally, to make up for the limitation of the conventional n-gram language models due to the data sparsity issue, we perform lattice rescoring using neural network language models, and significant WER reduction is obtained.

AB - In this paper, we present our efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciation issue within the code-switching conversation of the local area. As a result, the learned lexicon yields improved performance. Furthermore, we attempt to use semi-supervised training to deal with those transcriptions that are highly mismatched between the human transcribers and the ASR system. Specifically, we conduct semi-supervised training assuming those poorly transcribed data as unsupervised data. We found the semi-supervised acoustic modeling can lead to improved results. Finally, to make up for the limitation of the conventional n-gram language models due to the data sparsity issue, we perform lattice rescoring using neural network language models, and significant WER reduction is obtained.

KW - Code-switching

KW - Lattice rescoring

KW - Lexicon learning

KW - Semi-supervised training

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=85054957048&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2018-1974

DO - 10.21437/Interspeech.2018-1974

M3 - Conference article

AN - SCOPUS:85054957048

SN - 2308-457X

VL - 2018-September

SP - 1928

EP - 1932

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018

Y2 - 2 September 2018 through 6 September 2018

ER -
