A bi-directional LSTM approach for polyphone disambiguation in Mandarin Chinese

Changhao Shan; Lei Xie; Kaisheng Yao

doi:10.1109/ISCSLP.2016.7918392

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

23 Scopus citations

Abstract

Polyphone disambiguation in Mandarin Chinese aims to pick the correct pronunciation from several candidates for a polyphonic character. It serves as an essential component in human language technologies such as text-to-speech synthesis. Since the pronunciation of most polyphonic characters can be easily decided from their contexts in the text, in this paper we address the polyphone disambiguation problem as a sequential labeling task. Specifically, we propose to use a bidirectional long short-term memory (BLSTM) neural network to encode both the past and future observations on the character sequence as its inputs and predict the pronunciations. We also empirically study the impacts of (1) modeling contexts of different lengths, (2) the number of BLSTM layers and (3) the granularity of part-of-speech (POS) tags as features. Our results show that a deep BLSTM is able to achieve state-of-the-art performance in polyphone disambiguation.
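The sequential-labeling framing in the abstract can be illustrated with a small data-preparation sketch. It is not the authors' code: the toy lexicon `POLYPHONES`, the placeholder label `"-"` for non-polyphonic characters, and the helper names `to_tagging_example` and `context_window` are all assumptions made for illustration, and the abstract does not specify the paper's actual label scheme. The sketch only shows how each character could be paired with a label and how past and future context of a chosen length could be cut out around a polyphonic character before being fed to a tagger such as a BLSTM.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: polyphonic character -> candidate pronunciations.
POLYPHONES: Dict[str, List[str]] = {
    "行": ["xing2", "hang2"],
    "长": ["chang2", "zhang3"],
}

# Assumed placeholder label for characters that need no disambiguation.
NON_POLYPHONE = "-"


def to_tagging_example(chars: str, gold: Dict[int, str]) -> List[Tuple[str, str]]:
    """Pair every character with a label, one label per position.

    `gold` maps the index of each polyphonic character to its correct
    pronunciation; all other characters receive the placeholder label.
    """
    labeled = []
    for i, ch in enumerate(chars):
        if ch in POLYPHONES:
            label = gold[i]
            if label not in POLYPHONES[ch]:
                raise ValueError(f"{label!r} is not a candidate for {ch!r}")
        else:
            label = NON_POLYPHONE
        labeled.append((ch, label))
    return labeled


def context_window(chars: str, index: int, width: int) -> Tuple[str, str, str]:
    """Return (past, target, future) with at most `width` characters per side."""
    past = chars[max(0, index - width):index]
    future = chars[index + 1:index + 1 + width]
    return past, chars[index], future


sentence = "银行行长"  # "bank president": yin2 hang2 hang2 zhang3
example = to_tagging_example(sentence, {1: "hang2", 2: "hang2", 3: "zhang3"})
window = context_window(sentence, 3, 2)
```

Varying `width` corresponds loosely to the context-length study mentioned in the abstract; a bidirectional model would read `past` left to right and `future` right to left before predicting the label of the target character.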

Original language	English
Title of host publication	Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Editors	Hsin-Min Wang, Qingzhi Hou, Yuan Wei, Tan Lee, Jianguo Wei, Lei Xie, Hui Feng, Jianwu Dang
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781509042937
DOIs	https://doi.org/10.1109/ISCSLP.2016.7918392
State	Published - 2 May 2017
Event	10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 - Tianjin, China
Duration	17 Oct 2016 → 20 Oct 2016

Publication series

Name	Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016

Conference

Conference	10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Country/Territory	China
City	Tianjin
Period	17/10/16 → 20/10/16

Keywords

Bi-directional LSTM
Grapheme-to-phoneme conversion
Polyphone disambiguation
Sequence tagging
Text-to-Speech

Access to Document

10.1109/ISCSLP.2016.7918392

Cite this

Shan, C., Xie, L., & Yao, K. (2017). A bi-directional LSTM approach for polyphone disambiguation in Mandarin Chinese. In H.-M. Wang, Q. Hou, Y. Wei, T. Lee, J. Wei, L. Xie, H. Feng, & J. Dang (Eds.), Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Article 7918392 (Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISCSLP.2016.7918392

Shan, Changhao ; Xie, Lei ; Yao, Kaisheng. / A bi-directional LSTM approach for polyphone disambiguation in Mandarin Chinese. Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016. editor / Hsin-Min Wang ; Qingzhi Hou ; Yuan Wei ; Tan Lee ; Jianguo Wei ; Lei Xie ; Hui Feng ; Jianwu Dang. Institute of Electrical and Electronics Engineers Inc., 2017. (Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016).

@inproceedings{e6b23c7b88d44eeea91821de54f94ce3,
title = "A bi-directional LSTM approach for polyphone disambiguation in Mandarin Chinese",
abstract = "Polyphone disambiguation in Mandarin Chinese aims to pick the correct pronunciation from several candidates for a polyphonic character. It serves as an essential component in human language technologies such as text-to-speech synthesis. Since the pronunciation of most polyphonic characters can be easily decided from their contexts in the text, in this paper we address the polyphone disambiguation problem as a sequential labeling task. Specifically, we propose to use a bidirectional long short-term memory (BLSTM) neural network to encode both the past and future observations on the character sequence as its inputs and predict the pronunciations. We also empirically study the impacts of (1) modeling contexts of different lengths, (2) the number of BLSTM layers and (3) the granularity of part-of-speech (POS) tags as features. Our results show that a deep BLSTM is able to achieve state-of-the-art performance in polyphone disambiguation.",
keywords = "Bi-directional LSTM, Grapheme-to-phoneme conversion, Polyphone disambiguation, Sequence tagging, Text-to-Speech",
author = "Changhao Shan and Lei Xie and Kaisheng Yao",
note = "Publisher Copyright: {\textcopyright} 2016 IEEE.; 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 ; Conference date: 17-10-2016 Through 20-10-2016",
year = "2017",
month = may,
day = "2",
doi = "10.1109/ISCSLP.2016.7918392",
language = "English",
series = "Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
editor = "Hsin-Min Wang and Qingzhi Hou and Yuan Wei and Tan Lee and Jianguo Wei and Lei Xie and Hui Feng and Jianwu Dang",
booktitle = "Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016",
}

Shan, C, Xie, L & Yao, K 2017, A bi-directional LSTM approach for polyphone disambiguation in Mandarin Chinese. in H-M Wang, Q Hou, Y Wei, T Lee, J Wei, L Xie, H Feng & J Dang (eds), Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, 7918392, Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Institute of Electrical and Electronics Engineers Inc., 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, 17/10/16. https://doi.org/10.1109/ISCSLP.2016.7918392

A bi-directional LSTM approach for polyphone disambiguation in Mandarin Chinese. / Shan, Changhao; Xie, Lei; Yao, Kaisheng.
Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016. ed. / Hsin-Min Wang; Qingzhi Hou; Yuan Wei; Tan Lee; Jianguo Wei; Lei Xie; Hui Feng; Jianwu Dang. Institute of Electrical and Electronics Engineers Inc., 2017. 7918392 (Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016).

TY - GEN
T1 - A bi-directional LSTM approach for polyphone disambiguation in Mandarin Chinese
AU - Shan, Changhao
AU - Xie, Lei
AU - Yao, Kaisheng
PY - 2017/5/2
Y1 - 2017/5/2
N2 - Polyphone disambiguation in Mandarin Chinese aims to pick the correct pronunciation from several candidates for a polyphonic character. It serves as an essential component in human language technologies such as text-to-speech synthesis. Since the pronunciation of most polyphonic characters can be easily decided from their contexts in the text, in this paper we address the polyphone disambiguation problem as a sequential labeling task. Specifically, we propose to use a bidirectional long short-term memory (BLSTM) neural network to encode both the past and future observations on the character sequence as its inputs and predict the pronunciations. We also empirically study the impacts of (1) modeling contexts of different lengths, (2) the number of BLSTM layers and (3) the granularity of part-of-speech (POS) tags as features. Our results show that a deep BLSTM is able to achieve state-of-the-art performance in polyphone disambiguation.
AB - Polyphone disambiguation in Mandarin Chinese aims to pick the correct pronunciation from several candidates for a polyphonic character. It serves as an essential component in human language technologies such as text-to-speech synthesis. Since the pronunciation of most polyphonic characters can be easily decided from their contexts in the text, in this paper we address the polyphone disambiguation problem as a sequential labeling task. Specifically, we propose to use a bidirectional long short-term memory (BLSTM) neural network to encode both the past and future observations on the character sequence as its inputs and predict the pronunciations. We also empirically study the impacts of (1) modeling contexts of different lengths, (2) the number of BLSTM layers and (3) the granularity of part-of-speech (POS) tags as features. Our results show that a deep BLSTM is able to achieve state-of-the-art performance in polyphone disambiguation.
KW - Bi-directional LSTM
KW - Grapheme-to-phoneme conversion
KW - Polyphone disambiguation
KW - Sequence tagging
KW - Text-to-Speech
UR - http://www.scopus.com/inward/record.url?scp=85020231585&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2016.7918392
DO - 10.1109/ISCSLP.2016.7918392
M3 - Conference contribution
AN - SCOPUS:85020231585
T3 - Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
BT - Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
A2 - Wang, Hsin-Min
A2 - Hou, Qingzhi
A2 - Wei, Yuan
A2 - Lee, Tan
A2 - Wei, Jianguo
A2 - Xie, Lei
A2 - Feng, Hui
A2 - Dang, Jianwu
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Y2 - 17 October 2016 through 20 October 2016
ER -

Shan C, Xie L, Yao K. A bi-directional LSTM approach for polyphone disambiguation in Mandarin Chinese. In Wang HM, Hou Q, Wei Y, Lee T, Wei J, Xie L, Feng H, Dang J, editors, Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016. Institute of Electrical and Electronics Engineers Inc. 2017. 7918392. (Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016). doi: 10.1109/ISCSLP.2016.7918392
