Phoneme lattice based texttiling towards multilingual story segmentation

Xiaoxuan Wang, Lei Xie, Bin Ma, Eng Siong Chng, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

6 Citations (Scopus)

Abstract

This paper proposes a phoneme-lattice-based TextTiling approach to multilingual story segmentation. The phoneme is the smallest segmental unit in a language, and a language usually has far fewer phonemes than words. Furthermore, many phonemes are shared across languages. These properties make phonemes particularly appropriate for representing multilingual speech. Phoneme recognition is far from perfect, so this paper uses phoneme lattices as the input to TextTiling, because they carry much richer statistics than 1-best hypotheses. The term frequencies used in traditional TextTiling are replaced by the expected counts of phoneme n-gram units, which are calculated from the phoneme lattices. Experiments on the TDT2 English and Mandarin corpora show that phoneme-lattice-based TextTiling outperforms both phoneme 1-best-based TextTiling and word-based TextTiling in broadcast news story segmentation.
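The core idea is to replace term frequencies with expected phoneme n-gram counts taken from lattices. The sketch below illustrates it under several assumptions. The abstract does not give the lattice format, the n-gram orders, the block sizes, or the boundary decision rule, so these choices are ours:

- **Lattice format (hypothetical).** A lattice is a list of `(src, dst, phone, prob)` arcs over integer nodes in topological order, from start node 0 to the highest-numbered end node.
- **Expected counts.** These are computed with the standard forward-backward recursion over arcs.
- **Segmentation step.** This follows Hearst's original TextTiling: cosine similarity between adjacent blocks, then depth scores.
- **Names.** The function names are illustrative only.

Whether the paper scores gaps in exactly this way is an assumption.

```python
import math
from collections import Counter, defaultdict

def expected_ngram_counts(arcs, n):
    """Expected phoneme n-gram counts over all lattice paths (forward-backward).

    Assumed lattice layout: arcs are (src, dst, phone, prob), with nodes numbered
    topologically; node 0 is the start and the largest node is the end.
    """
    end = max(max(s, d) for s, d, _, _ in arcs)
    out_arcs = defaultdict(list)
    for arc in arcs:
        out_arcs[arc[0]].append(arc)
    # Forward pass: alpha[v] = total probability of partial paths from start to v.
    alpha = defaultdict(float)
    alpha[0] = 1.0
    for v in range(end + 1):
        for _, d, _, p in out_arcs[v]:
            alpha[d] += alpha[v] * p
    # Backward pass: beta[v] = total probability of partial paths from v to end.
    beta = defaultdict(float)
    beta[end] = 1.0
    for v in range(end, -1, -1):
        for _, d, _, p in out_arcs[v]:
            beta[v] += p * beta[d]
    total = alpha[end]  # normalizer: total probability of all complete paths
    counts = Counter()

    def extend(node, phones, prob, start):
        # Follow n consecutive arcs; weight the segment by its posterior probability.
        if len(phones) == n:
            counts[tuple(phones)] += alpha[start] * prob * beta[node] / total
            return
        for _, d, ph, p in out_arcs[node]:
            extend(d, phones + [ph], prob * p, start)

    for v in range(end + 1):
        extend(v, [], 1.0, v)
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def texttiling_depths(segment_counts, window=2):
    """Similarity at each gap between adjacent blocks, plus Hearst-style depth scores."""
    sims = []
    for gap in range(1, len(segment_counts)):
        left = sum(segment_counts[max(0, gap - window):gap], Counter())
        right = sum(segment_counts[gap:gap + window], Counter())
        sims.append(cosine(left, right))
    depths = []
    for i, s in enumerate(sims):
        # Climb to the nearest similarity peak on each side of this gap.
        l = i
        while l > 0 and sims[l - 1] >= sims[l]:
            l -= 1
        r = i
        while r < len(sims) - 1 and sims[r + 1] >= sims[r]:
            r += 1
        depths.append((sims[l] - s) + (sims[r] - s))
    return sims, depths
```

In use, each sentence's lattice would be converted with `expected_ngram_counts`, and the resulting counters would be passed to `texttiling_depths`. Gaps with large depth scores are candidate story boundaries. Because the counts are posterior-weighted, a phone that the recognizer is unsure about still contributes fractionally, instead of being dropped as it would be in a 1-best transcript.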

Original language: English
Host publication title: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 1305-1308
Number of pages: 4
Publication status: Published - 2010

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
