Subword lexical chaining for automatic story segmentation in Chinese broadcast news

Lei Xie, Yulian Yang, Jia Zeng

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

8 citations (Scopus)

Abstract

We present a subword lexical chaining approach to automatic story segmentation of Chinese broadcast news (BN). Conventional lexical chains link related words with cohesion (e.g. repetition of words) and high concentration points of starting and ending chains are indicative of story boundaries. However, inevitable speech recognition errors in BN transcripts may destroy the cohesiveness of words, resulting in word match failures. We show the robustness of Chinese subwords (characters and syllables) in lexical matching in errorful ASR transcripts. This motivates us to discover story boundaries on chains formed by character and syllable n-gram units. Experimental results on the TDT2 Mandarin corpus show that chaining by character unigram exhibits the best story segmentation performance with relative F-measure improvement of 6.06% over conventional word chaining. Integrations of multi-scales (words and subwords) exhibit further improvement. For example, fusion by voting from different scales achieves an F-measure gain of 9.04% over words.
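The chaining idea in the abstract can be illustrated with a minimal sketch. It links repetitions of character n-grams across sentences and scores each sentence gap by how many chains end just before it or start just after it. The function names, the `max_gap` and `min_score` parameters, and the simple thresholding rule are all assumptions made for illustration; the abstract does not specify the chain-breaking distance or the boundary-picking procedure, and syllable units and multi-scale voting are not covered here.

```python
from collections import defaultdict


def char_ngrams(text, n=1):
    """Return the set of character n-grams in one sentence (whitespace dropped)."""
    chars = [c for c in text if not c.isspace()]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}


def build_chains(sentences, n=1, max_gap=2):
    """Link repetitions of each n-gram into chains of (start, end) sentence indices.

    A chain is broken when two consecutive occurrences lie more than
    max_gap sentences apart (max_gap is an assumed tuning parameter).
    """
    positions = defaultdict(list)
    for idx, sent in enumerate(sentences):
        for unit in char_ngrams(sent, n):
            positions[unit].append(idx)

    chains = []
    for occ in positions.values():
        start = prev = occ[0]
        for idx in occ[1:]:
            if idx - prev > max_gap:
                if prev > start:  # keep only chains with at least one repetition
                    chains.append((start, prev))
                start = idx
            prev = idx
        if prev > start:
            chains.append((start, prev))
    return chains


def boundary_scores(chains, num_sentences):
    """Score gap i (between sentences i and i+1): chains ending at i plus chains starting at i+1."""
    scores = [0] * max(num_sentences - 1, 0)
    for start, end in chains:
        if end < num_sentences - 1:
            scores[end] += 1
        if start > 0:
            scores[start - 1] += 1
    return scores


def segment(sentences, n=1, max_gap=2, min_score=2):
    """Return indices of sentences that open a new story (assumed simple threshold rule)."""
    chains = build_chains(sentences, n, max_gap)
    scores = boundary_scores(chains, len(sentences))
    return [i + 1 for i, s in enumerate(scores) if s >= min_score]


if __name__ == "__main__":
    # Toy transcript: two sentences on the stock market, then two on a typhoon.
    toy = ["股市上涨", "股市大涨", "台风登陆", "台风过境"]
    print(segment(toy))
```

In the toy transcript, the chains for 股, 市 and 涨 end at the second sentence and the chains for 台 and 风 start at the third, so the gap between them collects the highest score. Calling `segment(toy, n=2)` chains character bigrams instead of unigrams.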

Original language: English
Title of host publication: Advances in Multimedia Information Processing - PCM 2008 - 9th Pacific Rim Conference on Multimedia, Proceedings
Pages: 248-258
Number of pages: 11
Publication status: Published - 2008
Event: 9th Pacific Rim Conference on Multimedia, PCM 2008 - Tainan, Taiwan, China
Duration: 9 Dec 2008 to 13 Dec 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5353 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th Pacific Rim Conference on Multimedia, PCM 2008
Country/Territory: Taiwan, China
City: Tainan
Period: 9/12/08 to 13/12/08
