Maximum lexical cohesion for fine-grained news story segmentation

Zihan Liu, Lei Xie, Wei Feng

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

5 Citations (Scopus)

Abstract

We propose a maximum lexical cohesion (MLC) approach to news story segmentation. Unlike sentence-dependent lexical methods, our approach is able to detect story boundaries at finer word/subword granularity, and thus is more suitable for speech recognition transcripts which have no sentence delimiters. The proposed segmentation goodness measure takes account of both lexical cohesion and a prior preference of story length. We measure the lexical cohesion of a segment by the KL-divergence from its word distribution to an associated piecewise uniform distribution. Taking account of the uneven contributions of different words to a story, the cohesion measure is further refined by two word weighting schemes, i.e. the inverse document frequency (IDF) and a new weighting method called difference from expectation (DFE). We then propose a dynamic programming solution to exactly maximize the segmentation goodness and efficiently locate story boundaries in polynomial time. Experimental results show that our MLC approach outperforms several state-of-the-art lexical methods.
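The dynamic programming idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: a uniform distribution over the vocabulary stands in for the paper's piecewise uniform distribution (which the abstract does not define), the story-length prior is modelled as a quadratic penalty around a hypothetical `mean_len`, and the IDF/DFE word weighting is omitted. The names `cohesion`, `segment`, `mean_len` and `lam` are hypothetical.

```python
import math
from collections import Counter


def cohesion(tokens, vocab_size):
    """Length-scaled KL divergence from the segment's empirical word
    distribution to a uniform distribution over the vocabulary.
    (Assumption: uniform reference instead of the paper's piecewise uniform one.)"""
    n = len(tokens)
    counts = Counter(tokens)
    # KL(p || u) = sum_w p(w) * log(p(w) / (1 / V)) = sum_w p(w) * log(p(w) * V)
    kl = sum((c / n) * math.log((c / n) * vocab_size) for c in counts.values())
    return n * kl


def segment(tokens, mean_len, lam=0.5):
    """Return the interior boundary positions that maximize the total
    segment goodness (cohesion minus a quadratic length penalty)."""
    n = len(tokens)
    vocab_size = len(set(tokens))
    best = [0.0] + [-math.inf] * n   # best[j]: best goodness of tokens[:j]
    back = [0] * (n + 1)             # back[j]: start of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(j):
            length = j - i
            # Assumed length prior: penalize deviation from mean_len.
            score = (best[i]
                     + cohesion(tokens[i:j], vocab_size)
                     - lam * (length - mean_len) ** 2)
            if score > best[j]:
                best[j], back[j] = score, i
    # Trace the back-pointers to recover the boundaries.
    bounds = []
    j = n
    while j > 0:
        j = back[j]
        if j > 0:
            bounds.append(j)
    return bounds[::-1]


# Toy word stream with two "stories" and no sentence delimiters.
words = ["a", "b", "a", "b", "a", "b", "x", "y", "x", "y", "x", "y"]
boundaries = segment(words, mean_len=6)
```

Because the goodness is a sum over segments, the best segmentation of a prefix depends only on where its last segment starts, which is what makes an exact polynomial-time search possible. The sketch recomputes each segment's counts from scratch for clarity; incremental counts would reduce the cost.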

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 1301-1304
Number of pages: 4
Publication status: Published - 2010

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
