Maximum lexical cohesion for fine-grained news story segmentation

Zihan Liu, Lei Xie, Wei Feng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

We propose a maximum lexical cohesion (MLC) approach to news story segmentation. Unlike sentence-dependent lexical methods, our approach is able to detect story boundaries at finer word/subword granularity, and thus is more suitable for speech recognition transcripts, which have no sentence delimiters. The proposed segmentation goodness measure takes account of both lexical cohesion and a prior preference on story length. We measure the lexical cohesion of a segment by the KL-divergence from its word distribution to an associated piecewise uniform distribution. Taking account of the uneven contributions of different words to a story, the cohesion measure is further refined by two word weighting schemes, i.e., the inverse document frequency (IDF) and a new weighting method called difference from expectation (DFE). We then propose a dynamic programming solution to exactly maximize the segmentation goodness and efficiently locate story boundaries in polynomial time. Experimental results show that our MLC approach outperforms several state-of-the-art lexical methods.
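The abstract's core idea — score each candidate segment by a KL-divergence cohesion measure and find the globally best segmentation by dynamic programming — can be illustrated with a simplified sketch. This is not the paper's exact formulation: here cohesion is taken as the KL-divergence from a segment's empirical word distribution to a uniform distribution over the document vocabulary (a stand-in for the paper's piecewise uniform reference), word weighting (IDF/DFE) is omitted, the length prior is reduced to a single linear weight, and the fixed `max_segments` parameter is an assumption for the demo.

```python
import math
from collections import Counter

def kl_cohesion(words, vocab_size):
    """KL-divergence from the segment's empirical word distribution to a
    uniform distribution over the document vocabulary. A segment dominated
    by a few recurring words diverges more from uniform, which this sketch
    treats as higher cohesion."""
    counts = Counter(words)
    n = len(words)
    u = 1.0 / vocab_size
    return sum((c / n) * math.log((c / n) / u) for c in counts.values())

def segment(tokens, max_segments, length_prior=0.0):
    """Dynamic-programming search for the segmentation that maximizes
    total goodness = sum of per-segment cohesion + length_prior * length.
    Runs in polynomial time; returns exclusive end indices of segments."""
    n = len(tokens)
    vocab_size = len(set(tokens))
    NEG = float("-inf")
    # best[k][i]: best goodness splitting tokens[:i] into exactly k segments
    best = [[NEG] * (n + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                if best[k - 1][j] == NEG:
                    continue
                score = (best[k - 1][j]
                         + kl_cohesion(tokens[j:i], vocab_size)
                         + length_prior * (i - j))
                if score > best[k][i]:
                    best[k][i] = score
                    back[k][i] = j
    # Backtrack from the full token sequence to recover boundaries.
    bounds, i = [], n
    for k in range(max_segments, 0, -1):
        bounds.append(i)
        i = back[k][i]
    return bounds[::-1]

# Toy transcript with no sentence delimiters: two lexically distinct "stories".
tokens = ["a", "a", "a", "b", "b", "b"]
print(segment(tokens, max_segments=2))  # → [3, 6]
```

Because each segment is scored independently given its span, the DP decomposes the exponential search over boundary placements into O(K·N²) subproblems, which is what makes the exact maximization tractable.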

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 1301-1304
Number of pages: 4
State: Published - 2010

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Keywords

  • Dynamic programming
  • KL-divergence
  • Lexical cohesion
  • Spoken document retrieval
  • Spoken document segmentation
  • Story segmentation
  • Word weighting
