Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation

Shing Kai Chan, Lei Xie, Helen Mei Ling Meng

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

12 Scopus citations

Abstract

We present a mathematically rigorous framework for modeling the statistical behavior of lexical chains for automatic story segmentation of broadcast news audio. Lexical chains were first proposed in [1] to connect related terms within a story, as an embodiment of lexical cohesion. The vocabulary within a story tends to be cohesive, while a change in the vocabulary distribution tends to signify a topic shift that occurs across a story boundary. Previous work focused on the concept and nature of lexical chains but performed story segmentation based on arbitrary thresholding. This work proposes the use of the log-normal distribution to capture the statistical behavior of lexical chains, together with data-driven parameter selection for lexical chain formation. Experimentation based on the TDT-2 Mandarin Corpus shows that the proposed statistical model leads to better story segmentation, where the F1-measure increased from 0.468 to 0.641.

Original languageEnglish
Title of host publicationInternational Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Pages2408-2411
Number of pages4
StatePublished - 2007
Externally publishedYes
Event8th Annual Conference of the International Speech Communication Association, Interspeech 2007 - Antwerp, Belgium
Duration: 27 Aug 200731 Aug 2007

Publication series

NameInternational Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Volume4

Conference

Conference8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Country/TerritoryBelgium
CityAntwerp
Period27/08/0731/08/07

Keywords

  • Chinese
  • Spoken document retrieval
  • Story segmentation

Fingerprint

Dive into the research topics of 'Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation'. Together they form a unique fingerprint.

Cite this