Subword lexical chaining for automatic story segmentation in Chinese broadcast news

Lei Xie, Yulian Yang, Jia Zeng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

8 Scopus citations

Abstract

We present a subword lexical chaining approach to automatic story segmentation of Chinese broadcast news (BN). Conventional lexical chains link related words through cohesion (e.g., word repetition), and points where many chains start and end are indicative of story boundaries. However, speech recognition errors, which are inevitable in BN transcripts, can break the cohesion between words and cause word-matching failures. We show that Chinese subwords (characters and syllables) are robust units for lexical matching in errorful ASR transcripts. This motivates us to detect story boundaries using chains formed from character and syllable n-gram units. Experimental results on the TDT2 Mandarin corpus show that chaining on character unigrams gives the best story segmentation performance, with a relative F-measure improvement of 6.06% over conventional word chaining. Integrating multiple scales (words and subwords) yields further improvement; for example, fusion by voting across scales achieves an F-measure gain of 9.04% over word chaining.
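The pipeline summarized above (chain repeated character n-grams, look for gaps where many chains end and begin, then fuse scales by voting) can be illustrated with a short Python sketch. It is not the authors' implementation: splitting the transcript into sentences, the `max_gap` rule for breaking a chain, the "chains ending plus chains starting" gap score, the thresholded local-maximum picker, and the `min_votes` fusion rule are all assumptions made for illustration, and every function name and the toy sentences are hypothetical. Syllable units are omitted because they would require a character-to-pinyin converter.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def char_ngrams(sentence: str, n: int) -> List[str]:
    """Overlapping character n-grams of a sentence (n=1 yields single characters)."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]


def build_chains(units_per_sentence: Sequence[Sequence[str]],
                 max_gap: int) -> List[Tuple[int, int]]:
    """Link repeated units into chains and return (start, end) sentence indices.

    Assumed rule: a unit's chain is extended while consecutive occurrences are
    at most `max_gap` sentences apart; a longer gap closes it and opens a new one.
    """
    open_chains: Dict[str, Tuple[int, int]] = {}  # unit -> (start, last occurrence)
    closed: List[Tuple[int, int]] = []
    for idx, units in enumerate(units_per_sentence):
        for unit in sorted(set(units)):
            if unit in open_chains and idx - open_chains[unit][1] <= max_gap:
                open_chains[unit] = (open_chains[unit][0], idx)
            else:
                if unit in open_chains:
                    closed.append(open_chains[unit])
                open_chains[unit] = (idx, idx)
    closed.extend(open_chains.values())
    # A unit seen only once carries no cohesion evidence, so drop it.
    return [(s, e) for s, e in closed if e > s]


def boundary_scores(chains: Sequence[Tuple[int, int]], num_sentences: int) -> List[int]:
    """Score gap g (between sentences g and g+1): chains ending at g plus chains starting at g+1."""
    scores = [0] * max(num_sentences - 1, 0)
    for start, end in chains:
        if end < num_sentences - 1:
            scores[end] += 1
        if start > 0:
            scores[start - 1] += 1
    return scores


def detect_boundaries(sentences: Sequence[str], n: int = 1,
                      max_gap: int = 2, threshold: int = 2) -> Set[int]:
    """Return gap indices whose score reaches `threshold` and is a local maximum."""
    units = [char_ngrams(s, n) for s in sentences]
    scores = boundary_scores(build_chains(units, max_gap), len(sentences))
    last = len(scores) - 1
    return {g for g, sc in enumerate(scores)
            if sc >= threshold
            and (g == 0 or sc >= scores[g - 1])
            and (g == last or sc >= scores[g + 1])}


def vote(boundary_sets: Sequence[Set[int]], min_votes: int) -> Set[int]:
    """Hypothetical fusion: keep a gap proposed by at least `min_votes` scales."""
    counts: Dict[int, int] = defaultdict(int)
    for bset in boundary_sets:
        for g in bset:
            counts[g] += 1
    return {g for g, c in counts.items() if c >= min_votes}


if __name__ == "__main__":
    # Toy transcript: three sentences about stocks, then three about a typhoon.
    story = ["股市上涨", "股市交易活跃", "股市收盘", "台风登陆", "台风带来暴雨", "台风减弱"]
    scales = [detect_boundaries(story, n=1), detect_boundaries(story, n=2)]
    print(vote(scales, min_votes=2))  # prints {2}: a boundary between sentences 2 and 3
```

Character units help with recognition errors in this toy setting: if sentence 1 were misrecognized as "古市交易活跃" (股 and 古 are both pronounced "gǔ"), the character chains for 股 and 市 would still span sentences 0 to 2 with `max_gap=2`, and the boundary at gap 2 would still be found. A syllable-level scale would go further and match 股 with 古 directly.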

Original language: English
Title of host publication: Advances in Multimedia Information Processing - PCM 2008 - 9th Pacific Rim Conference on Multimedia, Proceedings
Pages: 248-258
Number of pages: 11
State: Published - 2008
Event: 9th Pacific Rim Conference on Multimedia, PCM 2008 - Tainan, Taiwan, Province of China
Duration: 9 Dec 2008 to 13 Dec 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5353 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th Pacific Rim Conference on Multimedia, PCM 2008
Country/Territory: Taiwan, Province of China
City: Tainan
Period: 9/12/08 to 13/12/08

Keywords

  • Chinese
  • Multimedia
  • Spoken document retrieval
  • Story segmentation
  • Topic segmentation
