Phoneme lattice based texttiling towards multilingual story segmentation

Xiaoxuan Wang, Lei Xie, Bin Ma, Eng Siong Chng, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

6 Scopus citations

Abstract

This paper proposes a phoneme lattice based TextTiling approach towards multilingual story segmentation. The phoneme is the smallest segmental unit in a language and the number of phonemes in a language is usually far smaller than the number of words. Furthermore, many phonemes are shared by different languages. These properties make phonemes particularly appropriate for representing multilingual speech. As phoneme recognition is far from perfect, phoneme lattices, which carry much richer statistics than the 1-best hypotheses, are adopted in this paper as the input to the TextTiling approach. The term frequencies used in traditional TextTiling are replaced by the expected counts of phoneme n-gram units calculated from phoneme lattices. Experiments on TDT2 English and Mandarin corpora show that the phoneme lattice based TextTiling outperforms the phoneme 1-best based TextTiling and word based TextTiling in broadcast news story segmentation.
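The core idea of the abstract, replacing term frequencies with expected phoneme n-gram counts and then comparing adjacent blocks of speech, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes that each lattice is approximated by a short list of weighted paths whose posteriors sum to 1, where a real system would compute the same expectations over the full lattice, typically with a forward-backward pass. It also assumes bigram units, cosine similarity, and a simple depth score, none of which the abstract specifies. All function names and the toy phoneme data are hypothetical.

```python
from collections import Counter
from math import sqrt


def expected_ngram_counts(paths, n=2):
    """Expected n-gram counts over weighted paths.

    paths: list of (posterior, phoneme_sequence) pairs. The list is an
    assumed stand-in for a phoneme lattice; posteriors should sum to 1.
    """
    counts = Counter()
    for posterior, phones in paths:
        for i in range(len(phones) - n + 1):
            # Each occurrence contributes the posterior of its path.
            counts[tuple(phones[i:i + n])] += posterior
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjacent_similarities(block_vectors):
    """Similarity at each gap between consecutive blocks."""
    return [cosine(block_vectors[i], block_vectors[i + 1])
            for i in range(len(block_vectors) - 1)]


def depth_scores(sims):
    """TextTiling-style depth: how far each gap sits below the nearest
    peak on its left plus the nearest peak on its right."""
    depths = []
    for i, s in enumerate(sims):
        left = s
        for j in range(i - 1, -1, -1):
            if sims[j] < left:
                break
            left = sims[j]
        right = s
        for j in range(i + 1, len(sims)):
            if sims[j] < right:
                break
            right = sims[j]
        depths.append((left - s) + (right - s))
    return depths


# Toy data (hypothetical): two blocks from one story, two from another.
lattices = [
    [(0.6, ["s", "t", "ao", "r", "iy"]), (0.4, ["s", "t", "ao", "l", "iy"])],
    [(1.0, ["s", "t", "ao", "r", "iy"])],
    [(1.0, ["n", "uw", "z"])],
    [(0.7, ["n", "uw", "z"]), (0.3, ["n", "uw", "s"])],
]
blocks = [expected_ngram_counts(paths) for paths in lattices]
sims = adjacent_similarities(blocks)
depths = depth_scores(sims)
boundary_gap = depths.index(max(depths))  # gap with the deepest valley
```

In this toy setup the deepest valley falls at the gap between the second and third blocks, where the two hypothetical stories share no bigrams. Because competing hypotheses contribute fractional counts, a block can still match its neighbour when the 1-best path is wrong, which is the motivation the abstract gives for using lattices.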

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 1305-1308
Number of pages: 4
State: Published - 2010

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Keywords

  • Phoneme lattice
  • Speech processing
  • Spoken document retrieval
  • Story segmentation
  • Topic detection and tracking
