Broadcast news story segmentation using latent topics on data manifold

Xiaoming Lu, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

This paper proposes using Laplacian Probabilistic Latent Semantic Analysis (LapPLSA) for broadcast news story segmentation. The latent topic distributions estimated by LapPLSA replace term frequency vectors as the representation of sentences and are used to measure the cohesive strength between sentences. Subword n-grams serve as the basic term unit in the computation, and dynamic programming is used for story boundary detection. LapPLSA projects the data into a low-dimensional semantic topic representation while preserving the intrinsic local geometric structure of the data. This locality-preserving property is intended to make the estimated latent topic distributions more robust to noise from automatic speech recognition (ASR) errors. Experiments are conducted on the ASR transcripts of the TDT2 Mandarin broadcast news corpus. The proposed approach is compared with other approaches that use a dimensionality reduction technique with the locality-preserving property, and with two different topic modeling techniques. Experimental results show that the proposed approach achieves the highest F1-measure, 0.8228, significantly outperforming the best previous approaches.
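The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes the per-sentence topic distributions are already available (LapPLSA estimation itself is not shown), scores a candidate story by the summed cosine similarity of its sentences to the story's mean direction, and fixes the number of stories in advance. The function name `segment_stories`, the cohesion score, and the fixed story count are illustrative assumptions; the abstract does not specify the paper's actual cohesion measure or dynamic programming criterion.

```python
import numpy as np


def segment_stories(topic_dists, num_stories):
    """Return boundary indices splitting sentences into contiguous stories.

    Hypothetical helper: each row of topic_dists is one sentence's latent
    topic distribution. Requires 1 <= num_stories <= number of sentences.
    """
    X = np.asarray(topic_dists, dtype=float)
    # Normalise rows to unit length so dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    n = X.shape[0]
    # Prefix sums let us get the vector sum of any sentence range in O(1).
    prefix = np.vstack([np.zeros(X.shape[1]), np.cumsum(X, axis=0)])

    def cohesion(i, j):
        # For unit vectors, the summed cosine similarity of sentences i..j-1
        # to their mean direction equals the norm of their vector sum.
        return np.linalg.norm(prefix[j] - prefix[i])

    # best[k, j]: best total cohesion when the first j sentences form k stories.
    best = np.full((num_stories + 1, n + 1), -np.inf)
    back = np.zeros((num_stories + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, num_stories + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = best[k - 1, i] + cohesion(i, j)
                if cand > best[k, j]:
                    best[k, j] = cand
                    back[k, j] = i

    # Trace back the start index of each story after the first.
    boundaries = []
    j = n
    for k in range(num_stories, 1, -1):
        j = back[k, j]
        boundaries.append(int(j))
    return sorted(boundaries)


# Toy input (made up for illustration): three sentences dominated by
# topic 0 followed by three sentences dominated by topic 1.
topics = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
]
print(segment_stories(topics, 2))
```

On this toy input the single boundary falls before the fourth sentence (index 3), where the dominant topic changes. A real system would also need a way to choose the number of stories, which this sketch leaves to the caller.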

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 8465-8469
Number of pages: 5
State: Published - 18 Oct 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 → 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 26/05/13 → 31/05/13

Keywords

  • dimensionality reduction
  • laplacian probabilistic latent semantic analysis
  • story segmentation
  • topic modeling
