Broadcast news story segmentation using latent topics on data manifold

Xiaoming Lu, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Citations (Scopus)

Abstract

This paper proposes using Laplacian Probabilistic Latent Semantic Analysis (LapPLSA) for broadcast news story segmentation. The latent topic distributions estimated by LapPLSA replace term-frequency vectors as the representation of sentences and are used to measure the cohesive strength between sentences. Subword n-grams serve as the basic term unit in the computation, and dynamic programming is used for story boundary detection. LapPLSA projects the data into a low-dimensional semantic topic representation while preserving the intrinsic local geometric structure of the data. This locality-preserving property is intended to make the estimated latent topic distributions more robust to noise from automatic speech recognition (ASR) errors. Experiments are conducted on the ASR transcripts of the TDT2 Mandarin broadcast news corpus. The proposed approach is compared with other approaches that use dimensionality reduction techniques with the locality-preserving property, and with two different topic modeling techniques. Experimental results show that the proposed approach achieves the highest F1-measure, 0.8228, significantly outperforming the best previous approaches.
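The pipeline outlined in the abstract (sentence representation, cohesion measurement, boundary detection by dynamic programming) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not implement LapPLSA: raw character n-gram counts stand in for the latent topic distributions, cosine similarity to a segment centroid stands in for the cohesion measure, and the per-segment `penalty` value is an arbitrary assumption. The function names `char_ngrams`, `cosine`, and `segment` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(sentence, n=2):
    """Count overlapping character n-grams (an assumed stand-in for subword n-grams)."""
    chars = sentence.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(vectors, penalty=0.5):
    """Return the start index of each segment chosen by dynamic programming.

    A candidate segment is scored by the summed cosine similarity of its
    sentences to the segment's summed vector (cosine ignores scale, so the
    sum behaves like a centroid), minus an assumed fixed penalty per segment.
    """
    n = len(vectors)
    best = [0.0] + [-math.inf] * n  # best[j]: best total score for sentences [0, j)
    back = [0] * (n + 1)            # back[j]: start of the last segment ending at j
    for j in range(1, n + 1):
        centroid = Counter()
        # Grow the last segment backwards from sentence j-1 down to sentence i.
        for i in range(j - 1, -1, -1):
            centroid.update(vectors[i])
            cohesion = sum(cosine(vectors[k], centroid) for k in range(i, j))
            score = best[i] + cohesion - penalty
            if score > best[j]:
                best[j], back[j] = score, i
    # Follow the back-pointers from the end to recover segment starts.
    starts, j = [], n
    while j > 0:
        starts.append(back[j])
        j = back[j]
    return sorted(starts)
```

To move closer to the approach described above, the count vectors passed to `segment` would be replaced by per-sentence topic distributions from a trained topic model; the dynamic-programming step itself does not depend on how the vectors were produced.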

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 8465-8469
Number of pages: 5
DOI
Publication status: Published - 18 Oct 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 to 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 26/05/13 to 31/05/13
