Acoustic TextTiling for story segmentation of spoken documents

Lilei Zheng, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Citations (Scopus)

Abstract

We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Unlike most existing methods, which rely on LVCSR transcripts, this method detects story boundaries directly from audio streams. By analogy with the cosine-based lexical similarity between two text blocks in a transcript, we define an acoustic similarity measure between two pseudo-sentences in an audio stream. Experiments on the TDT2 Mandarin corpus show that acoustic TextTiling achieves performance comparable to lexical TextTiling based on LVCSR transcripts. We use both MFCCs and Gaussian posteriorgrams as acoustic representations in our experiments, and the results show that Gaussian posteriorgrams are more robust when segmenting stories that each contain multiple speakers.
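The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in plain whole-block DTW for the segmental DTW the paper uses, assumes cosine distance between non-negative feature frames (such as posteriorgram vectors), normalises the alignment cost by the combined block length, and applies a fixed depth-score threshold. All function names and the threshold value are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two feature frames."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0

def dtw_cost(x, y):
    """Cost of the best DTW alignment of x and y, divided by len(x) + len(y).

    Plain DTW over whole blocks; an assumed stand-in for segmental DTW.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = cosine_distance(x[i - 1], y[j - 1]) + step
    return d[n][m] / (n + m)

def gap_similarities(blocks):
    """Acoustic similarity between each pair of adjacent pseudo-sentences."""
    return [1.0 - dtw_cost(a, b) for a, b in zip(blocks, blocks[1:])]

def depth_scores(sims):
    """TextTiling-style depth: how far each gap lies below the peaks on both sides."""
    depths = []
    for i, s in enumerate(sims):
        left = i
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1
        right = i
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1
        depths.append((sims[left] - s) + (sims[right] - s))
    return depths

def boundaries(blocks, threshold):
    """Indices of blocks that start a new story (depth score above threshold)."""
    depths = depth_scores(gap_similarities(blocks))
    return [i + 1 for i, depth in enumerate(depths) if depth > threshold]
```

Here `blocks` is a list of pseudo-sentences, each a list of frame vectors. A gap between two blocks is reported as a boundary when its similarity dips well below the similarity of neighbouring gaps, mirroring the valley detection of lexical TextTiling.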

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 5121-5124
Number of pages: 4
DOI
Publication status: Published - 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 25 Mar 2012 → 30 Mar 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 25/03/12 → 30/03/12
