Acoustic TextTiling for story segmentation of spoken documents

Lilei Zheng, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Scopus citations

Abstract

We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Unlike most existing methods, which rely on LVCSR transcripts, our method detects story boundaries directly from audio streams. In analogy to the cosine-based lexical similarity between two text blocks in a transcript, we define an acoustic similarity measure between two pseudo-sentences in an audio stream. Experiments on the TDT2 Mandarin corpus show that acoustic TextTiling achieves performance comparable to lexical TextTiling based on LVCSR transcripts. Moreover, we use MFCCs and Gaussian posteriorgrams as acoustic representations in our experiments, and find that Gaussian posteriorgrams are more robust for segmenting stories that involve multiple speakers.
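The core idea above is to score the acoustic similarity between two pseudo-sentences much as lexical TextTiling scores cosine similarity between text blocks. The paper computes this with segmental dynamic time warping over MFCC or Gaussian-posteriorgram frames; the sketch below is only a simplified illustration of that idea, using plain DTW with a cosine-based frame distance. The function names, the length normalization, and the toy data are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two frame vectors, guarding against zero norms."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0

def dtw_cost(X, Y):
    """Plain DTW alignment cost between two frame sequences (rows = frames),
    with (1 - cosine similarity) as the local frame distance.
    NOTE: a simplification of the segmental DTW used in the paper."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - cosine_similarity(X[i - 1], Y[j - 1])
            D[i, j] = local + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Length-normalized cost (hypothetical normalization choice).
    return D[n, m] / (n + m)

def acoustic_similarity(X, Y):
    """Illustrative acoustic similarity between two pseudo-sentences:
    higher when their frame sequences align with low DTW cost."""
    return 1.0 - dtw_cost(X, Y)

# Toy usage: two pseudo-sentences as sequences of 39-dim MFCC (or posteriorgram) frames.
rng = np.random.default_rng(0)
pseudo_sentence_a = rng.normal(size=(120, 39))
pseudo_sentence_b = rng.normal(size=(150, 39))
print(acoustic_similarity(pseudo_sentence_a, pseudo_sentence_b))
```

In a TextTiling-style pipeline, a score like this would be computed between adjacent blocks of pseudo-sentences across the stream, and local minima in the resulting similarity curve would be taken as candidate story boundaries.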

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 5121-5124
Number of pages: 4
DOIs
State: Published - 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 25 Mar 2012 - 30 Mar 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 25/03/12 - 30/03/12

Keywords

  • segmental dynamic time warping
  • spoken document processing
  • story segmentation
  • TextTiling
  • topic segmentation
