Acoustic TextTiling for story segmentation of spoken documents

Lilei Zheng, Cheung Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Scopus citations

Abstract

We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Unlike most existing methods, which rely on LVCSR transcripts, our method detects story boundaries directly from audio streams. In analogy to the cosine-based lexical similarity between two text blocks in a transcript, we define an acoustic similarity measure between two pseudo-sentences in an audio stream. Experiments on the TDT2 Mandarin corpus show that acoustic TextTiling achieves performance comparable to lexical TextTiling based on LVCSR transcripts. Moreover, we use MFCCs and Gaussian posteriorgrams as acoustic representations in our experiments, and find that Gaussian posteriorgrams are more robust for segmenting stories that involve multiple speakers.
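The core idea above is to score the acoustic similarity between two pseudo-sentences much as lexical TextTiling scores cosine similarity between text blocks. The paper computes this with segmental dynamic time warping over MFCC or Gaussian-posteriorgram frames; the sketch below is only a simplified illustration of that idea, using plain DTW with a cosine-based frame distance. The function names, the length normalization, and the toy data are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two frame vectors, guarding against zero norms."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0

def dtw_cost(X, Y):
    """Plain DTW alignment cost between two frame sequences (rows = frames),
    with (1 - cosine similarity) as the local frame distance.
    NOTE: a simplification of the segmental DTW used in the paper."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - cosine_similarity(X[i - 1], Y[j - 1])
            D[i, j] = local + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Length-normalized cost (hypothetical normalization choice).
    return D[n, m] / (n + m)

def acoustic_similarity(X, Y):
    """Illustrative acoustic similarity between two pseudo-sentences:
    higher when their frame sequences align with low DTW cost."""
    return 1.0 - dtw_cost(X, Y)

# Toy usage: two pseudo-sentences as sequences of 39-dim MFCC (or posteriorgram) frames.
rng = np.random.default_rng(0)
pseudo_sentence_a = rng.normal(size=(120, 39))
pseudo_sentence_b = rng.normal(size=(150, 39))
print(acoustic_similarity(pseudo_sentence_a, pseudo_sentence_b))
```

In a TextTiling-style pipeline, a score like this would be computed between adjacent blocks of pseudo-sentences across the stream, and local minima in the resulting similarity curve would be taken as candidate story boundaries.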

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 5121-5124
Number of pages: 4
DOIs
State: Published - 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 25 Mar 2012 - 30 Mar 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 25/03/12 - 30/03/12

Keywords

  • segmental dynamic time warping
  • spoken document processing
  • story segmentation
  • TextTiling
  • topic segmentation
