Acoustic TextTiling for story segmentation of spoken documents

Lilei Zheng; Cheung Chi Leung; Lei Xie; Bin Ma; Haizhou Li

doi:10.1109/ICASSP.2012.6289073

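School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 citations (Scopus)

Abstract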

We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Unlike most existing methods, which rely on LVCSR transcripts, this method detects story boundaries directly from audio streams. By analogy with the cosine-based lexical similarity between two text blocks in a transcript, we define an acoustic similarity measure between two pseudo-sentences in an audio stream. Experiments on the TDT2 Mandarin corpus show that acoustic TextTiling can achieve performance comparable to lexical TextTiling based on LVCSR transcripts. We use MFCCs and Gaussian posteriorgrams as the acoustic representations in our experiments, and find that Gaussian posteriorgrams are more robust when segmenting stories that each contain multiple speakers.
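The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it makes several assumptions: the acoustic frames (for example MFCCs or Gaussian posteriorgrams) are already computed as a NumPy array, pseudo-sentences are fixed-length blocks of frames, plain dynamic time warping stands in for the segmental variant used in the paper, and boundaries are picked with a TextTiling-style depth score. The names `dtw_distance`, `depth_scores`, `boundary_candidates`, `block_len` and `threshold` are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Length-normalised DTW distance between two (frames x dims) arrays."""
    # Cosine distance between every frame of `a` and every frame of `b`.
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    cost = 1.0 - a_unit @ b_unit.T
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Plain DTW over whole blocks (an assumption; the paper uses segmental DTW).
    return acc[n, m] / (n + m)


def depth_scores(sims):
    """TextTiling-style depth: drop from the nearest peak on each side of a gap."""
    depths = np.zeros(len(sims))
    for k in range(len(sims)):
        left = k
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1
        right = k
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1
        depths[k] = (sims[left] - sims[k]) + (sims[right] - sims[k])
    return depths


def boundary_candidates(frames, block_len, threshold):
    """Return frame indices of gaps whose depth score exceeds `threshold`."""
    starts = range(0, len(frames) - block_len + 1, block_len)
    blocks = [frames[s:s + block_len] for s in starts]
    # Similarity between each pair of adjacent pseudo-sentences.
    sims = np.array([1.0 - dtw_distance(blocks[k], blocks[k + 1])
                     for k in range(len(blocks) - 1)])
    depths = depth_scores(sims)
    return [(k + 1) * block_len for k, d in enumerate(depths) if d > threshold]
```

Calling `boundary_candidates(frames, block_len, threshold)` on a frame matrix returns the frame offsets at which adjacent blocks look acoustically dissimilar. The block length and threshold would need tuning on real data, and the quadratic DTW loop is written for readability rather than speed.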

Original language	English
Title of host publication	2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages	5121-5124
Number of pages	4
DOI	https://doi.org/10.1109/ICASSP.2012.6289073
Publication status	Published - 2012
Event	2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan; duration: 25 Mar 2012 → 30 Mar 2012

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)	1520-6149

Conference

Conference	2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory	Japan
City	Kyoto
Period	25/03/12 → 30/03/12

Cite this

Zheng, L., Leung, C. C., Xie, L., Ma, B., & Li, H. (2012). Acoustic TextTiling for story segmentation of spoken documents. In 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings (pp. 5121-5124). Article 6289073 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2012.6289073

@inproceedings{d9bebe3b98ce43a2ab58dcb8bbe90774,

title = "Acoustic TextTiling for story segmentation of spoken documents",

abstract = "We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Different from most of the existing methods using LVCSR transcripts, this method detects story boundaries directly from audio streams. In analogy to the cosine-based lexical similarity between two text blocks in a transcript, we define the acoustic similarity measure between two pseudo-sentences in an audio stream. Experiments on TDT2 Mandarin corpus show that acoustic TextTiling can achieve comparable performance to lexical TextTiling based on LVCSR transcripts. Moreover, we use MFCCs and Gaussian posteriorgrams as the acoustic representations in our experiments. Our experiments show that Gaussian posteriorgrams are more robust to perform segmentation for the stories each with multiple speakers.",

keywords = "segmental dynamic time warping, spoken document processing, story segmentation, TextTiling, topic segmentation",

author = "Lilei Zheng and Leung, {Cheung Chi} and Lei Xie and Bin Ma and Haizhou Li",

year = "2012",

doi = "10.1109/ICASSP.2012.6289073",

language = "English",

isbn = "9781467300469",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

pages = "5121--5124",

booktitle = "2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings",

note = "2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 ; Conference date: 25-03-2012 Through 30-03-2012",

}

Zheng, L, Leung, CC, Xie, L, Ma, B & Li, H 2012, Acoustic TextTiling for story segmentation of spoken documents. in 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings., 6289073, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 5121-5124, 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012, Kyoto, Japan, 25/03/12. https://doi.org/10.1109/ICASSP.2012.6289073

Acoustic TextTiling for story segmentation of spoken documents. / Zheng, Lilei; Leung, Cheung Chi; Xie, Lei et al.
2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings. 2012. p. 5121-5124 6289073 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

TY - GEN

T1 - Acoustic TextTiling for story segmentation of spoken documents

AU - Zheng, Lilei

AU - Leung, Cheung Chi

AU - Xie, Lei

AU - Ma, Bin

AU - Li, Haizhou

PY - 2012

Y1 - 2012

N2 - We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Different from most of the existing methods using LVCSR transcripts, this method detects story boundaries directly from audio streams. In analogy to the cosine-based lexical similarity between two text blocks in a transcript, we define the acoustic similarity measure between two pseudo-sentences in an audio stream. Experiments on TDT2 Mandarin corpus show that acoustic TextTiling can achieve comparable performance to lexical TextTiling based on LVCSR transcripts. Moreover, we use MFCCs and Gaussian posteriorgrams as the acoustic representations in our experiments. Our experiments show that Gaussian posteriorgrams are more robust to perform segmentation for the stories each with multiple speakers.

AB - We propose an acoustic TextTiling method based on segmental dynamic time warping for automatic story segmentation of spoken documents. Different from most of the existing methods using LVCSR transcripts, this method detects story boundaries directly from audio streams. In analogy to the cosine-based lexical similarity between two text blocks in a transcript, we define the acoustic similarity measure between two pseudo-sentences in an audio stream. Experiments on TDT2 Mandarin corpus show that acoustic TextTiling can achieve comparable performance to lexical TextTiling based on LVCSR transcripts. Moreover, we use MFCCs and Gaussian posteriorgrams as the acoustic representations in our experiments. Our experiments show that Gaussian posteriorgrams are more robust to perform segmentation for the stories each with multiple speakers.

KW - segmental dynamic time warping

KW - spoken document processing

KW - story segmentation

KW - TextTiling

KW - topic segmentation

UR - http://www.scopus.com/inward/record.url?scp=84867596539&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2012.6289073

DO - 10.1109/ICASSP.2012.6289073

M3 - Conference contribution

AN - SCOPUS:84867596539

SN - 9781467300469

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 5121

EP - 5124

BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings

T2 - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012

Y2 - 25 March 2012 through 30 March 2012

ER -

Zheng L, Leung CC, Xie L, Ma B, Li H. Acoustic TextTiling for story segmentation of spoken documents. In 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings. 2012. p. 5121-5124. 6289073. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2012.6289073
