Integrating acoustic and lexical features in topic segmentation of Chinese broadcast news using maximum entropy approach

Lei Xie, Yulian Yang, Zhi Qiang Liu, Wei Feng, Zihan Liu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

11 Citations (Scopus)

Abstract

This paper studies how to integrate multi-modal features in automatic topic segmentation of Mandarin broadcast news. The multi-modal feature integration problem is formulated within the Maximum Entropy (MaxEnt) scheme for topic boundary classification by maximizing the entropy while respecting all known constraints (i.e., the contributions of multiple features). We particularly consider two types of features: (1) acoustic features, which reflect the editorial prosody of broadcast news, including pause duration, speaker change and speech type; and (2) lexical features extracted from speech recognition transcripts, which capture the semantic shifts of topics, including two local cohesiveness features and a new boundary indicator based on overall cohesiveness. Compared to local lexical features, the new overall cohesiveness feature maximizes the lexical cohesiveness of all topic fragments and reflects the fact that topic transitions in broadcast news are smooth and the distributional variations are subtle. Experiments show a clear performance improvement in topic segmentation of Chinese broadcast news when acoustic and lexical features are fused within the MaxEnt scheme.
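As a rough illustration of the boundary-classification idea in the abstract, the sketch below trains a binary MaxEnt model (equivalent to logistic regression in the two-class case) on a handful of candidate boundaries. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the toy feature values, the reduced feature set (pause duration, speaker change, and a single lexical similarity score), and the use of plain gradient ascent with L2 regularization as the training procedure.

```python
import math

def maxent_prob(weights, features):
    """P(boundary | features) under a binary MaxEnt (logistic) model."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def train_maxent(samples, labels, epochs=2000, lr=0.5, l2=0.01):
    """Fit weights by gradient ascent on the L2-regularized log-likelihood.

    The gradient is the classic MaxEnt form: observed feature counts
    minus the feature counts expected under the current model.
    """
    n_feats = len(samples[0])
    weights = [0.0] * n_feats
    for _ in range(epochs):
        grad = [0.0] * n_feats
        for x, y in zip(samples, labels):
            err = y - maxent_prob(weights, x)  # observed minus expected
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w + lr * (g / len(samples) - l2 * w)
                   for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy data, one row per candidate boundary:
# [bias, pause duration (s), speaker change (0/1), lexical similarity across the candidate]
TOY_X = [
    [1.0, 2.1, 1.0, 0.10],
    [1.0, 1.8, 1.0, 0.20],
    [1.0, 1.5, 0.0, 0.15],
    [1.0, 0.3, 0.0, 0.80],
    [1.0, 0.2, 0.0, 0.70],
    [1.0, 0.4, 1.0, 0.75],
]
TOY_Y = [1, 1, 1, 0, 0, 0]  # 1 = topic boundary, 0 = no boundary

WEIGHTS = train_maxent(TOY_X, TOY_Y)

# Long pause, speaker change, low lexical similarity: boundary-like.
p_boundary = maxent_prob(WEIGHTS, [1.0, 2.0, 1.0, 0.10])
# Short pause, same speaker, high lexical similarity: within a topic.
p_inside = maxent_prob(WEIGHTS, [1.0, 0.25, 0.0, 0.80])
print(f"boundary-like: {p_boundary:.2f}, within-topic: {p_inside:.2f}")
```

The point of the sketch is only that acoustic and lexical cues enter one model as weighted features, so neither modality has to be thresholded on its own; the feature definitions and training setup actually used in the paper may differ.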

Original language: English
Title of host publication: ICALIP 2010 - 2010 International Conference on Audio, Language and Image Processing, Proceedings
Pages: 407-413
Number of pages: 7
Publication status: Published - 2010
Event: 2010 International Conference on Audio, Language and Image Processing, ICALIP 2010 - Shanghai, China
Duration: 23 Nov 2010 to 25 Nov 2010

Publication series

Name: ICALIP 2010 - 2010 International Conference on Audio, Language and Image Processing, Proceedings

Conference

Conference: 2010 International Conference on Audio, Language and Image Processing, ICALIP 2010
Country/Territory: China
City: Shanghai
Period: 23/11/10 to 25/11/10
