TY - GEN
T1 - Unsupervised broadcast news story segmentation using distance dependent Chinese restaurant processes
AU - Yang, Chao
AU - Xie, Lei
AU - Zhou, Xiangzeng
PY - 2014
Y1 - 2014
N2 - Traditional unsupervised approaches to broadcast news story segmentation require the number of segments to be set manually, but this number is often unknown in real-world applications. In this paper, we solve this problem by modeling the generative process of stories as distance dependent Chinese restaurant process (dd-CRP) mixtures. We cut a news program into fixed-size text blocks and assume that the blocks in the same story are generated from a story-specific topic. Specifically, we add a dd-CRP prior, which carries an essential bias: a block's topic is more likely to be the same as that of nearby blocks. Story boundaries can then be found by detecting changes of topic. Experiments show that our approach outperforms both supervised and unsupervised approaches and that the number of segments can be learned automatically from the data.
AB - Traditional unsupervised approaches to broadcast news story segmentation require the number of segments to be set manually, but this number is often unknown in real-world applications. In this paper, we solve this problem by modeling the generative process of stories as distance dependent Chinese restaurant process (dd-CRP) mixtures. We cut a news program into fixed-size text blocks and assume that the blocks in the same story are generated from a story-specific topic. Specifically, we add a dd-CRP prior, which carries an essential bias: a block's topic is more likely to be the same as that of nearby blocks. Story boundaries can then be found by detecting changes of topic. Experiments show that our approach outperforms both supervised and unsupervised approaches and that the number of segments can be learned automatically from the data.
UR - http://www.scopus.com/inward/record.url?scp=84905268752&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2014.6854365
DO - 10.1109/ICASSP.2014.6854365
M3 - Conference contribution
AN - SCOPUS:84905268752
SN - 9781479928927
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4062
EP - 4066
BT - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Y2 - 4 May 2014 through 9 May 2014
ER -