TY - GEN
T1 - Lexical story co-segmentation of Chinese broadcast news
AU - Feng, Wei
AU - Nie, Xuecheng
AU - Wan, Liang
AU - Xie, Lei
AU - Jiang, Jianmin
PY - 2012
Y1 - 2012
N2 - We present an unsupervised technique, namely story cosegmentation, to automatically extract the common stories on the same topic within a pair of Chinese broadcast news transcripts. Unlike classical topic tracking that usually relies on previously trained topic models, our method is purely data-driven and is able to simultaneously determine the common stories of the input texts. Specifically, we propose an iterative four-step MRF solution to the problem of story co-segmentation using lexical cues only. We first construct a sentence-level graph formulation of the input news transcripts, and initialize foreground and background labeling by lexical clustering. We then update both foreground and background models based on the current labeling. We formalize story co-segmentation as a Gibbs energy minimization problem that balances the optimal objectives of foreground/background likelihood, intra-doc coherence, and inter-doc similarity. Finally, the labeling refinement is obtained by hybrid optimization with QPBO and BP. The effectiveness of our method has been validated on real-world CCTV corpus.
AB - We present an unsupervised technique, namely story cosegmentation, to automatically extract the common stories on the same topic within a pair of Chinese broadcast news transcripts. Unlike classical topic tracking that usually relies on previously trained topic models, our method is purely data-driven and is able to simultaneously determine the common stories of the input texts. Specifically, we propose an iterative four-step MRF solution to the problem of story co-segmentation using lexical cues only. We first construct a sentence-level graph formulation of the input news transcripts, and initialize foreground and background labeling by lexical clustering. We then update both foreground and background models based on the current labeling. We formalize story co-segmentation as a Gibbs energy minimization problem that balances the optimal objectives of foreground/background likelihood, intra-doc coherence, and inter-doc similarity. Finally, the labeling refinement is obtained by hybrid optimization with QPBO and BP. The effectiveness of our method has been validated on real-world CCTV corpus.
KW - Belief propagation (BP)
KW - Foreground and background story modeling
KW - Lexical clustering
KW - MRF
KW - QPBO
KW - Story co-segmentation
UR - http://www.scopus.com/inward/record.url?scp=84878596454&partnerID=8YFLogxK
M3 - 会议稿件
AN - SCOPUS:84878596454
SN - 9781622767595
T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
SP - 2283
EP - 2286
BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Y2 - 9 September 2012 through 13 September 2012
ER -