TY - GEN
T1 - Laplacian eigenmaps for automatic news story segmentation
AU - Liu, Zihan
AU - Xie, Lei
AU - Zheng, Lilei
PY - 2010
Y1 - 2010
AB - This paper presents a novel lexical-similarity-based approach to automatic story segmentation in broadcast news. When measuring the connection between a pair of sentences, we take two factors into consideration, i.e. the lexical similarity and the distance between them in the text stream. Further investigation of pairwise connections between sentences is based on the technique of Laplacian Eigenmaps (LE). Taking advantage of the LE algorithm, we construct a Euclidean space in which each sentence is mapped to a vector. The original connective strength between sentences is reflected by the Euclidean distances between the corresponding vectors in the target space of the map. Further analysis of the map leads to a straightforward criterion for optimal segmentation. Then we formalize story segmentation as a minimization problem and give a dynamic programming solution to it. Experimental results on the TDT2 corpus show that the proposed method outperforms several state-of-the-art lexical-similarity-based methods.
UR - http://www.scopus.com/inward/record.url?scp=79851506994&partnerID=8YFLogxK
U2 - 10.1109/ICALIP.2010.5684548
DO - 10.1109/ICALIP.2010.5684548
M3 - Conference contribution
AN - SCOPUS:79851506994
SN - 9781424458653
T3 - ICALIP 2010 - 2010 International Conference on Audio, Language and Image Processing, Proceedings
SP - 419
EP - 424
BT - ICALIP 2010 - 2010 International Conference on Audio, Language and Image Processing, Proceedings
T2 - 2010 International Conference on Audio, Language and Image Processing, ICALIP 2010
Y2 - 23 November 2010 through 25 November 2010
ER -