TY - GEN
T1 - Sentence boundary detection in Chinese broadcast news using conditional random fields and prosodic features
AU - Xu, Chenglin
AU - Xie, Lei
AU - Fu, Zhonghua
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/9/3
Y1 - 2014/9/3
N2 - This paper studies the use of conditional random fields (CRFs) and prosodic features for sentence boundary detection in Chinese broadcast news. Previous approaches mostly use first-order CRFs and ignore important contextual and sequential information. In this paper, we explore high-order CRF models to make full use of the contextual and sequential information. Moreover, we show the effectiveness of CRFs in sentence boundary detection by comparing them with various competitive models. In previous approaches, the prosodic feature set is usually designed to be as exhaustive as possible. As a result, features may be highly correlated and some of them may not be effective. In this paper, we use a correlation-based feature selection method to select a subset of the most useful features. Finally, the use of prosodic features, e.g., pitch, in Chinese sentence segmentation deserves further investigation because the tonal aspect of Chinese may complicate the expression of pitch features. In this paper, we study the effectiveness of the prosodic features and rank their importance by an analysis of feature usage.
AB - This paper studies the use of conditional random fields (CRFs) and prosodic features for sentence boundary detection in Chinese broadcast news. Previous approaches mostly use first-order CRFs and ignore important contextual and sequential information. In this paper, we explore high-order CRF models to make full use of the contextual and sequential information. Moreover, we show the effectiveness of CRFs in sentence boundary detection by comparing them with various competitive models. In previous approaches, the prosodic feature set is usually designed to be as exhaustive as possible. As a result, features may be highly correlated and some of them may not be effective. In this paper, we use a correlation-based feature selection method to select a subset of the most useful features. Finally, the use of prosodic features, e.g., pitch, in Chinese sentence segmentation deserves further investigation because the tonal aspect of Chinese may complicate the expression of pitch features. In this paper, we study the effectiveness of the prosodic features and rank their importance by an analysis of feature usage.
KW - conditional random field
KW - feature selection
KW - sentence boundary detection
KW - sentence segmentation
KW - speech prosody
UR - http://www.scopus.com/inward/record.url?scp=84929412307&partnerID=8YFLogxK
U2 - 10.1109/ChinaSIP.2014.6889197
DO - 10.1109/ChinaSIP.2014.6889197
M3 - Conference contribution
AN - SCOPUS:84929412307
T3 - 2014 IEEE China Summit and International Conference on Signal and Information Processing, IEEE ChinaSIP 2014 - Proceedings
SP - 37
EP - 41
BT - 2014 IEEE China Summit and International Conference on Signal and Information Processing, IEEE ChinaSIP 2014 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd IEEE China Summit and International Conference on Signal and Information Processing, IEEE ChinaSIP 2014
Y2 - 9 July 2014 through 13 July 2014
ER -