TY - JOUR
T1 - A deep neural network approach for sentence boundary detection in broadcast news
AU - Xu, Chenglin
AU - Xie, Lei
AU - Huang, Guangpu
AU - Xiao, Xiong
AU - Chng, Eng Siong
AU - Li, Haizhou
N1 - Publisher Copyright: Copyright © 2014 ISCA.
PY - 2014
Y1 - 2014
AB - This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier to label these positions as either boundary or non-boundary. This work is realized by a hybrid DNN-CRF (conditional random field) architecture. The DNN accepts prosodic feature inputs and non-linearly maps them into boundary/non-boundary posterior probability outputs. Subsequently, the posterior probabilities are combined with lexical features and the integrated features are modeled by a linear-chain CRF. The CRF finally labels the inter-word positions as boundary or non-boundary by Viterbi decoding. Experiments show that, as compared with the state-of-the-art DTCRF approach [1], the proposed DNN-CRF approach achieves 16.7% and 4.1% reduction in NIST boundary detection error in reference and speech recognition transcripts, respectively.
KW - Deep neural network
KW - Rich transcription
KW - Sentence boundary detection
KW - Structural event detection
UR - http://www.scopus.com/inward/record.url?scp=84910051184&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84910051184
SN - 2308-457X
SP - 2887
EP - 2891
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014
Y2 - 14 September 2014 through 18 September 2014
ER -