Abstract
This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier to label these positions as either boundary or non-boundary. This work is realized by a hybrid DNN-CRF (conditional random field) architecture. The DNN accepts prosodic feature inputs and non-linearly maps them into boundary/non-boundary posterior probability outputs. Subsequently, the posterior probabilities are combined with lexical features and the integrated features are modeled by a linear-chain CRF. The CRF finally labels the inter-word positions as boundary or non-boundary by Viterbi decoding. Experiments show that, as compared with the state-of-the-art DTCRF approach [1], the proposed DNN-CRF approach achieves 16.7% and 4.1% reduction in NIST boundary detection error in reference and speech recognition transcripts, respectively.
Original language | English |
---|---|
Pages (from-to) | 2887-2891 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
State | Published - 2014 |
Event | 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore Duration: 14 Sep 2014 → 18 Sep 2014 |
Keywords
- Deep neural network
- Rich transcription
- Sentence boundary detection
- Structural event detection