Abstract
This paper describes the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of three sub-systems and adopted several deep neural network-based techniques, including fMLLR-transformed bottleneck features and sequence training. Besides these acoustic modeling techniques, speech data augmentation was examined as a way to build a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to other participants in the task. This crawled web text was used to train a 5-gram language model. The submitted system obtained a token error rate (TER) of 15.1, 23.0, and 50.5 on the Devel local set, Devel set, and Test set, respectively.
| Original language | English |
|---|---|
| Journal | CEUR Workshop Proceedings |
| Volume | 1739 |
| State | Published - 2016 |
| Event | 2016 Multimedia Benchmark Workshop, MediaEval 2016 - Hilversum, Netherlands. Duration: 20 Oct 2016 → 21 Oct 2016 |
| Title | The NNI Vietnamese speech recognition system for MediaEval 2016 |