The NNI Vietnamese speech recognition system for MediaEval 2016

Lei Wang, Chongjia Ni, Cheung Chi Leung, Changhuai You, Lei Xie, Haihua Xu, Xiong Xiao, Tin Lay Nwe, Eng Siong Chng, Bin Ma, Haizhou Li

Research output: Contribution to journal › Conference article › peer-review

Abstract

This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of three sub-systems and adopted several deep neural network (DNN)-based techniques, such as fMLLR-transformed bottleneck features and sequence training. Besides these acoustic modeling techniques, speech data augmentation was also examined to build a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to the other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained token error rates (TERs) of 15.1, 23.0 and 50.5 on the Devel local set, the Devel set and the Test set, respectively.
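The results above are reported as token error rates. As a minimal sketch of how such a figure is usually computed, the code below assumes that tokens are whitespace-separated units (syllables, for Vietnamese text) and that TER follows the standard word-error-rate formula applied to tokens: the Levenshtein alignment cost (substitutions + deletions + insertions) summed over all utterances, divided by the total number of reference tokens. The function names `edit_distance` and `token_error_rate` are hypothetical, and the task's official scoring may apply its own text normalization before alignment.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference token sequence into the hypothesis."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1]


def token_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level TER in percent over (reference, hypothesis) pairs.
    Assumes whitespace tokenization (a hypothetical simplification)."""
    errors = 0
    total = 0
    for reference, hypothesis in pairs:
        ref_tokens = reference.split()
        hyp_tokens = hypothesis.split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references must contain at least one token")
    return 100.0 * errors / total
```

For example, `token_error_rate([("tôi đi học hôm nay", "tôi đi hoc nay")])` counts one substitution and one deletion against five reference tokens and returns `40.0`. Errors and reference tokens are pooled over the whole set before dividing, so longer utterances weigh more than in a per-utterance average.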

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1739
State: Published - 2016
Event: 2016 Multimedia Benchmark Workshop, MediaEval 2016 - Hilversum, Netherlands
Duration: 20 Oct 2016 to 21 Oct 2016
