The NNI Vietnamese speech recognition system for MediaEval 2016

Lei Wang; Chongjia Ni; Cheung Chi Leung; Changhuai You; Lei Xie; Haihua Xu; Xiong Xiao; Tin Lay Nwe; Eng Siong Chng; Bin Ma; Haizhou Li

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

Abstract

This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of three sub-systems and adopted several deep neural network-based techniques, such as fMLLR-transformed bottleneck features and sequence training. Besides the acoustic modeling techniques, speech data augmentation was also examined to develop a more robust acoustic model. The I²R team collected a number of text resources from the Internet and made them available to other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained token error rates (TER) of 15.1, 23.0 and 50.5 on the Devel local set, Devel set and Test set, respectively.

Original language	English
Journal	CEUR Workshop Proceedings
Volume	1739
State	Published - 2016
Event	2016 Multimedia Benchmark Workshop, MediaEval 2016 - Hilversum, Netherlands
Duration	20 Oct 2016 → 21 Oct 2016

Cite this

@article{ca5e010c16ee486bb75152ada248b148,

title = "The NNI Vietnamese speech recognition system for MediaEval 2016",

abstract = "This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of 3 sub-systems, and adopted different deep neural network-based techniques such as fMLLR transformed bottleneck features, sequence training, etc. Besides the acoustic modeling techniques, speech data augmentation was also examined to develop a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained the token error rate (TER) of 15.1, 23.0 and 50.5 on Devel local set, Devel set and Test set, respectively.",

author = "Lei Wang and Chongjia Ni and Leung, {Cheung Chi} and Changhuai You and Lei Xie and Haihua Xu and Xiong Xiao and Nwe, {Tin Lay} and Chng, {Eng Siong} and Bin Ma and Haizhou Li",

year = "2016",

language = "English",

volume = "1739",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

note = "2016 Multimedia Benchmark Workshop, MediaEval 2016 ; Conference date: 20-10-2016 Through 21-10-2016",

}

TY - JOUR

T1 - The NNI Vietnamese speech recognition system for MediaEval 2016

AU - Wang, Lei

AU - Ni, Chongjia

AU - Leung, Cheung Chi

AU - You, Changhuai

AU - Xie, Lei

AU - Xu, Haihua

AU - Xiao, Xiong

AU - Nwe, Tin Lay

AU - Chng, Eng Siong

AU - Ma, Bin

AU - Li, Haizhou

PY - 2016

Y1 - 2016

N2 - This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of 3 sub-systems, and adopted different deep neural network-based techniques such as fMLLR transformed bottleneck features, sequence training, etc. Besides the acoustic modeling techniques, speech data augmentation was also examined to develop a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained the token error rate (TER) of 15.1, 23.0 and 50.5 on Devel local set, Devel set and Test set, respectively.

AB - This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of 3 sub-systems, and adopted different deep neural network-based techniques such as fMLLR transformed bottleneck features, sequence training, etc. Besides the acoustic modeling techniques, speech data augmentation was also examined to develop a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained the token error rate (TER) of 15.1, 23.0 and 50.5 on Devel local set, Devel set and Test set, respectively.

UR - http://www.scopus.com/inward/record.url?scp=85006320091&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85006320091

SN - 1613-0073

VL - 1739

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 2016 Multimedia Benchmark Workshop, MediaEval 2016

Y2 - 20 October 2016 through 21 October 2016

ER -
