The NNI Vietnamese speech recognition system for MediaEval 2016

Lei Wang; Chongjia Ni; Cheung Chi Leung; Changhuai You; Lei Xie; Haihua Xu; Xiong Xiao; Tin Lay Nwe; Eng Siong Chng; Bin Ma; Haizhou Li

School of Computer Science

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of 3 sub-systems, and adopted different deep neural network-based techniques such as fMLLR transformed bottleneck features, sequence training, etc. Besides the acoustic modeling techniques, speech data augmentation was also examined to develop a more robust acoustic model. The I²R team collected a number of text resources from the Internet and made them available to other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained the token error rate (TER) of 15.1, 23.0 and 50.5 on Devel local set, Devel set and Test set, respectively.

Original language	English
Journal	CEUR Workshop Proceedings
Volume	1739
Publication status	Published - 2016
Event	2016 Multimedia Benchmark Workshop, MediaEval 2016 - Hilversum, Netherlands, Duration: 20 Oct 2016 → 21 Oct 2016

Cite this

@article{ca5e010c16ee486bb75152ada248b148,

title = "The NNI Vietnamese speech recognition system for MediaEval 2016",

abstract = "This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of 3 sub-systems, and adopted different deep neural network-based techniques such as fMLLR transformed bottleneck features, sequence training, etc. Besides the acoustic modeling techniques, speech data augmentation was also examined to develop a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained the token error rate (TER) of 15.1, 23.0 and 50.5 on Devel local set, Devel set and Test set, respectively.",

author = "Lei Wang and Chongjia Ni and Leung, {Cheung Chi} and Changhuai You and Lei Xie and Haihua Xu and Xiong Xiao and Nwe, {Tin Lay} and Chng, {Eng Siong} and Bin Ma and Haizhou Li",

year = "2016",

language = "English",

volume = "1739",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

note = "2016 Multimedia Benchmark Workshop, MediaEval 2016 ; Conference date: 20-10-2016 Through 21-10-2016",

}

TY - JOUR

T1 - The NNI Vietnamese speech recognition system for MediaEval 2016

AU - Wang, Lei

AU - Ni, Chongjia

AU - Leung, Cheung Chi

AU - You, Changhuai

AU - Xie, Lei

AU - Xu, Haihua

AU - Xiao, Xiong

AU - Nwe, Tin Lay

AU - Chng, Eng Siong

AU - Ma, Bin

AU - Li, Haizhou

PY - 2016

Y1 - 2016

N2 - This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of 3 sub-systems, and adopted different deep neural network-based techniques such as fMLLR transformed bottleneck features, sequence training, etc. Besides the acoustic modeling techniques, speech data augmentation was also examined to develop a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained the token error rate (TER) of 15.1, 23.0 and 50.5 on Devel local set, Devel set and Test set, respectively.

AB - This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of 3 sub-systems, and adopted different deep neural network-based techniques such as fMLLR transformed bottleneck features, sequence training, etc. Besides the acoustic modeling techniques, speech data augmentation was also examined to develop a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to other participants in the task. The web text crawled from the Internet was used to train a 5-gram language model. The submitted system obtained the token error rate (TER) of 15.1, 23.0 and 50.5 on Devel local set, Devel set and Test set, respectively.

UR - http://www.scopus.com/inward/record.url?scp=85006320091&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85006320091

SN - 1613-0073

VL - 1739

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 2016 Multimedia Benchmark Workshop, MediaEval 2016

Y2 - 20 October 2016 through 21 October 2016

ER -
