TY - GEN
T1 - Context-dependent deep neural networks for commercial Mandarin speech recognition applications
AU - Niu, Jianwei
AU - Xie, Lei
AU - Jia, Lei
AU - Hu, Na
PY - 2013
Y1 - 2013
N2 - Recently, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been successfully used in some commercial large-vocabulary English speech recognition systems. It has been shown that CD-DNN-HMMs significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs (CD-GMM-HMMs). In this paper, we report our latest progress on CD-DNN-HMMs for commercial Mandarin speech recognition applications at Baidu. Experiments demonstrate that CD-DNN-HMMs achieve a relative 26% word error reduction and a relative 16% sentence error reduction in Baidu's short message (SMS) voice input and voice search applications, respectively, compared with state-of-the-art CD-GMM-HMMs trained using fMPE. To the best of our knowledge, this is the first time the performance of CD-DNN-HMMs has been reported for commercial Mandarin speech recognition applications. We also propose a GPU on-chip speed-up training approach, which achieves a speed-up ratio of nearly two for DNN training.
AB - Recently, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been successfully used in some commercial large-vocabulary English speech recognition systems. It has been shown that CD-DNN-HMMs significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs (CD-GMM-HMMs). In this paper, we report our latest progress on CD-DNN-HMMs for commercial Mandarin speech recognition applications at Baidu. Experiments demonstrate that CD-DNN-HMMs achieve a relative 26% word error reduction and a relative 16% sentence error reduction in Baidu's short message (SMS) voice input and voice search applications, respectively, compared with state-of-the-art CD-GMM-HMMs trained using fMPE. To the best of our knowledge, this is the first time the performance of CD-DNN-HMMs has been reported for commercial Mandarin speech recognition applications. We also propose a GPU on-chip speed-up training approach, which achieves a speed-up ratio of nearly two for DNN training.
UR - http://www.scopus.com/inward/record.url?scp=84893329649&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2013.6694268
DO - 10.1109/APSIPA.2013.6694268
M3 - Conference contribution
AN - SCOPUS:84893329649
SN - 9789869000604
T3 - 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
BT - 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
T2 - 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
Y2 - 29 October 2013 through 1 November 2013
ER -