Context-dependent deep neural networks for commercial Mandarin speech recognition applications

Jianwei Niu; Lei Xie; Lei Jia; Na Hu

doi:10.1109/APSIPA.2013.6694268

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Recently, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been successfully used in some commercial large-vocabulary English speech recognition systems. It has been proved that CD-DNN-HMMs significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs (CD-GMM-HMMs). In this paper, we report our latest progress on CD-DNN-HMMs for commercial Mandarin speech recognition applications in Baidu. Experiments demonstrate that CD-DNN-HMMs can get relative 26% word error reduction and relative 16% sentence error reduction in Baidu's short message (SMS) voice input and voice search applications, respectively, compared with state-of-the-art CD-GMM-HMMs trained using fMPE. To the best of our knowledge, this is the first time the performances of CD-DNN-HMMs are reported for commercial Mandarin speech recognition applications. We also propose a GPU on-chip speed-up training approach which can achieve a speed-up ratio of nearly two for DNN training.

Original language	English
Title of host publication	2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
DOIs	https://doi.org/10.1109/APSIPA.2013.6694268
State	Published - 2013
Event	2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 - Kaohsiung, Taiwan, Province of China
Duration	29 Oct 2013 → 1 Nov 2013

Publication series

Name	2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013

Conference

Conference	2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
Country/Territory	Taiwan, Province of China
City	Kaohsiung
Period	29/10/13 → 1/11/13

Cite this

Niu, J., Xie, L., Jia, L., & Hu, N. (2013). Context-dependent deep neural networks for commercial Mandarin speech recognition applications. In 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013, Article 6694268 (2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013). https://doi.org/10.1109/APSIPA.2013.6694268

@inproceedings{2de0301102384d1db684efe628a55f2f,

title = "Context-dependent deep neural networks for commercial Mandarin speech recognition applications",

abstract = "Recently, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been successfully used in some commercial large-vocabulary English speech recognition systems. It has been proved that CD-DNN-HMMs significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs (CD-GMM-HMMs). In this paper, we report our latest progress on CD-DNN-HMMs for commercial Mandarin speech recognition applications in Baidu. Experiments demonstrate that CD-DNN-HMMs can get relative 26% word error reduction and relative 16% sentence error reduction in Baidu's short message (SMS) voice input and voice search applications, respectively, compared with state-of-the-art CD-GMM-HMMs trained using fMPE. To the best of our knowledge, this is the first time the performances of CD-DNN-HMMs are reported for commercial Mandarin speech recognition applications. We also propose a GPU on-chip speed-up training approach which can achieve a speed-up ratio of nearly two for DNN training.",

author = "Jianwei Niu and Lei Xie and Lei Jia and Na Hu",

year = "2013",

doi = "10.1109/APSIPA.2013.6694268",

language = "English",

isbn = "9789869000604",

series = "2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013",

booktitle = "2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013",

note = "2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 ; Conference date: 29-10-2013 Through 01-11-2013",

}

Niu, J, Xie, L, Jia, L & Hu, N 2013, Context-dependent deep neural networks for commercial Mandarin speech recognition applications. in 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013., 6694268, 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013, 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013, Kaohsiung, Taiwan, Province of China, 29/10/13. https://doi.org/10.1109/APSIPA.2013.6694268

Context-dependent deep neural networks for commercial Mandarin speech recognition applications. / Niu, Jianwei; Xie, Lei; Jia, Lei et al.
2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013. 2013. 6694268 (2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013).

TY - GEN

T1 - Context-dependent deep neural networks for commercial Mandarin speech recognition applications

AU - Niu, Jianwei

AU - Xie, Lei

AU - Jia, Lei

AU - Hu, Na

PY - 2013

Y1 - 2013

N2 - Recently, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been successfully used in some commercial large-vocabulary English speech recognition systems. It has been proved that CD-DNN-HMMs significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs (CD-GMM-HMMs). In this paper, we report our latest progress on CD-DNN-HMMs for commercial Mandarin speech recognition applications in Baidu. Experiments demonstrate that CD-DNN-HMMs can get relative 26% word error reduction and relative 16% sentence error reduction in Baidu's short message (SMS) voice input and voice search applications, respectively, compared with state-of-the-art CD-GMM-HMMs trained using fMPE. To the best of our knowledge, this is the first time the performances of CD-DNN-HMMs are reported for commercial Mandarin speech recognition applications. We also propose a GPU on-chip speed-up training approach which can achieve a speed-up ratio of nearly two for DNN training.

AB - Recently, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been successfully used in some commercial large-vocabulary English speech recognition systems. It has been proved that CD-DNN-HMMs significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs (CD-GMM-HMMs). In this paper, we report our latest progress on CD-DNN-HMMs for commercial Mandarin speech recognition applications in Baidu. Experiments demonstrate that CD-DNN-HMMs can get relative 26% word error reduction and relative 16% sentence error reduction in Baidu's short message (SMS) voice input and voice search applications, respectively, compared with state-of-the-art CD-GMM-HMMs trained using fMPE. To the best of our knowledge, this is the first time the performances of CD-DNN-HMMs are reported for commercial Mandarin speech recognition applications. We also propose a GPU on-chip speed-up training approach which can achieve a speed-up ratio of nearly two for DNN training.

UR - http://www.scopus.com/inward/record.url?scp=84893329649&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2013.6694268

DO - 10.1109/APSIPA.2013.6694268

M3 - Conference contribution

AN - SCOPUS:84893329649

SN - 9789869000604

T3 - 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013

BT - 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013

T2 - 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013

Y2 - 29 October 2013 through 1 November 2013

ER -

Niu J, Xie L, Jia L, Hu N. Context-dependent deep neural networks for commercial Mandarin speech recognition applications. In 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013. 2013. 6694268. (2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013). doi: 10.1109/APSIPA.2013.6694268
