Context-dependent deep neural networks for commercial Mandarin speech recognition applications

Jianwei Niu, Lei Xie, Lei Jia, Na Hu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Recently, context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have been used successfully in some commercial large-vocabulary English speech recognition systems. CD-DNN-HMMs have been shown to significantly outperform conventional context-dependent Gaussian mixture model HMMs (CD-GMM-HMMs). In this paper, we report our latest progress on CD-DNN-HMMs for commercial Mandarin speech recognition applications at Baidu. Experiments demonstrate that, compared with state-of-the-art CD-GMM-HMMs trained using fMPE, CD-DNN-HMMs achieve a 26% relative word error reduction in Baidu's short message (SMS) voice input application and a 16% relative sentence error reduction in its voice search application. To the best of our knowledge, this is the first time the performance of CD-DNN-HMMs has been reported for commercial Mandarin speech recognition applications. We also propose a GPU on-chip speed-up training approach that achieves a speed-up ratio of nearly two for DNN training.
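The "relative" error reductions quoted in the abstract are ratios against the baseline error rate, not absolute percentage-point differences. A minimal sketch of that arithmetic follows; the function name and the example error rates are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the reduction in error as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates (NOT results from the paper): a baseline system
# at 20.0% error and a new system at 14.8% error.
baseline = 0.200
improved = 0.148

reduction = relative_error_reduction(baseline, improved)
print(f"relative error reduction: {reduction:.1%}")
```

With these illustrative inputs, an absolute drop of 5.2 percentage points corresponds to a relative reduction of about 26%, which is why relative figures always need the baseline they were measured against.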

Original language: English
Title of host publication: 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
State: Published - 2013
Event: 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 - Kaohsiung, Taiwan, Province of China
Duration: 29 Oct 2013 to 1 Nov 2013

Publication series

Name: 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013
