Continuous speech recognition for large vocabulary based on triphone DBN model

Guoyun Lv; Rongchun Zhao; Yanning Zhang; Yangyu Fan; Hichem Sahli

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

To mitigate coarticulatory effects in continuous speech recognition, context-dependent triphone units are introduced into the word-phone structure dynamic Bayesian network (WP-DBN) model and the word-phone-state structure DBN (WPS-DBN) model. Two novel single-stream DBN models are proposed for continuous speech recognition: the word-triphone structure DBN (WT-DBN) model and the word-triphone-state structure DBN (WTS-DBN) model. The WTS-DBN model is a triphone model whose modeling unit is the triphone, and it simulates the triphone state-tying of a conventional hidden Markov model (HMM). Experimental results on large-vocabulary, clean speech show that the recognition rate of the WTS-DBN model exceeds those of the HMM, WT-DBN, WP-DBN and WPS-DBN models by 20.53%, 40.77%, 42.72% and 7.52%, respectively.

Original language	English
Pages (from-to)	1-6
Number of pages	6
Journal	Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing
Volume	24
Issue number	1
State	Published - Jan 2009

Keywords

Dynamic Bayesian network
Phone
Speech recognition
Triphone

Cite this

@article{6896eb1ae37c4f26b5cabea7c70d4a46,

title = "Continuous speech recognition for large vocabulary based on triphone DBN model",

abstract = "To mitigate coarticulatory effects in continuous speech recognition, context-dependent triphone units are introduced into the word-phone structure dynamic Bayesian network (WP-DBN) model and the word-phone-state structure DBN (WPS-DBN) model. Two novel single-stream DBN models are proposed for continuous speech recognition: the word-triphone structure DBN (WT-DBN) model and the word-triphone-state structure DBN (WTS-DBN) model. The WTS-DBN model is a triphone model whose modeling unit is the triphone, and it simulates the triphone state-tying of a conventional hidden Markov model (HMM). Experimental results on large-vocabulary, clean speech show that the recognition rate of the WTS-DBN model exceeds those of the HMM, WT-DBN, WP-DBN and WPS-DBN models by 20.53%, 40.77%, 42.72% and 7.52%, respectively.",

keywords = "Dynamic Bayesian network, Phone, Speech recognition, Triphone",

author = "Guoyun Lv and Rongchun Zhao and Yanning Zhang and Yangyu Fan and Hichem Sahli",

year = "2009",

month = jan,

language = "English",

volume = "24",

pages = "1--6",

journal = "Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing",

issn = "1004-9037",

publisher = "Nanjing University of Aeronautics and Astronautics",

number = "1",

}

TY - JOUR

T1 - Continuous speech recognition for large vocabulary based on triphone DBN model

AU - Lv, Guoyun

AU - Zhao, Rongchun

AU - Zhang, Yanning

AU - Fan, Yangyu

AU - Sahli, Hichem

PY - 2009/1

Y1 - 2009/1

N2 - To mitigate coarticulatory effects in continuous speech recognition, context-dependent triphone units are introduced into the word-phone structure dynamic Bayesian network (WP-DBN) model and the word-phone-state structure DBN (WPS-DBN) model. Two novel single-stream DBN models are proposed for continuous speech recognition: the word-triphone structure DBN (WT-DBN) model and the word-triphone-state structure DBN (WTS-DBN) model. The WTS-DBN model is a triphone model whose modeling unit is the triphone, and it simulates the triphone state-tying of a conventional hidden Markov model (HMM). Experimental results on large-vocabulary, clean speech show that the recognition rate of the WTS-DBN model exceeds those of the HMM, WT-DBN, WP-DBN and WPS-DBN models by 20.53%, 40.77%, 42.72% and 7.52%, respectively.

AB - To mitigate coarticulatory effects in continuous speech recognition, context-dependent triphone units are introduced into the word-phone structure dynamic Bayesian network (WP-DBN) model and the word-phone-state structure DBN (WPS-DBN) model. Two novel single-stream DBN models are proposed for continuous speech recognition: the word-triphone structure DBN (WT-DBN) model and the word-triphone-state structure DBN (WTS-DBN) model. The WTS-DBN model is a triphone model whose modeling unit is the triphone, and it simulates the triphone state-tying of a conventional hidden Markov model (HMM). Experimental results on large-vocabulary, clean speech show that the recognition rate of the WTS-DBN model exceeds those of the HMM, WT-DBN, WP-DBN and WPS-DBN models by 20.53%, 40.77%, 42.72% and 7.52%, respectively.

KW - Dynamic Bayesian network

KW - Phone

KW - Speech recognition

KW - Triphone

UR - http://www.scopus.com/inward/record.url?scp=62249156263&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:62249156263

SN - 1004-9037

VL - 24

SP - 1

EP - 6

JO - Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing

JF - Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing

IS - 1

ER -
