A novel SM-DBN model for large-vocabulary continuous speech recognition and phone segmentation

Guoyun Lu; Dongmei Jiang; Yanning Zhang; Rongchun Zhao; Hichem Sahli

School of Computer Science

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

A novel SM-DBN (Single-stream Multi-state Dynamic Bayesian Network) model is proposed. It augments the Single-Stream DBN Phone-shared (SS-DBN-P) model proposed by Bilmes et al. [4], whose basic recognition units are words, with an extra level of hidden nodes, the states. In the SM-DBN model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. It is therefore essentially a phone model whose basic recognition units are phones. Recognition and segmentation experiments are performed on both a continuous digital speech database and a large-vocabulary speech database; the results are given in Tables 1 through 3 of the full paper. Preliminary results on large-vocabulary clean speech show that the speech recognition rate of the SM-DBN model is 13.01% and 35% higher than those of the HMM (Hidden Markov Model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy is 10% and 44% higher, respectively, than those of the other two models.
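The word, phone, and state hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and involves no DBN inference: the lexicon, the phone symbols, the choice of three states per phone, and the function names `expand_word` and `phone_segments` are all assumptions made for illustration, and the frame-level path is written by hand rather than decoded from audio.

```python
from typing import Dict, List, Tuple

# Hypothetical values: the abstract only says "a fixed number of states".
STATES_PER_PHONE = 3

# Hypothetical two-word pronunciation lexicon (word -> phone sequence).
LEXICON: Dict[str, List[str]] = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
}

Label = Tuple[str, str, int]  # (word, phone, state index within the phone)


def expand_word(word: str) -> List[Label]:
    """Expand a word into its chain of (word, phone, state) labels."""
    return [
        (word, phone, state)
        for phone in LEXICON[word]
        for state in range(STATES_PER_PHONE)
    ]


def phone_segments(frame_path: List[Label]) -> List[Tuple[str, int, int]]:
    """Collapse a frame-level label path into (phone, first_frame, last_frame).

    A new segment starts when the word or phone changes, or when the state
    index drops (the same phone being entered again).
    """
    segments: List[List] = []
    for t, (word, phone, state) in enumerate(frame_path):
        prev = frame_path[t - 1] if t > 0 else None
        starts_new = (
            prev is None
            or word != prev[0]
            or phone != prev[1]
            or state < prev[2]
        )
        if starts_new:
            segments.append([phone, t, t])
        else:
            segments[-1][2] = t  # extend the current phone segment
    return [tuple(seg) for seg in segments]


if __name__ == "__main__":
    # Hand-made path: every state of "two" is occupied for two frames.
    path = [label for label in expand_word("two") for _ in range(2)]
    print(phone_segments(path))
```

Because each frame carries a phone label in this hierarchy, phone boundaries can be read directly off a frame-level path, which is the sense in which a phone-level model supports segmentation as well as recognition.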

Original language	English
Pages (from-to)	173-178
Number of pages	6
Journal	Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Volume	26
Issue number	2
State	Published - Apr 2008

Keywords

Continuous speech recognition
Phone segmentation
Single-stream multi-state dynamic Bayesian network (SM-DBN)

Cite this

@article{dde0de09112249049ba6aa3c0fbed73f,

title = "A novel SM-DBN model for large-vocabulary continuous speech recognition and phone segmentation",

abstract = "A novel SM-DBN (Single-stream Multi-state Dynamic Bayesian Network) model is proposed. It augments the Single-Stream DBN Phone-shared (SS-DBN-P) model proposed by Bilmes et al. [4], whose basic recognition units are words, with an extra level of hidden nodes, the states. In the SM-DBN model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. It is therefore essentially a phone model whose basic recognition units are phones. Recognition and segmentation experiments are performed on both a continuous digital speech database and a large-vocabulary speech database; the results are given in Tables 1 through 3 of the full paper. Preliminary results on large-vocabulary clean speech show that the speech recognition rate of the SM-DBN model is 13.01% and 35% higher than those of the HMM (Hidden Markov Model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy is 10% and 44% higher, respectively, than those of the other two models.",

keywords = "Continuous speech recognition, Phone segmentation, Single-stream multi-state dynamic Bayesian network (SM-DBN)",

author = "Guoyun Lu and Dongmei Jiang and Yanning Zhang and Rongchun Zhao and Hichem Sahli",

year = "2008",

month = apr,

language = "English",

volume = "26",

pages = "173--178",

journal = "Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University",

issn = "1000-2758",

publisher = "Northwestern Polytechnical University",

number = "2",

}

TY - JOUR

T1 - A novel SM-DBN model for large-vocabulary continuous speech recognition and phone segmentation

AU - Lu, Guoyun

AU - Jiang, Dongmei

AU - Zhang, Yanning

AU - Zhao, Rongchun

AU - Sahli, Hichem

PY - 2008/4

Y1 - 2008/4

N2 - A novel SM-DBN (Single-stream Multi-state Dynamic Bayesian Network) model is proposed. It augments the Single-Stream DBN Phone-shared (SS-DBN-P) model proposed by Bilmes et al. [4], whose basic recognition units are words, with an extra level of hidden nodes, the states. In the SM-DBN model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. It is therefore essentially a phone model whose basic recognition units are phones. Recognition and segmentation experiments are performed on both a continuous digital speech database and a large-vocabulary speech database; the results are given in Tables 1 through 3 of the full paper. Preliminary results on large-vocabulary clean speech show that the speech recognition rate of the SM-DBN model is 13.01% and 35% higher than those of the HMM (Hidden Markov Model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy is 10% and 44% higher, respectively, than those of the other two models.

KW - Continuous speech recognition

KW - Phone segmentation

KW - Single-stream multi-state dynamic Bayesian network (SM-DBN)

UR - http://www.scopus.com/inward/record.url?scp=44849113105&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:44849113105

SN - 1000-2758

VL - 26

SP - 173

EP - 178

JO - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

JF - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

IS - 2

ER -
