A novel SM-DBN model for large-vocabulary continuous speech recognition and phone segmentation

Guoyun Lu; Dongmei Jiang; Yanning Zhang; Rongchun Zhao; Hichem Sahli

School of Computer Science

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

A novel SM-DBN (Single-stream Multi-state Dynamic Bayesian Network) model is proposed. It augments the Single Stream DBN Phone-shared (SS-DBN-P) model proposed by Bilmes et al. [4], whose basic recognition units are words, by adding an extra level of hidden nodes (states). In our model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. Essentially, it is a phone model whose basic recognition units are phones. We perform recognition and segmentation experiments on both a continuous digital speech database and a large-vocabulary speech database; the experimental results are given in Tables 1 through 3 of the full paper. Preliminary results on large-vocabulary speech in a clean environment show that the speech recognition rate of the SM-DBN model is 13.01% and 35% higher than those of the HMM (Hidden Markov Model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy is 10% and 44% higher than those of the same two models, respectively.
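
The word, phone, and state hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not model the DBN's probabilities or inference; the lexicon entries, the choice of three states per phone, and the names `LEXICON`, `expand_word`, and `phone_segments` are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciation lexicon; these entries are illustrative only.
LEXICON: Dict[str, List[str]] = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
}

# Assumed value: the abstract only says "a fixed number of states".
STATES_PER_PHONE = 3


def expand_word(
    word: str, states_per_phone: int = STATES_PER_PHONE
) -> List[Tuple[str, str, int]]:
    """Expand a word into (word, phone, state_index) triples.

    Mirrors the hierarchy word -> phones -> fixed number of states.
    """
    return [
        (word, phone, state)
        for phone in LEXICON[word]
        for state in range(states_per_phone)
    ]


def phone_segments(
    frame_labels: List[Tuple[int, str]]
) -> List[Tuple[str, int, int]]:
    """Group frame-level labels into (phone, first_frame, last_frame).

    Each frame label is (phone_position, phone). The position keeps two
    adjacent occurrences of the same phone apart.
    """
    segments: List[list] = []
    for t, (pos, phone) in enumerate(frame_labels):
        if segments and segments[-1][0] == pos:
            segments[-1][3] = t  # extend the current phone segment
        else:
            segments.append([pos, phone, t, t])  # start a new segment
    return [(phone, start, end) for _, phone, start, end in segments]


if __name__ == "__main__":
    print(expand_word("two"))
    # A made-up frame-level alignment for the word "two".
    print(phone_segments([(0, "t"), (0, "t"), (1, "uw"), (1, "uw"), (1, "uw")]))
```

Because every frame is tied to a state that belongs to exactly one phone, phone boundaries can be read off a decoded frame-level path directly, which is why a model of this shape can serve both recognition and segmentation.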

Original language	English
Pages (from-to)	173-178
Number of pages	6
Journal	Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Volume	26
Issue number	2
Publication status	Published - Apr 2008

Cite this

@article{dde0de09112249049ba6aa3c0fbed73f,

title = "A novel SM-DBN model for large-vocabulary continuous speech recognition and phone segmentation",

abstract = "A novel SM-DBN (Single-stream Multi-state Dynamic Bayesian Network) model is proposed. It augments the Single Stream DBN Phone-shared (SS-DBN-P) model proposed by Bilmes et al. [4], whose basic recognition units are words, by adding an extra level of hidden nodes (states). In our model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. Essentially, it is a phone model whose basic recognition units are phones. We perform recognition and segmentation experiments on both a continuous digital speech database and a large-vocabulary speech database; the experimental results are given in Tables 1 through 3 of the full paper. Preliminary results on large-vocabulary speech in a clean environment show that the speech recognition rate of the SM-DBN model is 13.01% and 35% higher than those of the HMM (Hidden Markov Model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy is 10% and 44% higher than those of the same two models, respectively.",

keywords = "Continuous speech recognition, Phone segmentation, Single-stream multi-state dynamic Bayesian network (SM-DBN)",

author = "Guoyun Lu and Dongmei Jiang and Yanning Zhang and Rongchun Zhao and Hichem Sahli",

year = "2008",

month = apr,

language = "English",

volume = "26",

pages = "173--178",

journal = "Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University",

issn = "1000-2758",

publisher = "Northwestern Polytechnical University",

number = "2",

}

TY - JOUR

T1 - A novel SM-DBN model for large-vocabulary continuous speech recognition and phone segmentation

AU - Lu, Guoyun

AU - Jiang, Dongmei

AU - Zhang, Yanning

AU - Zhao, Rongchun

AU - Sahli, Hichem

PY - 2008/4

Y1 - 2008/4

AB - A novel SM-DBN (Single-stream Multi-state Dynamic Bayesian Network) model is proposed. It augments the Single Stream DBN Phone-shared (SS-DBN-P) model proposed by Bilmes et al. [4], whose basic recognition units are words, by adding an extra level of hidden nodes (states). In our model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. Essentially, it is a phone model whose basic recognition units are phones. We perform recognition and segmentation experiments on both a continuous digital speech database and a large-vocabulary speech database; the experimental results are given in Tables 1 through 3 of the full paper. Preliminary results on large-vocabulary speech in a clean environment show that the speech recognition rate of the SM-DBN model is 13.01% and 35% higher than those of the HMM (Hidden Markov Model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy is 10% and 44% higher than those of the same two models, respectively.

KW - Continuous speech recognition

KW - Phone segmentation

KW - Single-stream multi-state dynamic Bayesian network (SM-DBN)

UR - http://www.scopus.com/inward/record.url?scp=44849113105&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:44849113105

SN - 1000-2758

VL - 26

SP - 173

EP - 178

JO - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

JF - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

IS - 2

ER -
