TY - JOUR
T1 - A novel SM-DBN model for large-vocabulary continuous speech recognition and phone segmentation
AU - Lu, Guoyun
AU - Jiang, Dongmei
AU - Zhang, Yanning
AU - Zhao, Rongchun
AU - Sahli, Hichem
PY - 2008/4
Y1 - 2008/4
N2 - A novel SM-DBN (Single-stream Multi-state Dynamic Bayesian Network) model is proposed. It augments the Single Stream DBN Phone-shared (SS-DBN-P) model proposed by Bilmes et al. [4], whose basic recognition units are words, by adding an extra level of hidden nodes (states). In the SM-DBN model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. Essentially, it is a phone model whose basic recognition units are phones. We perform recognition and segmentation experiments on both a continuous digital speech database and a large-vocabulary speech database; the results are given in Tables 1 through 3 of the full paper. Preliminary results on large-vocabulary clean speech show that the speech recognition rate of the SM-DBN model is 13.01% and 35% higher than those of the HMM (Hidden Markov Model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy is 10% and 44% higher, respectively, than that of those two models.
AB - A novel SM-DBN (Single-stream Multi-state Dynamic Bayesian Network) model is proposed. It augments the Single Stream DBN Phone-shared (SS-DBN-P) model proposed by Bilmes et al. [4], whose basic recognition units are words, by adding an extra level of hidden nodes (states). In the SM-DBN model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. Essentially, it is a phone model whose basic recognition units are phones. We perform recognition and segmentation experiments on both a continuous digital speech database and a large-vocabulary speech database; the results are given in Tables 1 through 3 of the full paper. Preliminary results on large-vocabulary clean speech show that the speech recognition rate of the SM-DBN model is 13.01% and 35% higher than those of the HMM (Hidden Markov Model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy is 10% and 44% higher, respectively, than that of those two models.
KW - Continuous speech recognition
KW - Phone segmentation
KW - Single-stream multi-state dynamic Bayesian network (SM-DBN)
UR - http://www.scopus.com/inward/record.url?scp=44849113105&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:44849113105
SN - 1000-2758
VL - 26
SP - 173
EP - 178
JO - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
JF - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
IS - 2
ER -