Lip contour extraction method based on multiple active shape model for audio visual speech recognition

Lei Xie; Wei Feng; Rongchun Zhao

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In audio-visual speech recognition and lipreading, the widely used active shape model (ASM) for lip contour extraction lacks robustness and cannot extract exact lip contours because the mouth shape varies widely during speech. We present a more robust model, the multiple active shape model (MASM). The model classifies mouth shapes into three sets: closed mouth, half-opened mouth, and round mouth. An independent ASM is built for each set from a small amount of training data. The MASM contour extraction algorithm automatically selects the most accurate lip contour from the multiple shape-searching procedures. Considering that the mouth changes continuously, a method for smoothing lip contours is also presented to correct contour extraction errors. Experimental results on the AVCONDIG database show that the extraction accuracy achieved by the MASM is 13% higher than that of the conventional ASM. Combining the MASM with the contour-smoothing method yields a further 7% accuracy improvement. When the exact lip contour feature is fused with the audio MFCC (mel-frequency cepstral coefficient) feature, the average word recognition rates on the connected-digits speech recognition task considered increase considerably under noisy acoustic conditions.
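The selection and smoothing steps described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the "most accurate" contour is chosen or how smoothing is done, so the per-model fit cost (lower is better), the class names, and the temporal median filter below are all hypothetical stand-ins.

```python
from statistics import median


def select_best_contour(candidates):
    """Pick the candidate contour with the lowest fit cost.

    `candidates` maps a mouth-shape class name to a (contour, cost) pair.
    `contour` is a list of (x, y) landmark points produced by that class's
    shape search; `cost` is a hypothetical goodness-of-fit score where
    lower means a better fit (an assumption, not taken from the paper).
    """
    best_class = min(candidates, key=lambda name: candidates[name][1])
    return best_class, candidates[best_class][0]


def smooth_contours(contours, window=3):
    """Apply a temporal median filter to each landmark across frames.

    `contours` is a list of frames; each frame is a list of (x, y) points
    with the same number of landmarks. The median filter is an assumed
    smoothing choice used here only to illustrate outlier correction.
    """
    half = window // 2
    n = len(contours)
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        frame = []
        for k in range(len(contours[t])):
            xs = [contours[s][k][0] for s in range(lo, hi)]
            ys = [contours[s][k][1] for s in range(lo, hi)]
            frame.append((median(xs), median(ys)))
        smoothed.append(frame)
    return smoothed


# Three hypothetical per-class search results for one video frame.
candidates = {
    "closed": ([(0, 0), (1, 0)], 0.8),
    "half_open": ([(0, 0), (1, 1)], 0.3),
    "round": ([(0, 0), (2, 2)], 0.5),
}
best_class, best_contour = select_best_contour(candidates)

# A one-landmark track where the second frame is an extraction outlier.
track = [[(0.0, 0.0)], [(10.0, 10.0)], [(0.0, 0.0)], [(0.0, 0.0)]]
smoothed_track = smooth_contours(track, window=3)
```

In this toy run the half-opened model is selected because its assumed cost is lowest, and the outlier in the second frame is pulled back to its neighbours by the median filter.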

Original language	English
Pages (from-to)	674-678
Number of pages	5
Journal	Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Volume	22
Issue number	5
Publication status	Published - Oct 2004

Cite this

@article{598c501a1ea84888a090f6e431ed878f,

title = "Lip contour extraction method based on multiple active shape model for audio visual speech recognition",

abstract = "In audio-visual speech recognition and lipreading, the widely used active shape model (ASM) for lip contour extraction lacks robustness and cannot extract exact lip contours because the mouth shape varies widely during speech. We present a more robust model, the multiple active shape model (MASM). The model classifies mouth shapes into three sets: closed mouth, half-opened mouth, and round mouth. An independent ASM is built for each set from a small amount of training data. The MASM contour extraction algorithm automatically selects the most accurate lip contour from the multiple shape-searching procedures. Considering that the mouth changes continuously, a method for smoothing lip contours is also presented to correct contour extraction errors. Experimental results on the AVCONDIG database show that the extraction accuracy achieved by the MASM is 13% higher than that of the conventional ASM. Combining the MASM with the contour-smoothing method yields a further 7% accuracy improvement. When the exact lip contour feature is fused with the audio MFCC (mel-frequency cepstral coefficient) feature, the average word recognition rates on the connected-digits speech recognition task considered increase considerably under noisy acoustic conditions.",

keywords = "Active shape model, Audio visual speech recognition, Lip contour extraction, Multiple active shape model, Speech recognition",

author = "Lei Xie and Wei Feng and Rongchun Zhao",

year = "2004",

month = oct,

language = "English",

volume = "22",

pages = "674--678",

journal = "Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University",

issn = "1000-2758",

publisher = "Northwestern Polytechnical University",

number = "5",

}

TY - JOUR

T1 - Lip contour extraction method based on multiple active shape model for audio visual speech recognition

AU - Xie, Lei

AU - Feng, Wei

AU - Zhao, Rongchun

PY - 2004/10

Y1 - 2004/10

N2 - In audio-visual speech recognition and lipreading, the widely used active shape model (ASM) for lip contour extraction lacks robustness and cannot extract exact lip contours because the mouth shape varies widely during speech. We present a more robust model, the multiple active shape model (MASM). The model classifies mouth shapes into three sets: closed mouth, half-opened mouth, and round mouth. An independent ASM is built for each set from a small amount of training data. The MASM contour extraction algorithm automatically selects the most accurate lip contour from the multiple shape-searching procedures. Considering that the mouth changes continuously, a method for smoothing lip contours is also presented to correct contour extraction errors. Experimental results on the AVCONDIG database show that the extraction accuracy achieved by the MASM is 13% higher than that of the conventional ASM. Combining the MASM with the contour-smoothing method yields a further 7% accuracy improvement. When the exact lip contour feature is fused with the audio MFCC (mel-frequency cepstral coefficient) feature, the average word recognition rates on the connected-digits speech recognition task considered increase considerably under noisy acoustic conditions.

KW - Active shape model

KW - Audio visual speech recognition

KW - Lip contour extraction

KW - Multiple active shape model

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=11444268074&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:11444268074

SN - 1000-2758

VL - 22

SP - 674

EP - 678

JO - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

JF - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

IS - 5

ER -
