TY - JOUR
T1 - Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models
AU - Feng, Wei
AU - Xie, Lei
AU - Zeng, Jia
AU - Liu, Zhi Qiang
PY - 2009/6
Y1 - 2009/6
AB - This paper presents a multimodal system for reliable human identity recognition under varying conditions. Our system fuses face and speech recognition within a general probabilistic framework. For face recognition, we propose a new spectral learning algorithm that considers not only the discriminative relations among the training data but also the generative models for each class. Because face labeling is costly in practice, our spectral face learning uses a semi-supervised strategy: only a small number of labeled faces are used in the training step, and the labels are optimally propagated to the other, unlabeled training faces. Besides requiring much less labeled data, our algorithm also provides a natural way to explicitly train an outlier model that approximately represents unauthorized faces. To make the system more robust for human recognition in various environments, face recognition is further complemented by a speaker identification agent. Specifically, this agent models the statistical variations of fixed-phrase speech using speaker-dependent word hidden Markov models. Experiments on benchmark databases validate the effectiveness of our face recognition and speaker identification agents, and demonstrate that recognition accuracy is noticeably improved by integrating these two independent biometric sources.
KW - Face recognition
KW - Hidden Markov models (HMMs)
KW - Semi-supervised spectral learning
KW - Speaker identification
UR - http://www.scopus.com/inward/record.url?scp=67349153381&partnerID=8YFLogxK
U2 - 10.1016/j.jvlc.2009.01.009
DO - 10.1016/j.jvlc.2009.01.009
M3 - Article
AN - SCOPUS:67349153381
SN - 1045-926X
VL - 20
SP - 188
EP - 195
JO - Journal of Visual Languages and Computing
JF - Journal of Visual Languages and Computing
IS - 3
ER -