Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models

Wei Feng; Lei Xie; Jia Zeng; Zhi Qiang Liu

doi:10.1016/j.jvlc.2009.01.009

School of Computer Science

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

This paper presents a multimodal system for reliable human identity recognition under varying conditions. The system fuses face recognition and speaker recognition within a general probabilistic framework. For face recognition, we propose a new spectral learning algorithm that considers not only the discriminative relations among the training data but also a generative model for each class. Because labeling faces is tedious and costly in practice, our spectral face learning uses a semi-supervised strategy: only a small number of labeled faces are used in training, and their labels are optimally propagated to the remaining unlabeled training faces. Besides requiring much less labeled data, the algorithm provides a natural way to explicitly train an outlier model that approximately represents unauthorized faces. To make recognition more robust across environments, the face recognizer is complemented by a speaker identification agent, which models the statistical variations of fixed-phrase speech with speaker-dependent word hidden Markov models. Experiments on benchmark databases validate the effectiveness of both the face recognition and speaker identification agents, and show that integrating these two independent biometric sources noticeably improves recognition accuracy.
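The abstract names three mechanisms: propagating a few face labels to unlabeled training faces, scoring fixed-phrase speech with speaker-dependent HMMs, and fusing the two sources probabilistically. The Python sketch below illustrates generic textbook versions of each; it is not the authors' algorithm. Assumptions: label propagation uses the closed-form "local and global consistency" scheme of Zhou et al. with a Gaussian affinity (alpha and sigma are illustrative values), and it omits the per-class generative models and the outlier model described in the paper; the HMM scorer assumes per-frame emission log-likelihoods have already been computed; and fusion is a simple product rule that assumes face and voice are conditionally independent given identity. The function names and toy numbers are hypothetical.

```python
import numpy as np

# Illustrative sketch only: generic label propagation, HMM scoring and
# product-rule fusion, NOT the paper's exact method.


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along an axis (or over all entries)."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def propagate_labels(X, y, alpha=0.99, sigma=1.0):
    """Spread labels from a few labeled samples to the unlabeled ones.

    X: (n, d) features; y: length-n ints, a class index or -1 if unlabeled.
    Returns (classes, F) where F[i, j] is the normalized score of classes[j]
    for sample i. alpha and sigma are assumed, illustrative defaults.
    """
    n = X.shape[0]
    classes = np.unique(y[y >= 0])
    # Gaussian affinity between all pairs of samples, no self-loops.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot seeds: labeled rows carry their class, unlabeled rows stay zero.
    Y = (y[:, None] == classes[None, :]).astype(float)
    # Closed-form fixed point of F <- alpha * S @ F + (1 - alpha) * Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, (1.0 - alpha) * Y)
    return classes, F / F.sum(axis=1, keepdims=True)


def hmm_log_likelihood(log_pi, log_A, log_B):
    """Forward algorithm in the log domain: log p(frames | one speaker's HMM).

    log_pi: (N,) initial-state log-probs; log_A: (N, N) transition log-probs
    (row = from, column = to); log_B: (T, N) log-likelihood of frame t in state j.
    """
    log_alpha = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        # For each target state, sum over all source states.
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_B[t]
    return logsumexp(log_alpha)


def fuse(face_post, speech_loglik):
    """Product-rule fusion, assuming face and voice are independent given identity.

    face_post: (K,) posteriors P(k | face); speech_loglik: (K,) log p(voice | k).
    Returns (K,) fused posteriors P(k | face, voice).
    """
    scores = np.log(face_post + 1e-12) + speech_loglik
    return np.exp(scores - logsumexp(scores))


# Toy demo: two well-separated 2-D clusters, one labeled sample in each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (5, 2)), rng.normal(3.0, 0.3, (5, 2))])
y = np.full(10, -1)
y[0], y[5] = 0, 1
classes, F = propagate_labels(X, y)
print("propagated labels:", classes[F.argmax(axis=1)])

face_post = np.array([0.55, 0.45])          # ambiguous face decision (toy numbers)
speech_loglik = np.array([-120.0, -123.0])  # toy per-speaker HMM scores
print("fused posteriors:", fuse(face_post, speech_loglik))
```

Solving the linear system directly is fine for a few hundred training faces; for larger sets, iterating the update of F or using a sparse k-nearest-neighbour affinity would be the usual choice. In the toy fusion, a weak face preference is sharpened by the speaker scores, which is the kind of complementarity the abstract describes.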

Original language	English
Pages (from-to)	188-195
Number of pages	8
Journal	Journal of Visual Languages and Computing
Volume	20
Issue number	3
DOIs	https://doi.org/10.1016/j.jvlc.2009.01.009
State	Published - Jun 2009

Keywords

Face recognition
Hidden Markov models (HMMs)
Semi-supervised spectral learning
Speaker identification

Cite this

@article{5f8c22ef091e4c9b97df8740fb3c41c3,

title = "Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models",

abstract = "This paper presents a multimodal system for reliable human identity recognition under variant conditions. Our system fuses the recognition of face and speech with a general probabilistic framework. For face recognition, we propose a new spectral learning algorithm, which considers not only the discriminative relations among the training data but also the generative models for each class. Due to the tedious cost of face labeling in practice, our spectral face learning utilizes a semi-supervised strategy. That is, only a small number of labeled faces are used in our training step, and the labels are optimally propagated to other unlabeled training faces. Besides requiring much less labeled data, our algorithm also enables a natural way to explicitly train an outlier model that approximately represents unauthorized faces. To boost the robustness of our system for human recognition under various environments, our face recognition is further complemented by a speaker identification agent. Specifically, this agent models the statistical variations of fixed-phrase speech using speaker-dependent word hidden Markov models. Experiments on benchmark databases validate the effectiveness of our face recognition and speaker identification agents, and demonstrate that the recognition accuracy can be apparently improved by integrating these two independent biometric sources together.",

keywords = "Face recognition, Hidden Markov models (HMMs), Semi-supervised spectral learning, Speaker identification",

author = "Feng, Wei and Xie, Lei and Zeng, Jia and Liu, {Zhi Qiang}",

year = "2009",

month = jun,

doi = "10.1016/j.jvlc.2009.01.009",

language = "English",

volume = "20",

pages = "188--195",

journal = "Journal of Visual Languages and Computing",

issn = "1045-926X",

publisher = "Elsevier Ltd",

number = "3",

}

TY - JOUR

T1 - Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models

AU - Feng, Wei

AU - Xie, Lei

AU - Zeng, Jia

AU - Liu, Zhi Qiang

PY - 2009/6

Y1 - 2009/6

AB - This paper presents a multimodal system for reliable human identity recognition under variant conditions. Our system fuses the recognition of face and speech with a general probabilistic framework. For face recognition, we propose a new spectral learning algorithm, which considers not only the discriminative relations among the training data but also the generative models for each class. Due to the tedious cost of face labeling in practice, our spectral face learning utilizes a semi-supervised strategy. That is, only a small number of labeled faces are used in our training step, and the labels are optimally propagated to other unlabeled training faces. Besides requiring much less labeled data, our algorithm also enables a natural way to explicitly train an outlier model that approximately represents unauthorized faces. To boost the robustness of our system for human recognition under various environments, our face recognition is further complemented by a speaker identification agent. Specifically, this agent models the statistical variations of fixed-phrase speech using speaker-dependent word hidden Markov models. Experiments on benchmark databases validate the effectiveness of our face recognition and speaker identification agents, and demonstrate that the recognition accuracy can be apparently improved by integrating these two independent biometric sources together.

KW - Face recognition

KW - Hidden Markov models (HMMs)

KW - Semi-supervised spectral learning

KW - Speaker identification

UR - http://www.scopus.com/inward/record.url?scp=67349153381&partnerID=8YFLogxK

U2 - 10.1016/j.jvlc.2009.01.009

DO - 10.1016/j.jvlc.2009.01.009

M3 - Article

AN - SCOPUS:67349153381

SN - 1045-926X

VL - 20

SP - 188

EP - 195

JO - Journal of Visual Languages and Computing

JF - Journal of Visual Languages and Computing

IS - 3

ER -
