TY - JOUR
T1 - Discriminative two-level feature selection for realistic human action recognition
AU - Wu, Qiuxia
AU - Wang, Zhiyong
AU - Deng, Feiqi
AU - Xia, Yong
AU - Kang, Wenxiong
AU - Feng, David Dagan
PY - 2013
Y1 - 2013
N2 - Constructing the bag-of-features model from space-time interest points (STIPs) has been successfully utilized for human action recognition. However, how to eliminate the large number of irrelevant STIPs when representing a specific action in realistic scenarios, as well as how to select discriminative codewords for an effective bag-of-features model, still need to be further investigated. In this paper, we propose to select more representative codewords based on our pruned interest points algorithm so as to reduce computational cost as well as improve recognition performance. By taking human perception into account, an attention-based saliency map is employed to choose salient interest points which fall into salient regions, since visual saliency can provide strong evidence for the location of acting subjects. After salient interest points are identified, each human action is represented with the bag-of-features model. In order to obtain more discriminative codewords, an unsupervised codeword selection algorithm is utilized. Finally, the Support Vector Machine (SVM) method is employed to perform human action recognition. Comprehensive experimental results on the widely used and challenging Hollywood-2 Human Action (HOHA-2) dataset and YouTube dataset demonstrate that our proposed method is computationally efficient while achieving improved performance in recognizing realistic human actions.
KW - Bag-of-features model
KW - Maximal information compression index
KW - Realistic human action recognition
KW - Saliency map
KW - Space-time interest points
KW - Support vector machine
KW - Unsupervised codeword selection
KW - Visual saliency
UR - http://www.scopus.com/inward/record.url?scp=84881193309&partnerID=8YFLogxK
U2 - 10.1016/j.jvcir.2013.07.001
DO - 10.1016/j.jvcir.2013.07.001
M3 - Article
AN - SCOPUS:84881193309
SN - 1047-3203
VL - 24
SP - 1064
EP - 1074
JO - Journal of Visual Communication and Image Representation
JF - Journal of Visual Communication and Image Representation
IS - 7
ER -