Few-shot action recognition with implicit temporal alignment and pair similarity optimization

Congqi Cao; Yajuan Li; Qinyi Lv; Peng Wang; Yanning Zhang

doi:10.1016/j.cviu.2021.103250


School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › Peer-review

22 citations (Scopus)

Abstract

Few-shot learning aims to recognize instances from novel classes with few labeled samples, which has great value in research and application. Although there has been a lot of work in this area recently, most of the existing work is based on image classification tasks. Video-based few-shot action recognition has not been explored well and remains challenging: (1) the differences of implementation details among different papers make a fair comparison difficult; (2) the wide variations and misalignment of temporal sequences make the video-level similarity comparison difficult; (3) the scarcity of labeled data makes the optimization difficult. To solve these problems, this paper presents (1) a specific setting to evaluate the performance of few-shot action recognition algorithms; (2) an implicit sequence-alignment algorithm for better video-level similarity comparison; (3) an advanced loss for few-shot learning to optimize pair similarity with limited data. Specifically, we propose a novel few-shot action recognition framework that uses long short-term memory following 3D convolutional layers for sequence modeling and alignment. Circle loss is introduced to maximize the within-class similarity and minimize the between-class similarity flexibly towards a more definite convergence target. Instead of using random or ambiguous experimental settings, we set a concrete criterion analogous to the standard image-based few-shot learning setting for few-shot action recognition evaluation. Extensive experiments on two datasets demonstrate the effectiveness of our proposed method.
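As a rough illustration of the pair-similarity objective mentioned in the abstract, the sketch below implements the standard Circle loss of Sun et al. (CVPR 2020) for a single query in pure Python. It is not the authors' code. The way pairs are formed here (hypothetically, the query's cosine similarity to its own class prototype as the positive s_p and to the other class prototypes as negatives s_n) and the values of the margin `m` and scale `gamma` are assumptions. With the usual settings O_p = 1 + m, O_n = -m, Δ_p = 1 - m and Δ_n = m, the decision boundary is the circle s_n^2 + (s_p - 1)^2 = 2m^2, which is the "more definite convergence target" the abstract refers to.

```python
import math
from typing import Sequence


def _logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) for a non-empty sequence."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def circle_loss(sp: Sequence[float], sn: Sequence[float],
                m: float = 0.25, gamma: float = 64.0) -> float:
    """Circle loss (Sun et al., 2020) for one anchor/query.

    sp: within-class similarities s_p (e.g. cosine, in [-1, 1]).
    sn: between-class similarities s_n.
    m, gamma: relaxation margin and scale; the defaults are assumptions.
    """
    op, on = 1.0 + m, -m            # optima O_p and O_n
    delta_p, delta_n = 1.0 - m, m   # margins Δ_p and Δ_n
    # Self-paced weights alpha: a similarity far from its optimum gets a larger
    # weight. In a training framework these weights are detached from the graph.
    logits_p = [-gamma * max(op - s, 0.0) * (s - delta_p) for s in sp]
    logits_n = [gamma * max(s - on, 0.0) * (s - delta_n) for s in sn]
    # log(1 + sum_j exp(logit_n_j) * sum_i exp(logit_p_i))
    return _softplus(_logsumexp(logits_n) + _logsumexp(logits_p))


if __name__ == "__main__":
    # Hypothetical 5-way episode: cosine similarity of one query to each prototype.
    sims = [0.82, 0.10, -0.05, 0.30, 0.02]
    label = 0
    sp = [sims[label]]
    sn = [s for c, s in enumerate(sims) if c != label]
    print(circle_loss(sp, sn))
```

Because each similarity is re-weighted by its distance from its own optimum, a query that is already close to its class prototype contributes little gradient, while the hardest negative (0.30 above) dominates the loss.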

Original language	English
Article number	103250
Journal	Computer Vision and Image Understanding
Volume	210
DOI	https://doi.org/10.1016/j.cviu.2021.103250
Publication status	Published - Sep 2021


Cite this

@article{1fde79973f2f4b4f8bec818e537db0fa,

title = "Few-shot action recognition with implicit temporal alignment and pair similarity optimization",

abstract = "Few-shot learning aims to recognize instances from novel classes with few labeled samples, which has great value in research and application. Although there has been a lot of work in this area recently, most of the existing work is based on image classification tasks. Video-based few-shot action recognition has not been explored well and remains challenging: (1) the differences of implementation details among different papers make a fair comparison difficult; (2) the wide variations and misalignment of temporal sequences make the video-level similarity comparison difficult; (3) the scarcity of labeled data makes the optimization difficult. To solve these problems, this paper presents (1) a specific setting to evaluate the performance of few-shot action recognition algorithms; (2) an implicit sequence-alignment algorithm for better video-level similarity comparison; (3) an advanced loss for few-shot learning to optimize pair similarity with limited data. Specifically, we propose a novel few-shot action recognition framework that uses long short-term memory following 3D convolutional layers for sequence modeling and alignment. Circle loss is introduced to maximize the within-class similarity and minimize the between-class similarity flexibly towards a more definite convergence target. Instead of using random or ambiguous experimental settings, we set a concrete criterion analogous to the standard image-based few-shot learning setting for few-shot action recognition evaluation. Extensive experiments on two datasets demonstrate the effectiveness of our proposed method.",

keywords = "Few-shot action recognition, Implicit alignment, Similarity optimization, Temporal modeling",

author = "Congqi Cao and Yajuan Li and Qinyi Lv and Peng Wang and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 2021 Elsevier Inc.",

year = "2021",

month = sep,

doi = "10.1016/j.cviu.2021.103250",

language = "English",

volume = "210",

journal = "Computer Vision and Image Understanding",

issn = "1077-3142",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Few-shot action recognition with implicit temporal alignment and pair similarity optimization

AU - Cao, Congqi

AU - Li, Yajuan

AU - Lv, Qinyi

AU - Wang, Peng

AU - Zhang, Yanning

PY - 2021/9

Y1 - 2021/9

AB - Few-shot learning aims to recognize instances from novel classes with few labeled samples, which has great value in research and application. Although there has been a lot of work in this area recently, most of the existing work is based on image classification tasks. Video-based few-shot action recognition has not been explored well and remains challenging: (1) the differences of implementation details among different papers make a fair comparison difficult; (2) the wide variations and misalignment of temporal sequences make the video-level similarity comparison difficult; (3) the scarcity of labeled data makes the optimization difficult. To solve these problems, this paper presents (1) a specific setting to evaluate the performance of few-shot action recognition algorithms; (2) an implicit sequence-alignment algorithm for better video-level similarity comparison; (3) an advanced loss for few-shot learning to optimize pair similarity with limited data. Specifically, we propose a novel few-shot action recognition framework that uses long short-term memory following 3D convolutional layers for sequence modeling and alignment. Circle loss is introduced to maximize the within-class similarity and minimize the between-class similarity flexibly towards a more definite convergence target. Instead of using random or ambiguous experimental settings, we set a concrete criterion analogous to the standard image-based few-shot learning setting for few-shot action recognition evaluation. Extensive experiments on two datasets demonstrate the effectiveness of our proposed method.

KW - Few-shot action recognition

KW - Implicit alignment

KW - Similarity optimization

KW - Temporal modeling

UR - http://www.scopus.com/inward/record.url?scp=85111217698&partnerID=8YFLogxK

U2 - 10.1016/j.cviu.2021.103250

DO - 10.1016/j.cviu.2021.103250

M3 - Article

AN - SCOPUS:85111217698

SN - 1077-3142

VL - 210

JO - Computer Vision and Image Understanding

JF - Computer Vision and Image Understanding

M1 - 103250

ER -
