MFI: Multi-range feature interchange for video action recognition

Sikai Bai; Qi Wang; Xuelong Li

doi:10.1109/ICPR48806.2021.9412124

School of Artificial Intelligence, Optics and Electronics

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Short-range motion features and long-range dependencies are two complementary and vital cues for action recognition in videos, but it remains unclear how to efficiently and effectively extract these two features. In this paper, we propose a novel network to capture these two features in a unified 2D framework. Specifically, we first construct a Short-range Temporal Interchange (STI) block, which contains a Channels-wise Temporal Interchange (CTI) module for encoding short-range motion features. Then a Graph-based Regional Interchange (GRI) module is built to present long-range dependencies using graph convolution. Finally, we replace original bottleneck blocks in the ResNet with STI blocks and insert several GRI modules between STI blocks, to form a Multi-range Feature Interchange (MFI) Network. Practically, extensive experiments are conducted on three action recognition datasets (i.e., Something-Something V1, HMDB51, and UCF101), which demonstrate that the proposed MFI network achieves impressive results with very limited computing cost.
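The abstract states that the GRI module models long-range dependencies with graph convolution but gives no further detail. As a rough illustration of that general operation only, the sketch below applies one generic graph-convolution step, ReLU(A_norm · X · W), to a handful of region nodes. The helper names (`matmul`, `row_normalize`, `graph_conv`), the self-loop row normalization, the node features, and the adjacency matrix are all assumptions made for this example and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(adj: Matrix) -> Matrix:
    """Add self-loops, then divide each row by its sum (an assumed normalization)."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    return [[v / sum(row) for v in row] for row in with_loops]


def graph_conv(features: Matrix, adj: Matrix, weight: Matrix) -> Matrix:
    """One generic graph-convolution step: ReLU(A_norm @ X @ W)."""
    mixed = matmul(row_normalize(adj), features)  # aggregate neighbour features
    projected = matmul(mixed, weight)             # linear projection
    return [[max(0.0, v) for v in row] for row in projected]


# Three hypothetical region nodes with 2-dim features; node 0 is linked to 1 and 2.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
adj = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the example readable

out = graph_conv(features, adj, identity)
print(out)
```

With identity weights, each output row is simply the average of a node's own features and those of its neighbours, which is the sense in which a graph convolution lets distant regions exchange information in a single step.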

Original language	English
Title of host publication	Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	6664-6671
Number of pages	8
ISBN (Electronic)	9781728188089
DOIs	https://doi.org/10.1109/ICPR48806.2021.9412124
State	Published - 2020
Event	25th International Conference on Pattern Recognition, ICPR 2020 - Virtual, Milan, Italy
Duration	10 Jan 2021 → 15 Jan 2021

Publication series

Name	Proceedings - International Conference on Pattern Recognition
ISSN (Print)	1051-4651

Conference

Conference	25th International Conference on Pattern Recognition, ICPR 2020
Country/Territory	Italy
City	Virtual, Milan
Period	10/01/21 → 15/01/21

Access to Document

10.1109/ICPR48806.2021.9412124

Cite this

Bai, S., Wang, Q., & Li, X. (2020). MFI: Multi-range feature interchange for video action recognition. In Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition (pp. 6664-6671). Article 9412124 (Proceedings - International Conference on Pattern Recognition). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPR48806.2021.9412124

@inproceedings{b4aaf3f829814f939538f6590d576328,

title = "MFI: Multi-range feature interchange for video action recognition",

abstract = "Short-range motion features and long-range dependencies are two complementary and vital cues for action recognition in videos, but it remains unclear how to efficiently and effectively extract these two features. In this paper, we propose a novel network to capture these two features in a unified 2D framework. Specifically, we first construct a Short-range Temporal Interchange (STI) block, which contains a Channels-wise Temporal Interchange (CTI) module for encoding short-range motion features. Then a Graph-based Regional Interchange (GRI) module is built to present long-range dependencies using graph convolution. Finally, we replace original bottleneck blocks in the ResNet with STI blocks and insert several GRI modules between STI blocks, to form a Multi-range Feature Interchange (MFI) Network. Practically, extensive experiments are conducted on three action recognition datasets (i.e., Something-Something V1, HMDB51, and UCF101), which demonstrate that the proposed MFI network achieves impressive results with very limited computing cost.",

author = "Sikai Bai and Qi Wang and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE; 25th International Conference on Pattern Recognition, ICPR 2020; Conference date: 10-01-2021 Through 15-01-2021",

year = "2020",

doi = "10.1109/ICPR48806.2021.9412124",

language = "English",

series = "Proceedings - International Conference on Pattern Recognition",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "6664--6671",

booktitle = "Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition",

}

Bai, S, Wang, Q & Li, X 2020, MFI: Multi-range feature interchange for video action recognition. in Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition., 9412124, Proceedings - International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers Inc., pp. 6664-6671, 25th International Conference on Pattern Recognition, ICPR 2020, Virtual, Milan, Italy, 10/01/21. https://doi.org/10.1109/ICPR48806.2021.9412124

MFI: Multi-range feature interchange for video action recognition. / Bai, Sikai; Wang, Qi; Li, Xuelong.
Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2020. p. 6664-6671 9412124 (Proceedings - International Conference on Pattern Recognition).

TY - GEN

T1 - MFI

T2 - 25th International Conference on Pattern Recognition, ICPR 2020

AU - Bai, Sikai

AU - Wang, Qi

AU - Li, Xuelong

PY - 2020

Y1 - 2020

AB - Short-range motion features and long-range dependencies are two complementary and vital cues for action recognition in videos, but it remains unclear how to efficiently and effectively extract these two features. In this paper, we propose a novel network to capture these two features in a unified 2D framework. Specifically, we first construct a Short-range Temporal Interchange (STI) block, which contains a Channels-wise Temporal Interchange (CTI) module for encoding short-range motion features. Then a Graph-based Regional Interchange (GRI) module is built to present long-range dependencies using graph convolution. Finally, we replace original bottleneck blocks in the ResNet with STI blocks and insert several GRI modules between STI blocks, to form a Multi-range Feature Interchange (MFI) Network. Practically, extensive experiments are conducted on three action recognition datasets (i.e., Something-Something V1, HMDB51, and UCF101), which demonstrate that the proposed MFI network achieves impressive results with very limited computing cost.

UR - http://www.scopus.com/inward/record.url?scp=85110414661&partnerID=8YFLogxK

U2 - 10.1109/ICPR48806.2021.9412124

DO - 10.1109/ICPR48806.2021.9412124

M3 - Conference contribution

AN - SCOPUS:85110414661

T3 - Proceedings - International Conference on Pattern Recognition

SP - 6664

EP - 6671

BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 10 January 2021 through 15 January 2021

ER -
