TY - GEN
T1 - Pyramid Dilated Attention Network for Action Segmentation
AU - Du, Zexing
AU - Mei, Feng
AU - Lai, Xiaohan
AU - Wang, Qing
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Action segmentation has been widely studied with the development of temporal convolution networks. However, the correlations between frames at different time intervals are still not well explored. In particular, in untrimmed videos, frames play different roles: consecutive frames provide local spatiotemporal information, while distant frames provide global information. Therefore, applying attention in the same dimension cannot exploit such differences. In addition, untrimmed videos generally contain thousands of frames, and directly applying attention to the whole video would be computationally heavy and inefficient. In this paper, we propose a dilated attention module (DAM), which builds attention maps in a dilated manner rather than over the whole sequence. To explore correlations between frames at different intervals, we propose a pyramid dilated attention network (PDAN), which uses higher-dimensional features to exploit relationships over short intervals for local information and lower-dimensional features to study correlations over long intervals for global information. When MS-TCN is equipped with the PDAN, state-of-the-art performance is achieved on three challenging datasets.
AB - Action segmentation has been widely studied with the development of temporal convolution networks. However, the correlations between frames at different time intervals are still not well explored. In particular, in untrimmed videos, frames play different roles: consecutive frames provide local spatiotemporal information, while distant frames provide global information. Therefore, applying attention in the same dimension cannot exploit such differences. In addition, untrimmed videos generally contain thousands of frames, and directly applying attention to the whole video would be computationally heavy and inefficient. In this paper, we propose a dilated attention module (DAM), which builds attention maps in a dilated manner rather than over the whole sequence. To explore correlations between frames at different intervals, we propose a pyramid dilated attention network (PDAN), which uses higher-dimensional features to exploit relationships over short intervals for local information and lower-dimensional features to study correlations over long intervals for global information. When MS-TCN is equipped with the PDAN, state-of-the-art performance is achieved on three challenging datasets.
KW - action segmentation
KW - frame correlation
KW - temporal convolution network
KW - untrimmed video
UR - http://www.scopus.com/inward/record.url?scp=85123345139&partnerID=8YFLogxK
U2 - 10.1109/WCSP52459.2021.9613432
DO - 10.1109/WCSP52459.2021.9613432
M3 - Conference contribution
AN - SCOPUS:85123345139
T3 - 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021
BT - 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021
Y2 - 20 October 2021 through 22 October 2021
ER -