TY - GEN
T1 - Pyramid Dilated Attention Network for Action Segmentation
AU - Du, Zexing
AU - Mei, Feng
AU - Lai, Xiaohan
AU - Wang, Qing
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Action segmentation has been widely studied with the development of temporal convolution networks. However, the correlations between frames at different time intervals are still not well explored. In particular, in untrimmed videos, frames play different roles: consecutive frames provide local spatiotemporal information, while distant frames provide global information. Therefore, applying attention in the same dimension cannot exploit such differences. In addition, untrimmed videos generally contain thousands of frames, and directly applying attention to the whole video would be computationally heavy and inefficient. In this paper, we propose a dilated attention module (DAM), which builds attention maps in a dilated manner rather than over the whole sequence. To explore correlations between frames at different intervals, we propose a pyramid dilated attention network (PDAN), which uses higher-dimensional features to exploit relationships over short intervals for local information and lower-dimensional features to study correlations over long intervals for global information. When MS-TCN is equipped with the PDAN, state-of-the-art performance is achieved on three challenging datasets.
AB - Action segmentation has been widely studied with the development of temporal convolution networks. However, the correlations between frames at different time intervals are still not well explored. In particular, in untrimmed videos, frames play different roles: consecutive frames provide local spatiotemporal information, while distant frames provide global information. Therefore, applying attention in the same dimension cannot exploit such differences. In addition, untrimmed videos generally contain thousands of frames, and directly applying attention to the whole video would be computationally heavy and inefficient. In this paper, we propose a dilated attention module (DAM), which builds attention maps in a dilated manner rather than over the whole sequence. To explore correlations between frames at different intervals, we propose a pyramid dilated attention network (PDAN), which uses higher-dimensional features to exploit relationships over short intervals for local information and lower-dimensional features to study correlations over long intervals for global information. When MS-TCN is equipped with the PDAN, state-of-the-art performance is achieved on three challenging datasets.
KW - action segmentation
KW - frame correlation
KW - temporal convolution network
KW - untrimmed video
UR - http://www.scopus.com/inward/record.url?scp=85123345139&partnerID=8YFLogxK
U2 - 10.1109/WCSP52459.2021.9613432
DO - 10.1109/WCSP52459.2021.9613432
M3 - Conference contribution
AN - SCOPUS:85123345139
T3 - 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021
BT - 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021
Y2 - 20 October 2021 through 22 October 2021
ER -