Pyramid Dilated Attention Network for Action Segmentation

Zexing Du, Feng Mei, Xiaohan Lai, Qing Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Action segmentation has been widely studied with the development of temporal convolutional networks. However, the correlations between frames at different time intervals are still not well explored. In untrimmed videos in particular, frames play different roles: consecutive frames provide local spatiotemporal information, while distant frames provide global information. Applying attention in the same dimension therefore cannot exploit these differences. In addition, untrimmed videos generally contain thousands of frames, so applying attention directly to the whole video would be computationally heavy and inefficient. In this paper, we propose a dilated attention module (DAM), which builds attention maps in a dilated manner rather than over the whole sequence. To explore correlations between frames at different intervals, we propose a pyramid dilated attention network (PDAN), which uses higher-dimensional features to exploit relationships over short intervals, capturing local information, and lower-dimensional features to study correlations over long intervals, capturing global information. When MS-TCN is equipped with PDAN, it achieves state-of-the-art performance on three challenging datasets.
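The abstract describes DAM and PDAN only at a high level, so the following is a minimal, dependency-free sketch of the general idea under explicit assumptions, not the authors' implementation. It assumes that frame t attends only to frames t + k·d for k in [-w, w] (dilation d, window w), that queries, keys and values are the raw frame features with no learned projections, and that lower-dimensional features are obtained by averaging groups of channels. The names `dilated_neighbors`, `dilated_attention`, `reduce_dim` and `pyramid_dilated_attention`, and the (dimension, dilation) level pairs, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dilated_neighbors(t: int, length: int, dilation: int, window: int) -> List[int]:
    """Indices t + k*dilation for k in [-window, window] that fall inside the sequence."""
    candidates = (t + k * dilation for k in range(-window, window + 1))
    return [j for j in candidates if 0 <= j < length]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dilated_attention(features: Sequence[Vector], dilation: int, window: int) -> List[Vector]:
    """Scaled dot-product attention restricted to a dilated neighbourhood.

    Assumption (not from the paper): queries, keys and values are the input
    features themselves, with no learned projection matrices.
    """
    length, dim = len(features), len(features[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for t in range(length):
        idx = dilated_neighbors(t, length, dilation, window)
        scores = [scale * sum(q * k for q, k in zip(features[t], features[j])) for j in idx]
        weights = softmax(scores)
        # Each output frame is a convex combination of its dilated neighbours.
        output.append([sum(w * features[j][c] for w, j in zip(weights, idx)) for c in range(dim)])
    return output


def reduce_dim(vector: Vector, new_dim: int) -> Vector:
    """Hypothetical channel reduction: average consecutive groups of channels."""
    group = len(vector) // new_dim
    return [sum(vector[i * group:(i + 1) * group]) / group for i in range(new_dim)]


def pyramid_dilated_attention(
    features: Sequence[Vector], levels: Sequence[Tuple[int, int]], window: int = 2
) -> List[List[Vector]]:
    """Run one dilated-attention level per (feature_dim, dilation) pair.

    Higher-dimensional levels are paired with small dilations (local context)
    and lower-dimensional levels with large dilations (global context). The
    abstract does not say how PDAN fuses the levels, so the per-level outputs
    are simply returned as a list.
    """
    outputs = []
    for dim, dilation in levels:
        reduced = [reduce_dim(v, dim) for v in features]
        outputs.append(dilated_attention(reduced, dilation, window))
    return outputs
```

With a fixed window, each frame scores at most 2w + 1 frames per level instead of all T frames, so for a 1,000-frame video and w = 2 a level computes at most 5,000 attention scores rather than the 1,000,000 needed by full attention. Calling `pyramid_dilated_attention(features, [(64, 1), (32, 4), (16, 16)])` would, under these assumptions, give one local, one mid-range and one long-range view of the same video.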

Original language: English
Title of host publication: 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665407854
State: Published - 2021
Event: 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021 - Virtual, Online, China
Duration: 20 Oct 2021 → 22 Oct 2021

Publication series

Name: 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021

Conference

Conference: 13th International Conference on Wireless Communications and Signal Processing, WCSP 2021
Country/Territory: China
City: Virtual, Online
Period: 20/10/21 → 22/10/21

Keywords

  • action segmentation
  • frame correlation
  • temporal convolution network
  • untrimmed video
