Top-k Self-Attention in Transformer for Video Inpainting

Guanxiao Li, Ke Zhang, Yu Su, Jing Yu Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Video inpainting restores missing content by exploiting global dependencies and relevant non-local regions of other frames. Recent Transformer-based methods use self-attention to build connections among global patch embeddings. However, because relevant regions are scarce, existing methods allocate a portion of the attention weight to a large number of irrelevant areas. The resulting dispersion of dependencies degrades modeling accuracy. To address this issue, we introduce a top-k self-attention mechanism for Transformer-based video inpainting that filters out the weights of less relevant regions. The proposed mechanism computes a top-k weight threshold for each missing patch and compels the Transformer to focus on the k most relevant patch embeddings. This sharpens dependency modeling and leads to more effective content aggregation for filling the missing regions. The top-k mechanism integrates easily into any Transformer-based model, and experiments on the YouTube-VOS and DAVIS datasets show that it significantly improves performance while maintaining high efficiency.
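The filtering step described in the abstract can be sketched compactly. Below is a minimal, hypothetical PyTorch rendering of top-k self-attention: the function name topk_self_attention, the tensor shapes, and the choice of masking sub-threshold scores with -inf before the softmax are illustrative assumptions, not the authors' released implementation.

```python
import torch

def topk_self_attention(q, k, v, top_k):
    """q, k, v: (batch, heads, num_patches, head_dim).
    Keeps only the top_k most relevant keys per query (top_k must not
    exceed num_patches) and renormalizes the remaining attention weights."""
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # (B, H, N, N)

    # Per-query threshold: the k-th largest score acts as the cutoff,
    # so each query attends to exactly its k most relevant patches.
    thresholds = scores.topk(top_k, dim=-1).values[..., -1:]  # (B, H, N, 1)
    masked = scores.masked_fill(scores < thresholds, float("-inf"))

    attn = masked.softmax(dim=-1)  # filtered patches receive zero weight
    return torch.matmul(attn, v)

# Illustrative call: 128 patch embeddings, 4 heads, 64 dims per head.
q = k = v = torch.randn(1, 4, 128, 64)
out = topk_self_attention(q, k, v, top_k=16)
print(out.shape)  # torch.Size([1, 4, 128, 64])
```

Because the softmax runs only over the surviving scores, the attention mass that would have been spread across irrelevant regions is redistributed among the k retained patch embeddings, which is the dependency-sharpening effect the abstract describes.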

Original language: English
Title of host publication: 2024 5th International Conference on Computer Engineering and Application, ICCEA 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1038-1042
Number of pages: 5
ISBN (Electronic): 9798350386776
State: Published - 2024
Event: 5th International Conference on Computer Engineering and Application, ICCEA 2024 - Hybrid, Hangzhou, China
Duration: 12 Apr 2024 - 14 Apr 2024

Publication series

Name: 2024 5th International Conference on Computer Engineering and Application, ICCEA 2024

Conference

Conference: 5th International Conference on Computer Engineering and Application, ICCEA 2024
Country/Territory: China
City: Hybrid, Hangzhou
Period: 12/04/24 - 14/04/24

Keywords

  • Top-k self-attention mechanism
  • Video inpainting
  • Vision Transformer
