Feature pre-inpainting enhanced transformer for video inpainting

Guanxiao Li, Ke Zhang, Yu Su, Jingyu Wang

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Transformer-based video inpainting methods aggregate coherent contents into missing regions by learning spatial–temporal dependencies. However, existing methods suffer from inaccurate self-attention calculation and excessive quadratic computational complexity, due to uninformative representations of missing regions and inefficient global self-attention mechanisms, respectively. To mitigate these problems, we propose a Feature pre-Inpainting enhanced Transformer (FITer) video inpainting method, in which a feature pre-inpainting network (FPNet) and a local–global interleaving Transformer are designed. The FPNet pre-inpaints missing features before the Transformer by exploiting spatial context, so the representations of missing regions are enhanced with more informative content. The interleaving Transformer can therefore calculate more accurate self-attention weights and learn more effective dependencies between missing and valid regions. Since the interleaving Transformer involves both global and window-based local self-attention mechanisms, the proposed FITer method can effectively aggregate spatial–temporal features into missing regions while improving efficiency. Experiments on the YouTube-VOS and DAVIS datasets demonstrate that FITer outperforms previous methods qualitatively and quantitatively.
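The abstract describes alternating window-based local self-attention with global self-attention over spatio-temporal tokens to cut the quadratic cost of purely global attention. The following minimal PyTorch sketch illustrates that general interleaving idea only; it is not the authors' FITer implementation, and the class names, window size, depth, and head count are all hypothetical assumptions.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping w x w spatial windows,
    so cost scales with window size rather than the full token count."""
    def __init__(self, dim, window=4, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, T, H, W, C) spatio-temporal feature tokens
        B, T, H, W, C = x.shape
        w = self.window
        # partition each frame into (H//w * W//w) windows of w*w tokens
        x = x.view(B, T, H // w, w, W // w, w, C)
        x = x.permute(0, 1, 2, 4, 3, 5, 6).reshape(-1, w * w, C)
        x, _ = self.attn(x, x, x)
        # merge the windows back into the original (B, T, H, W, C) layout
        x = x.view(B, T, H // w, W // w, w, w, C)
        return x.permute(0, 1, 2, 4, 3, 5, 6).reshape(B, T, H, W, C)

class GlobalAttention(nn.Module):
    """Full self-attention over all T*H*W tokens (quadratic in that count)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        B, T, H, W, C = x.shape
        tokens = x.reshape(B, T * H * W, C)
        tokens, _ = self.attn(tokens, tokens, tokens)
        return tokens.view(B, T, H, W, C)

class InterleavedBlocks(nn.Module):
    """Alternate local (windowed) and global attention layers with residuals."""
    def __init__(self, dim, depth=4, window=4):
        super().__init__()
        self.layers = nn.ModuleList(
            WindowAttention(dim, window) if i % 2 == 0 else GlobalAttention(dim)
            for i in range(depth)
        )

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)  # residual connection around each attention layer
        return x

# Toy usage: 4 frames of 16x16 tokens with 64 channels.
x = torch.randn(1, 4, 16, 16, 64)
out = InterleavedBlocks(dim=64)(x)
print(out.shape)  # torch.Size([1, 4, 16, 16, 64])
```

In this sketch, the windowed layers keep per-layer attention cheap while the interleaved global layers propagate information across frames and distant regions, which is the efficiency trade-off the abstract attributes to the local–global design.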

Original language: English
Article number: 106323
Journal: Engineering Applications of Artificial Intelligence
Volume: 123
DOI
Publication status: Published - Aug 2023
