Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

Nian Liu; Kepan Nan; Wangbo Zhao; Yuanwei Liu; Xiwen Yao; Salman Khan; Hisham Cholakkal; Rao Muhammad Anwer; Junwei Han; Fahad Shahbaz Khan

doi:10.1109/ICCV51070.2023.01729

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)

Abstract

Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images. However, this task was seldom explored. In this work, based on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data. We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively. Frame prototypes are further used for each frame independently to handle fine-grained adaptive guidance and enable bidirectional clip-frame prototype communication. To reduce the influence of noisy memory, we propose to leverage the structural similarity relation among different predicted regions and the support for selecting reliable memory frames. Furthermore, a new segmentation loss is also proposed to enhance the category discriminability of the learned prototypes. Experimental results demonstrate that our proposed video IPMT model significantly outperforms previous models on two benchmark datasets. Code is available at https://github.com/nankepan/VIPMT.

Original language	English
Title of host publication	Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	18816-18825
Number of pages	10
ISBN (Electronic)	9798350307184
DOI	https://doi.org/10.1109/ICCV51070.2023.01729
Publication status	Published - 2023
Event	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Paris, France. Duration: 2 Oct 2023 → 6 Oct 2023

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print)	1550-5499

Conference

Conference	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Country/Territory	France
City	Paris
Period	2/10/23 → 6/10/23

Access to Document

10.1109/ICCV51070.2023.01729

Other files and links

Link to publication in Scopus

Cite this

Liu, N., Nan, K., Zhao, W., Liu, Y., Yao, X., Khan, S., Cholakkal, H., Anwer, R. M., Han, J., & Khan, F. S. (2023). Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 (pp. 18816-18825). (Proceedings of the IEEE International Conference on Computer Vision). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV51070.2023.01729

@inproceedings{dd5c3ebbe223428f801ce6234ed17f2f,

title = "Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation",

abstract = "Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images. However, this task was seldom explored. In this work, based on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data. We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively. Frame prototypes are further used for each frame independently to handle fine-grained adaptive guidance and enable bidirectional clip-frame prototype communication. To reduce the influence of noisy memory, we propose to leverage the structural similarity relation among different predicted regions and the support for selecting reliable memory frames. Furthermore, a new segmentation loss is also proposed to enhance the category discriminability of the learned prototypes. Experimental results demonstrate that our proposed video IPMT model significantly outperforms previous models on two benchmark datasets. Code is available at https://github.com/nankepan/VIPMT.",

author = "Nian Liu and Kepan Nan and Wangbo Zhao and Yuanwei Liu and Xiwen Yao and Salman Khan and Hisham Cholakkal and Anwer, {Rao Muhammad} and Junwei Han and Khan, {Fahad Shahbaz}",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 ; Conference date: 02-10-2023 Through 06-10-2023",

year = "2023",

doi = "10.1109/ICCV51070.2023.01729",

language = "English",

series = "Proceedings of the IEEE International Conference on Computer Vision",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "18816--18825",

booktitle = "Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023",

}

Liu, N, Nan, K, Zhao, W, Liu, Y, Yao, X, Khan, S, Cholakkal, H, Anwer, RM, Han, J & Khan, FS 2023, Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation. in Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Proceedings of the IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., pp. 18816-18825, 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, 2/10/23. https://doi.org/10.1109/ICCV51070.2023.01729

Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation. / Liu, Nian; Nan, Kepan; Zhao, Wangbo et al.
Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 18816-18825 (Proceedings of the IEEE International Conference on Computer Vision).

TY - GEN

T1 - Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

AU - Liu, Nian

AU - Nan, Kepan

AU - Zhao, Wangbo

AU - Liu, Yuanwei

AU - Yao, Xiwen

AU - Khan, Salman

AU - Cholakkal, Hisham

AU - Anwer, Rao Muhammad

AU - Han, Junwei

AU - Khan, Fahad Shahbaz

PY - 2023

Y1 - 2023

AB - Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images. However, this task was seldom explored. In this work, based on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data. We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively. Frame prototypes are further used for each frame independently to handle fine-grained adaptive guidance and enable bidirectional clip-frame prototype communication. To reduce the influence of noisy memory, we propose to leverage the structural similarity relation among different predicted regions and the support for selecting reliable memory frames. Furthermore, a new segmentation loss is also proposed to enhance the category discriminability of the learned prototypes. Experimental results demonstrate that our proposed video IPMT model significantly outperforms previous models on two benchmark datasets. Code is available at https://github.com/nankepan/VIPMT.

UR - http://www.scopus.com/inward/record.url?scp=85185876417&partnerID=8YFLogxK

U2 - 10.1109/ICCV51070.2023.01729

DO - 10.1109/ICCV51070.2023.01729

M3 - Conference contribution

AN - SCOPUS:85185876417

T3 - Proceedings of the IEEE International Conference on Computer Vision

SP - 18816

EP - 18825

BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

Y2 - 2 October 2023 through 6 October 2023

ER -

Liu N, Nan K, Zhao W, Liu Y, Yao X, Khan S et al. Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 18816-18825. (Proceedings of the IEEE International Conference on Computer Vision). doi: 10.1109/ICCV51070.2023.01729
