VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning

Ziyang Luo; Nian Liu; Wangbo Zhao; Xuguang Yang; Dingwen Zhang; Deng Ping Fan; Fahad Khan; Junwei Han

doi:10.1109/CVPR52733.2024.01625

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

43 Citations (Scopus)

Abstract

Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks. These tasks involve multiple modalities, sharing commonalities and unique cues. Existing research often employs intricate task-specific specialist models, potentially leading to redundancy and suboptimal results. We introduce VSCode, a generalist model with novel 2D prompt learning, to jointly address four SOD tasks and three COD tasks. We utilize VST as the foundation model and introduce 2D prompts within the encoder-decoder architecture to learn domain and task-specific knowledge on two separate dimensions. A prompt discrimination loss helps disentangle peculiarities to benefit model optimization. VSCode outperforms state-of-the-art methods across six tasks on 26 datasets and exhibits zero-shot generalization to unseen tasks, such as RGB-D COD, by combining 2D prompts. Source code is available at https://github.com/Sssssuperior/VSCode.

Original language	English
Title of host publication	Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher	IEEE Computer Society
Pages	17169-17180
Number of pages	12
ISBN (Electronic)	9798350353006
DOI	https://doi.org/10.1109/CVPR52733.2024.01625
Publication status	Published - 2024
Event	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States. Duration: 16 Jun 2024 → 22 Jun 2024

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Conference

Conference	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory	United States
City	Seattle
Period	16/06/24 → 22/06/24

Access to Document

10.1109/CVPR52733.2024.01625

Cite this

Luo, Z., Liu, N., Zhao, W., Yang, X., Zhang, D., Fan, D. P., Khan, F., & Han, J. (2024). VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning. In Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 (pp. 17169-17180). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR52733.2024.01625

Luo, Ziyang ; Liu, Nian ; Zhao, Wangbo et al. / VSCode : General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning. Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society, 2024. pp. 17169-17180 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

@inproceedings{51faf4f96cbc49d1a492c10bf79743ce,

title = "VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning",

abstract = "Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks. These tasks involve multiple modalities, sharing commonalities and unique cues. Existing research often employs intricate task-specific specialist models, potentially leading to redundancy and suboptimal results. We introduce VSCode, a generalist model with novel 2D prompt learning, to jointly address four SOD tasks and three COD tasks. We utilize VST as the foundation model and introduce 2D prompts within the encoder-decoder architecture to learn domain and task-specific knowledge on two separate dimensions. A prompt discrimination loss helps disentangle peculiarities to benefit model optimization. VSCode outperforms state-of-the-art methods across six tasks on 26 datasets and exhibits zero-shot generalization to unseen tasks, such as RGB-D COD, by combining 2D prompts. Source code is available at https://github.com/Sssssuperior/VSCode.",

author = "Ziyang Luo and Nian Liu and Wangbo Zhao and Xuguang Yang and Dingwen Zhang and Fan, {Deng Ping} and Fahad Khan and Junwei Han",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 ; Conference date: 16-06-2024 Through 22-06-2024",

year = "2024",

doi = "10.1109/CVPR52733.2024.01625",

language = "English",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "17169--17180",

booktitle = "Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024",

}

Luo, Z, Liu, N, Zhao, W, Yang, X, Zhang, D, Fan, DP, Khan, F & Han, J 2024, VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning. In Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 17169-17180, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, United States, 16/06/24. https://doi.org/10.1109/CVPR52733.2024.01625

VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning. / Luo, Ziyang; Liu, Nian; Zhao, Wangbo et al.
Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society, 2024. pp. 17169-17180 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

TY - GEN

T1 - VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning

T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

AU - Luo, Ziyang

AU - Liu, Nian

AU - Zhao, Wangbo

AU - Yang, Xuguang

AU - Zhang, Dingwen

AU - Fan, Deng Ping

AU - Khan, Fahad

AU - Han, Junwei

PY - 2024

Y1 - 2024

AB - Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks. These tasks involve multiple modalities, sharing commonalities and unique cues. Existing research often employs intricate task-specific specialist models, potentially leading to redundancy and suboptimal results. We introduce VSCode, a generalist model with novel 2D prompt learning, to jointly address four SOD tasks and three COD tasks. We utilize VST as the foundation model and introduce 2D prompts within the encoder-decoder architecture to learn domain and task-specific knowledge on two separate dimensions. A prompt discrimination loss helps disentangle peculiarities to benefit model optimization. VSCode outperforms state-of-the-art methods across six tasks on 26 datasets and exhibits zero-shot generalization to unseen tasks, such as RGB-D COD, by combining 2D prompts. Source code is available at https://github.com/Sssssuperior/VSCode.

UR - http://www.scopus.com/inward/record.url?scp=85201758870&partnerID=8YFLogxK

U2 - 10.1109/CVPR52733.2024.01625

DO - 10.1109/CVPR52733.2024.01625

M3 - Conference contribution

AN - SCOPUS:85201758870

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 17169

EP - 17180

BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

PB - IEEE Computer Society

Y2 - 16 June 2024 through 22 June 2024

ER -

Luo Z, Liu N, Zhao W, Yang X, Zhang D, Fan DP et al. VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning. In Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024. IEEE Computer Society. 2024. p. 17169-17180. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR52733.2024.01625
