Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition

Shizhou Zhang; Kairui Dang; De Cheng; Yinghui Xing; Qirui Wu; Dexuan Kong; Yanning Zhang

doi:10.1007/978-3-031-78312-8_11

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Zero-shot multi-label image recognition is the task of recognizing multi-label images when “zero” visual information has been provided to the model during training. Recently, with the emergence of large pre-trained vision-language models, visual and semantic features can be well aligned after training on billions of image-text pairs collected from the internet. In this paper, building on the pre-trained CLIP model, we propose dual-branch task residual enhancement with a parameter-free attention module that enhances the interaction of inter-modal information to tackle the problem of multi-label image recognition. The method employs a dual-branch structure with a global branch and a local branch. The local branch mitigates the dominance of global features, improving the understanding of image content in local regions. Our method outperforms state-of-the-art methods in zero-shot multi-label learning on the VOC2007, MS-COCO, and NUS-WIDE datasets. It also performs well in partial-label settings. Code is available in the supplementary materials.

Original language	English
Title of host publication	Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
Editors	Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	160-171
Number of pages	12
ISBN (Print)	9783031783111
DOIs	https://doi.org/10.1007/978-3-031-78312-8_11
State	Published - 2025
Event	27th International Conference on Pattern Recognition, ICPR 2024 - Kolkata, India; Duration: 1 Dec 2024 → 5 Dec 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	15322 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	27th International Conference on Pattern Recognition, ICPR 2024
Country/Territory	India
City	Kolkata
Period	1/12/24 → 5/12/24

Keywords

Dual-branch
Multi-label
Task Residual
Zero-shot

Access to Document

10.1007/978-3-031-78312-8_11

Cite this

Zhang, S., Dang, K., Cheng, D., Xing, Y., Wu, Q., Kong, D., & Zhang, Y. (2025). Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition. In A. Antonacopoulos, S. Chaudhuri, R. Chellappa, C.-L. Liu, S. Bhattacharya, & U. Pal (Eds.), Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings (pp. 160-171). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15322 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-78312-8_11

Zhang, Shizhou ; Dang, Kairui ; Cheng, De et al. / Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition. Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings. editor / Apostolos Antonacopoulos ; Subhasis Chaudhuri ; Rama Chellappa ; Cheng-Lin Liu ; Saumik Bhattacharya ; Umapada Pal. Springer Science and Business Media Deutschland GmbH, 2025. pp. 160-171 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{54f451b30f8e4b3dbfd2a81ee765bfc8,

title = "Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition",

abstract = "Zero-shot multi-label image recognition is the task of recognizing multi-label images when “zero” visual information has been provided to the model during training. Recently, with the emergence of large pre-trained vision-language models, visual and semantic features can be well aligned after training on billions of image-text pairs collected from the internet. In this paper, building on the pre-trained CLIP model, we propose dual-branch task residual enhancement with a parameter-free attention module that enhances the interaction of inter-modal information to tackle the problem of multi-label image recognition. The method employs a dual-branch structure with a global branch and a local branch. The local branch mitigates the dominance of global features, improving the understanding of image content in local regions. Our method outperforms state-of-the-art methods in zero-shot multi-label learning on the VOC2007, MS-COCO, and NUS-WIDE datasets. It also performs well in partial-label settings. Code is available in the supplementary materials.",

keywords = "Dual-branch, Multi-label, Task Residual, Zero-shot",

author = "Shizhou Zhang and Kairui Dang and De Cheng and Yinghui Xing and Qirui Wu and Dexuan Kong and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.; 27th International Conference on Pattern Recognition, ICPR 2024 ; Conference date: 01-12-2024 Through 05-12-2024",

year = "2025",

doi = "10.1007/978-3-031-78312-8_11",

language = "English",

isbn = "9783031783111",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "160--171",

editor = "Apostolos Antonacopoulos and Subhasis Chaudhuri and Rama Chellappa and Cheng-Lin Liu and Saumik Bhattacharya and Umapada Pal",

booktitle = "Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings",

}

Zhang, S, Dang, K, Cheng, D, Xing, Y, Wu, Q, Kong, D & Zhang, Y 2025, Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition. in A Antonacopoulos, S Chaudhuri, R Chellappa, C-L Liu, S Bhattacharya & U Pal (eds), Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 15322 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 160-171, 27th International Conference on Pattern Recognition, ICPR 2024, Kolkata, India, 1/12/24. https://doi.org/10.1007/978-3-031-78312-8_11

Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition. / Zhang, Shizhou; Dang, Kairui; Cheng, De et al.
Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings. ed. / Apostolos Antonacopoulos; Subhasis Chaudhuri; Rama Chellappa; Cheng-Lin Liu; Saumik Bhattacharya; Umapada Pal. Springer Science and Business Media Deutschland GmbH, 2025. p. 160-171 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15322 LNCS).

TY - GEN

T1 - Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition

AU - Zhang, Shizhou

AU - Dang, Kairui

AU - Cheng, De

AU - Xing, Yinghui

AU - Wu, Qirui

AU - Kong, Dexuan

AU - Zhang, Yanning

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

PY - 2025

Y1 - 2025

AB - Zero-shot multi-label image recognition is the task of recognizing multi-label images when “zero” visual information has been provided to the model during training. Recently, with the emergence of large pre-trained vision-language models, visual and semantic features can be well aligned after training on billions of image-text pairs collected from the internet. In this paper, building on the pre-trained CLIP model, we propose dual-branch task residual enhancement with a parameter-free attention module that enhances the interaction of inter-modal information to tackle the problem of multi-label image recognition. The method employs a dual-branch structure with a global branch and a local branch. The local branch mitigates the dominance of global features, improving the understanding of image content in local regions. Our method outperforms state-of-the-art methods in zero-shot multi-label learning on the VOC2007, MS-COCO, and NUS-WIDE datasets. It also performs well in partial-label settings. Code is available in the supplementary materials.

KW - Dual-branch

KW - Multi-label

KW - Task Residual

KW - Zero-shot

UR - http://www.scopus.com/inward/record.url?scp=85212265829&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-78312-8_11

DO - 10.1007/978-3-031-78312-8_11

M3 - Conference contribution

AN - SCOPUS:85212265829

SN - 9783031783111

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 160

EP - 171

BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings

A2 - Antonacopoulos, Apostolos

A2 - Chaudhuri, Subhasis

A2 - Chellappa, Rama

A2 - Liu, Cheng-Lin

A2 - Bhattacharya, Saumik

A2 - Pal, Umapada

PB - Springer Science and Business Media Deutschland GmbH

T2 - 27th International Conference on Pattern Recognition, ICPR 2024

Y2 - 1 December 2024 through 5 December 2024

ER -

Zhang S, Dang K, Cheng D, Xing Y, Wu Q, Kong D et al. Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition. In Antonacopoulos A, Chaudhuri S, Chellappa R, Liu CL, Bhattacharya S, Pal U, editors, Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings. Springer Science and Business Media Deutschland GmbH. 2025. p. 160-171. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-78312-8_11
