TY - GEN
T1 - Dual-Branch Task Residual Enhancement with Parameter-Free Attention for Zero-Shot Multi-label Image Recognition
AU - Zhang, Shizhou
AU - Dang, Kairui
AU - Cheng, De
AU - Xing, Yinghui
AU - Wu, Qirui
AU - Kong, Dexuan
AU - Zhang, Yanning
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - Zero-shot multi-label image recognition is the task of recognizing the multiple labels of an image even though “zero” visual information has been input into the model during training. Recently, with the emergence of large pre-trained vision-language models trained on billions of image-text pairs collected from the internet, visual and semantic features can be well aligned. In this paper, building on the pre-trained CLIP model, we propose a dual-branch task residual enhancement with parameter-free attention module that strengthens inter-modal information interaction to tackle multi-label image recognition. The method employs a dual-branch structure consisting of a global branch and a local branch. The local branch mitigates the dominance of global features, improving the model's understanding of image content in local regions. Our method outperforms state-of-the-art methods for zero-shot multi-label learning on the VOC2007, MS-COCO, and NUS-WIDE datasets. It also achieves excellent performance in partial-label settings. Code is available in the supplementary materials.
KW - Dual-branch
KW - Multi-label
KW - Task Residual
KW - Zero-shot
UR - http://www.scopus.com/inward/record.url?scp=85212265829&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-78312-8_11
DO - 10.1007/978-3-031-78312-8_11
M3 - Conference contribution
AN - SCOPUS:85212265829
SN - 9783031783111
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 160
EP - 171
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
Y2 - 1 December 2024 through 5 December 2024
ER -