TY - JOUR
T1 - Discriminative and Robust Attribute Alignment for Zero-Shot Learning
AU - Cheng, De
AU - Wang, Gerong
AU - Wang, Nannan
AU - Zhang, Dingwen
AU - Zhang, Qiang
AU - Gao, Xinbo
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2023/8/1
Y1 - 2023/8/1
N2 - Zero-shot learning (ZSL) aims to learn models that can recognize images of semantically related unseen categories by transferring attribute-based knowledge from the training data of seen classes to unseen test data. Since visual attributes play a vital role in ZSL, recent embedding-based methods usually focus on learning a compatibility function between visual representations and class semantic attributes. In this work, in addition to learning region embeddings of the different semantic attributes to maintain the generalization capability of the learned model, we further improve the discriminative power of the learned visual features themselves through contrastive embedding, which exploits both class-wise and instance-wise supervision for generalized zero-shot learning (GZSL) under an attribute-guided, weakly supervised representation learning framework. To further improve the robustness of the ZSL model, we also propose training the model under a consistency regularization constraint that takes full advantage of the self-supervised signals of an image under various perturbed augmentations, making the model robust to occluded or unrelated attribute regions. Extensive experimental results demonstrate the effectiveness of the proposed ZSL method, which achieves performance superior to state-of-the-art methods on three widely used benchmark datasets: CUB, SUN, and AWA2. Our source code is released at https://github.com/KORIYN/CC-ZSL.
AB - Zero-shot learning (ZSL) aims to learn models that can recognize images of semantically related unseen categories by transferring attribute-based knowledge from the training data of seen classes to unseen test data. Since visual attributes play a vital role in ZSL, recent embedding-based methods usually focus on learning a compatibility function between visual representations and class semantic attributes. In this work, in addition to learning region embeddings of the different semantic attributes to maintain the generalization capability of the learned model, we further improve the discriminative power of the learned visual features themselves through contrastive embedding, which exploits both class-wise and instance-wise supervision for generalized zero-shot learning (GZSL) under an attribute-guided, weakly supervised representation learning framework. To further improve the robustness of the ZSL model, we also propose training the model under a consistency regularization constraint that takes full advantage of the self-supervised signals of an image under various perturbed augmentations, making the model robust to occluded or unrelated attribute regions. Extensive experimental results demonstrate the effectiveness of the proposed ZSL method, which achieves performance superior to state-of-the-art methods on three widely used benchmark datasets: CUB, SUN, and AWA2. Our source code is released at https://github.com/KORIYN/CC-ZSL.
KW - Zero-shot learning
KW - attribute alignment
KW - consistency regularization
KW - contrastive learning
UR - http://www.scopus.com/inward/record.url?scp=85148434527&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2023.3243205
DO - 10.1109/TCSVT.2023.3243205
M3 - Article
AN - SCOPUS:85148434527
SN - 1051-8215
VL - 33
SP - 4244
EP - 4256
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 8
ER -