TY - JOUR
T1 - Discriminative and Robust Attribute Alignment for Zero-Shot Learning
AU - Cheng, De
AU - Wang, Gerong
AU - Wang, Nannan
AU - Zhang, Dingwen
AU - Zhang, Qiang
AU - Gao, Xinbo
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2023/8/1
Y1 - 2023/8/1
N2 - Zero-shot learning (ZSL) aims to learn models that can recognize images of semantically related unseen categories by transferring attribute-based knowledge from the training data of seen classes to unseen test data. Since visual attributes play a vital role in ZSL, recent embedding-based methods usually focus on learning a compatibility function between visual representations and class semantic attributes. In this work, in addition to learning region embeddings of the different semantic attributes to maintain the generalization capability of the learned model, we further improve the discriminative power of the learned visual features themselves through contrastive embedding, which exploits both class-wise and instance-wise supervision for generalized zero-shot learning (GZSL) under an attribute-guided, weakly supervised representation learning framework. To further improve the robustness of the ZSL model, we also propose training the model under a consistency regularization constraint that takes full advantage of the self-supervised signals of an image under various perturbed augmentations, making the model robust to occluded or unrelated attribute regions. Extensive experimental results demonstrate the effectiveness of the proposed ZSL method, which achieves performance superior to state-of-the-art methods on three widely used benchmark datasets: CUB, SUN, and AWA2. Our source code is released at https://github.com/KORIYN/CC-ZSL.
AB - Zero-shot learning (ZSL) aims to learn models that can recognize images of semantically related unseen categories by transferring attribute-based knowledge from the training data of seen classes to unseen test data. Since visual attributes play a vital role in ZSL, recent embedding-based methods usually focus on learning a compatibility function between visual representations and class semantic attributes. In this work, in addition to learning region embeddings of the different semantic attributes to maintain the generalization capability of the learned model, we further improve the discriminative power of the learned visual features themselves through contrastive embedding, which exploits both class-wise and instance-wise supervision for generalized zero-shot learning (GZSL) under an attribute-guided, weakly supervised representation learning framework. To further improve the robustness of the ZSL model, we also propose training the model under a consistency regularization constraint that takes full advantage of the self-supervised signals of an image under various perturbed augmentations, making the model robust to occluded or unrelated attribute regions. Extensive experimental results demonstrate the effectiveness of the proposed ZSL method, which achieves performance superior to state-of-the-art methods on three widely used benchmark datasets: CUB, SUN, and AWA2. Our source code is released at https://github.com/KORIYN/CC-ZSL.
KW - Zero-shot learning
KW - attribute alignment
KW - consistency regularization
KW - contrastive learning
UR - http://www.scopus.com/inward/record.url?scp=85148434527&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2023.3243205
DO - 10.1109/TCSVT.2023.3243205
M3 - Article
AN - SCOPUS:85148434527
SN - 1051-8215
VL - 33
SP - 4244
EP - 4256
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 8
ER -