Learning Dual-Stream Conditional Concepts in Compositional Zero-Shot Learning

  • Qingsheng Wang
  • Lingqiao Liu
  • Chenchen Jing
  • Peng Wang
  • Yanning Zhang
  • Chunhua Shen

Research output: Contribution to journal, Article, peer-review

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositional concepts composed of seen single concepts. One of the problems in CZSL is modeling how attributes interact with objects and how objects interact with attributes. In this work, we focus on this problem and propose the Dual-Stream Conditional Network (DSCNet), which learns dual-stream conditional concepts, that is, conditional visual and semantic embeddings of attributes and objects. First, we argue that the condition for an attribute should contain the recognized object and the input image, and the condition for an object should contain the recognized attribute and the input image. Next, for each concept, which can be either an attribute or an object, the semantic stream encodes the semantic features of the recognized object or attribute together with the visual features of the input image as the condition, which a semantic cross encoder then injects into all concept semantic embeddings to obtain conditional semantic embeddings. In the visual stream, the conditional attribute or object visual embeddings are obtained by injecting the semantic features of the recognized object or attribute into the mapped attribute or object visual features. Experimental results on CZSL benchmarks demonstrate the superiority of our proposed method.

Original language: English
Pages (from-to): 10076-10093
Number of pages: 18
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 47
Issue number: 11
DOIs
State: Published - 2025

Keywords

  • Compositional zero-shot learning
  • compositional generalization
  • tuning soft prompts
  • zero-shot learning
