Towards Effective Deep Embedding for Zero-Shot Learning

Lei Zhang; Peng Wang; Lingqiao Liu; Chunhua Shen; Wei Wei; Yanning Zhang; Anton Van Den Hengel

doi:10.1109/TCSVT.2020.2984666

School of Computer Science

Research output: Contribution to journal › Article › peer-review

67 Scopus citations

Abstract

Zero-shot learning (ZSL) can be formulated as a cross-domain matching problem: after being projected into a joint embedding space, a visual sample will match against all candidate class-level semantic descriptions and be assigned to the nearest class. In this process, the embedding space underpins the success of such matching and is crucial for ZSL. In this paper, we conduct an in-depth study on the construction of embedding space for ZSL and posit that an ideal embedding space should satisfy two criteria: intra-class compactness and inter-class separability. While the former encourages the embeddings of visual samples of one class to distribute tightly close to the semantic description embedding of this class, the latter requires embeddings from different classes to be well separated from each other. Towards this goal, we present a simple but effective two-branch network to simultaneously map semantic descriptions and visual samples into a joint space, on which visual embeddings are forced to regress to their class-level semantic embeddings and the embeddings crossing classes are required to be distinguishable by a trainable classifier. Furthermore, we extend our method to a transductive setting to better handle the model bias problem in ZSL (i.e., samples from unseen classes tend to be categorized into seen classes) with minimal extra supervision. Specifically, we propose a pseudo labeling strategy to progressively incorporate the testing samples into the training process and thus balance the model between seen and unseen classes. Experimental results on five standard ZSL datasets show the superior performance of the proposed method and its transductive extension.
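The matching step and the two criteria described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss functions, so everything concrete here is an assumption made for illustration: squared Euclidean distance as the regression (compactness) term, softmax cross-entropy as the classifier (separability) term, the `weight` parameter, and the toy two-dimensional embeddings. All function names are hypothetical and are not taken from the paper.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_class(visual_embedding, class_embeddings):
    """Assign a visual embedding to the class with the closest semantic embedding."""
    return min(
        class_embeddings,
        key=lambda name: squared_distance(visual_embedding, class_embeddings[name]),
    )


def softmax_cross_entropy(logits, target_index):
    """Numerically stable cross-entropy of softmax(logits) against one target index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target_index]


def joint_loss(visual_embedding, class_embeddings, class_names, target, logits, weight=1.0):
    """Assumed two-term objective: regression to the class embedding plus a classifier term."""
    # Intra-class compactness: pull the visual embedding toward its class embedding.
    compactness = squared_distance(visual_embedding, class_embeddings[target])
    # Inter-class separability: embeddings should be distinguishable by a classifier.
    separability = softmax_cross_entropy(logits, class_names.index(target))
    return compactness + weight * separability


# Toy joint space with two classes (values are made up for the example).
class_embeddings = {"zebra": [1.0, 0.0], "horse": [0.0, 1.0]}
sample = [0.9, 0.2]

predicted = nearest_class(sample, class_embeddings)
loss = joint_loss(sample, class_embeddings, ["zebra", "horse"], "zebra", logits=[2.0, 0.5])
print(predicted, round(loss, 4))
```

In this sketch the classifier logits are passed in directly; in a trained model they would come from the classifier applied to the embedding, which is outside the scope of this illustration.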

Original language	English
Article number	9051798
Pages (from-to)	2843-2852
Number of pages	10
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	30
Issue number	9
DOIs	https://doi.org/10.1109/TCSVT.2020.2984666
State	Published - Sep 2020

Keywords

Deep embedding
Deep neural network
Zero-shot learning

Cite this

@article{87bd9748e8754786928edeb9172d530c,

title = "Towards Effective Deep Embedding for Zero-Shot Learning",

abstract = "Zero-shot learning (ZSL) can be formulated as a cross-domain matching problem: after being projected into a joint embedding space, a visual sample will match against all candidate class-level semantic descriptions and be assigned to the nearest class. In this process, the embedding space underpins the success of such matching and is crucial for ZSL. In this paper, we conduct an in-depth study on the construction of embedding space for ZSL and posit that an ideal embedding space should satisfy two criteria: intra-class compactness and inter-class separability. While the former encourages the embeddings of visual samples of one class to distribute tightly close to the semantic description embedding of this class, the latter requires embeddings from different classes to be well separated from each other. Towards this goal, we present a simple but effective two-branch network to simultaneously map semantic descriptions and visual samples into a joint space, on which visual embeddings are forced to regress to their class-level semantic embeddings and the embeddings crossing classes are required to be distinguishable by a trainable classifier. Furthermore, we extend our method to a transductive setting to better handle the model bias problem in ZSL (i.e., samples from unseen classes tend to be categorized into seen classes) with minimal extra supervision. Specifically, we propose a pseudo labeling strategy to progressively incorporate the testing samples into the training process and thus balance the model between seen and unseen classes. Experimental results on five standard ZSL datasets show the superior performance of the proposed method and its transductive extension.",

keywords = "Deep embedding, Deep neural network, Zero-shot learning",

author = "Lei Zhang and Peng Wang and Lingqiao Liu and Chunhua Shen and Wei Wei and Yanning Zhang and {Van Den Hengel}, Anton",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2020",

month = sep,

doi = "10.1109/TCSVT.2020.2984666",

language = "English",

volume = "30",

pages = "2843--2852",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "9",

}

TY - JOUR

T1 - Towards Effective Deep Embedding for Zero-Shot Learning

AU - Zhang, Lei

AU - Wang, Peng

AU - Liu, Lingqiao

AU - Shen, Chunhua

AU - Wei, Wei

AU - Zhang, Yanning

AU - Van Den Hengel, Anton

PY - 2020/9

Y1 - 2020/9

AB - Zero-shot learning (ZSL) can be formulated as a cross-domain matching problem: after being projected into a joint embedding space, a visual sample will match against all candidate class-level semantic descriptions and be assigned to the nearest class. In this process, the embedding space underpins the success of such matching and is crucial for ZSL. In this paper, we conduct an in-depth study on the construction of embedding space for ZSL and posit that an ideal embedding space should satisfy two criteria: intra-class compactness and inter-class separability. While the former encourages the embeddings of visual samples of one class to distribute tightly close to the semantic description embedding of this class, the latter requires embeddings from different classes to be well separated from each other. Towards this goal, we present a simple but effective two-branch network to simultaneously map semantic descriptions and visual samples into a joint space, on which visual embeddings are forced to regress to their class-level semantic embeddings and the embeddings crossing classes are required to be distinguishable by a trainable classifier. Furthermore, we extend our method to a transductive setting to better handle the model bias problem in ZSL (i.e., samples from unseen classes tend to be categorized into seen classes) with minimal extra supervision. Specifically, we propose a pseudo labeling strategy to progressively incorporate the testing samples into the training process and thus balance the model between seen and unseen classes. Experimental results on five standard ZSL datasets show the superior performance of the proposed method and its transductive extension.

KW - Deep embedding

KW - Deep neural network

KW - Zero-shot learning

UR - http://www.scopus.com/inward/record.url?scp=85091194103&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2020.2984666

DO - 10.1109/TCSVT.2020.2984666

M3 - Article

AN - SCOPUS:85091194103

SN - 1051-8215

VL - 30

SP - 2843

EP - 2852

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 9

M1 - 9051798

ER -
