Narrowing the variance of variational cross-encoder for cross-modal hashing

Dayong Tian, Yiqin Cao, Yiwen Wei, Deyun Zhou

Research output: Contribution to journal › Article › peer-review

Abstract

Cross-modal hashing, which embeds data as binary codes, is an efficient tool for retrieving heterogeneous but correlated multimedia data. In real applications, the query set is much larger than the training set and queries may be dissimilar to the training data, which exposes the limited generalization of deterministic models such as cross-encoders and autoencoders. In this paper, we design a variational cross-encoder (VCE), a generative model, to tackle this problem. At the bottleneck layer, the VCE outputs distributions parameterized by means and variances. Because the VCE can generate diversified data from noise, the proposed model generalizes better to testing data. Ideally, each distribution describes a category of data, and samples drawn from it generate data in the same category. Under this assumption, the means and variances can serve as real-valued codes for the input data. In practice, however, the generated data generally do not belong to the same category as the input data. Hence, we add a penalty term on the variance output of the VCE and use the means as real-valued codes from which the hash codes are generated. Experiments on three widely used datasets validate the effectiveness of our method.
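
The variance-penalized Gaussian bottleneck described in the abstract can be sketched as follows. This is a minimal illustrative sketch, assuming a PyTorch implementation with a single encoder/decoder pair per direction; the class name, layer sizes, penalty weight `lam`, and exact loss form are assumptions for illustration, not the paper's actual architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalCrossEncoder(nn.Module):
    """Toy VAE-style cross-encoder: encodes one modality and reconstructs
    the other through a Gaussian bottleneck (mean and log-variance)."""

    def __init__(self, in_dim, out_dim, code_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, code_dim)      # means, later used as real-valued codes
        self.fc_logvar = nn.Linear(512, code_dim)  # variances to be penalized
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(), nn.Linear(512, out_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample around the mean using the predicted variance.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar


def vce_loss(recon, target, mu, logvar, lam=1.0):
    """Reconstruction loss plus a hypothetical penalty that narrows the variance."""
    recon_loss = F.mse_loss(recon, target)
    var_penalty = torch.exp(logvar).mean()  # assumed form of the variance penalty
    return recon_loss + lam * var_penalty


# After training, the means could be binarized into hash codes,
# e.g. by taking the sign of the zero-centered codes:
#   hash_codes = torch.sign(mu - mu.mean(dim=0))
```

The key point the sketch illustrates is that only the mean branch feeds the hashing step, while the variance branch is driven toward small values by the penalty term, so that samples drawn from each distribution stay within the category of the input data.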

Original language: English
Pages (from-to): 3421-3430
Number of pages: 10
Journal: Multimedia Systems
Volume: 29
Issue number: 6
DOIs
State: Published - Dec 2023

Keywords

  • Approximate nearest neighbor search
  • Cross-modal
  • Hashing
  • Variational cross-encoder
