Narrowing the variance of variational cross-encoder for cross-modal hashing

Dayong Tian, Yiqin Cao, Yiwen Wei, Deyun Zhou

Research output: Contribution to journal › Article › peer-review

Abstract

Cross-modal hashing, which embeds data as binary codes, is an efficient tool for retrieving heterogeneous but correlated multimedia data. In real applications, the query set is much larger than the training set and queries may be dissimilar to the training data, which exposes the limited generalization of deterministic models such as cross-encoders and autoencoders. In this paper, we design a variational cross-encoder (VCE), a generative model, to tackle this problem. At the bottleneck layer, the VCE outputs distributions parameterized by means and variances. Because the VCE can generate diversified data from noise, the proposed model generalizes better to testing data. Ideally, each distribution describes a category of data, and samples drawn from it generate data in the same category. Under this assumption, the means and variances can serve as real-valued codes for the input data. In practice, however, the generated data generally do not belong to the same category as the input data. Hence, we add a penalty term on the variance output of the VCE and use the means as real-valued codes from which the hash codes are generated. Experiments on three widely used datasets validate the effectiveness of our method.
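
The variance-penalized Gaussian bottleneck described in the abstract can be sketched as follows. This is a minimal illustrative sketch, assuming a PyTorch implementation with a single encoder/decoder pair per direction; the class name, layer sizes, penalty weight `lam`, and exact loss form are assumptions for illustration, not the paper's actual architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalCrossEncoder(nn.Module):
    """Toy VAE-style cross-encoder: encodes one modality and reconstructs
    the other through a Gaussian bottleneck (mean and log-variance)."""

    def __init__(self, in_dim, out_dim, code_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, code_dim)      # means, later used as real-valued codes
        self.fc_logvar = nn.Linear(512, code_dim)  # variances to be penalized
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(), nn.Linear(512, out_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample around the mean using the predicted variance.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar


def vce_loss(recon, target, mu, logvar, lam=1.0):
    """Reconstruction loss plus a hypothetical penalty that narrows the variance."""
    recon_loss = F.mse_loss(recon, target)
    var_penalty = torch.exp(logvar).mean()  # assumed form of the variance penalty
    return recon_loss + lam * var_penalty


# After training, the means could be binarized into hash codes,
# e.g. by taking the sign of the zero-centered codes:
#   hash_codes = torch.sign(mu - mu.mean(dim=0))
```

The key point the sketch illustrates is that only the mean branch feeds the hashing step, while the variance branch is driven toward small values by the penalty term, so that samples drawn from each distribution stay within the category of the input data.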

Original language: English
Pages (from-to): 3421-3430
Number of pages: 10
Journal: Multimedia Systems
Volume: 29
Issue number: 6
DOIs
State: Published - Dec 2023

Keywords

  • Approximate nearest neighbor search
  • Cross-modal
  • Hashing
  • Variational cross-encoder
