TY - JOUR
T1 - Narrowing the variance of variational cross-encoder for cross-modal hashing
AU - Tian, Dayong
AU - Cao, Yiqin
AU - Wei, Yiwen
AU - Zhou, Deyun
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/12
Y1 - 2023/12
N2 - Cross-modal hashing, which embeds data into binary codes, is an efficient tool for retrieving heterogeneous but correlated multimedia data. In real applications, the query set is much larger than the training set and queries may be dissimilar to the training data, which exposes the limited generalization of deterministic models such as cross-encoders and autoencoders. In this paper, we design a variational cross-encoder (VCE), a generative model, to tackle this problem. At the bottleneck layer, the VCE outputs distributions parameterized by means and variances. Because the VCE can generate diversified data from noise, the proposed model performs better on test data. Ideally, each distribution is expected to describe a category of data, and samples drawn from it are expected to generate data in the same category. Under this expectation, the means and variances can serve as real-valued codes for the input data. However, the generated data generally do not belong to the same category as the input data. Hence, we add a penalty term on the variance output of the VCE and use the means as real-valued codes for further generating hash codes. Experiments on three widely used datasets validate the effectiveness of our method.
AB - Cross-modal hashing, which embeds data into binary codes, is an efficient tool for retrieving heterogeneous but correlated multimedia data. In real applications, the query set is much larger than the training set and queries may be dissimilar to the training data, which exposes the limited generalization of deterministic models such as cross-encoders and autoencoders. In this paper, we design a variational cross-encoder (VCE), a generative model, to tackle this problem. At the bottleneck layer, the VCE outputs distributions parameterized by means and variances. Because the VCE can generate diversified data from noise, the proposed model performs better on test data. Ideally, each distribution is expected to describe a category of data, and samples drawn from it are expected to generate data in the same category. Under this expectation, the means and variances can serve as real-valued codes for the input data. However, the generated data generally do not belong to the same category as the input data. Hence, we add a penalty term on the variance output of the VCE and use the means as real-valued codes for further generating hash codes. Experiments on three widely used datasets validate the effectiveness of our method.
KW - Approximate nearest neighbor search
KW - Cross-modal
KW - Hashing
KW - Variational cross-encoder
UR - http://www.scopus.com/inward/record.url?scp=85169167718&partnerID=8YFLogxK
U2 - 10.1007/s00530-023-01161-3
DO - 10.1007/s00530-023-01161-3
M3 - Article
AN - SCOPUS:85169167718
SN - 0942-4962
VL - 29
SP - 3421
EP - 3430
JO - Multimedia Systems
JF - Multimedia Systems
IS - 6
ER -