TY - GEN
T1 - Deep binary reconstruction for cross-modal hashing
AU - Li, Xuelong
AU - Hu, Di
AU - Nie, Feiping
N1 - Publisher Copyright:
© 2017 Copyright held by the owner/author(s).
PY - 2017/10/23
Y1 - 2017/10/23
N2 - With the increasing demand of massive multimodal data storage and organization, cross-modal retrieval based on hashing technique has drawn much attention nowadays. It takes the binary codes of one modality as the query to retrieve the relevant hashing codes of another modality. However, the existing binary constraint makes it difficult to find the optimal cross-modal hashing function. Most approaches choose to relax the constraint and perform thresholding strategy on the real-value representation instead of directly solving the original objective. In this paper, we first provide a concrete analysis about the effectiveness of multimodal networks in preserving the inter- and intra-modal consistency. Based on the analysis, we provide a so-called Deep Binary Reconstruction (DBRC) network that can directly learn the binary hashing codes in an unsupervised fashion. The superiority comes from a proposed simple but efficient activation function, named as Adaptive Tanh (ATanh). The ATanh function can adaptively learn the binary codes and be trained via back-propagation. Extensive experiments on three benchmark datasets demonstrate that DBRC outperforms several state-of-the-art methods in both image2text and text2image retrieval task.
AB - With the increasing demand of massive multimodal data storage and organization, cross-modal retrieval based on hashing technique has drawn much attention nowadays. It takes the binary codes of one modality as the query to retrieve the relevant hashing codes of another modality. However, the existing binary constraint makes it difficult to find the optimal cross-modal hashing function. Most approaches choose to relax the constraint and perform thresholding strategy on the real-value representation instead of directly solving the original objective. In this paper, we first provide a concrete analysis about the effectiveness of multimodal networks in preserving the inter- and intra-modal consistency. Based on the analysis, we provide a so-called Deep Binary Reconstruction (DBRC) network that can directly learn the binary hashing codes in an unsupervised fashion. The superiority comes from a proposed simple but efficient activation function, named as Adaptive Tanh (ATanh). The ATanh function can adaptively learn the binary codes and be trained via back-propagation. Extensive experiments on three benchmark datasets demonstrate that DBRC outperforms several state-of-the-art methods in both image2text and text2image retrieval task.
KW - Binary reconstruction
KW - Cross-modal hashing
KW - Retrieval
UR - http://www.scopus.com/inward/record.url?scp=85035228217&partnerID=8YFLogxK
U2 - 10.1145/3123266.3123355
DO - 10.1145/3123266.3123355
M3 - 会议稿件
AN - SCOPUS:85035228217
T3 - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
SP - 1398
EP - 1406
BT - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
T2 - 25th ACM International Conference on Multimedia, MM 2017
Y2 - 23 October 2017 through 27 October 2017
ER -