Deep binary reconstruction for cross-modal hashing

Xuelong Li; Di Hu; Feiping Nie

doi:10.1145/3123266.3123355

School of Artificial Intelligence, OPtics and Electronics

Northwestern Polytechnical University, Xi'an

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

43 Scopus citations

Abstract

With the increasing demand for massive multimodal data storage and organization, hashing-based cross-modal retrieval has drawn much attention. It takes the binary codes of one modality as the query to retrieve the relevant hash codes of another modality. However, the binary constraint makes it difficult to find the optimal cross-modal hashing function. Most approaches relax the constraint and apply a thresholding strategy to the real-valued representation instead of directly solving the original objective. In this paper, we first provide a concrete analysis of the effectiveness of multimodal networks in preserving inter- and intra-modal consistency. Based on this analysis, we propose the Deep Binary Reconstruction (DBRC) network, which can directly learn binary hashing codes in an unsupervised fashion. Its advantage comes from a simple but efficient activation function, named Adaptive Tanh (ATanh). The ATanh function can adaptively learn the binary codes and be trained via back-propagation. Extensive experiments on three benchmark datasets demonstrate that DBRC outperforms several state-of-the-art methods in both image2text and text2image retrieval tasks.
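The retrieval setting described in the abstract can be illustrated with a minimal sketch. It shows only the generic relax-then-threshold baseline (real-valued features cut at zero) followed by Hamming-distance ranking across modalities; it is not the DBRC network or the ATanh activation, whose details are not given in this record. The function names `binarize` and `hamming_rank` and the toy feature values are assumptions made up for illustration.

```python
import numpy as np

def binarize(features):
    """Threshold real-valued features at zero to obtain {0, 1} hash codes.
    This is the generic thresholding baseline, not the paper's ATanh."""
    return (np.asarray(features) > 0).astype(np.uint8)

def hamming_rank(query_code, database_codes):
    """Rank database codes by Hamming distance to the query code.
    Returns (indices sorted from nearest to farthest, distances)."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances

# Hypothetical real-valued embeddings: one image query, three text items.
image_features = np.array([[0.9, -0.2, 0.4, -0.7]])
text_features = np.array([
    [-0.5, 0.3, -0.1, 0.8],   # every sign differs from the query
    [0.7, -0.4, 0.2, -0.9],   # every sign matches the query
    [0.6, 0.1, 0.3, -0.2],    # one sign differs from the query
])

query = binarize(image_features)[0]
database = binarize(text_features)
order, distances = hamming_rank(query, database)
print(order.tolist(), distances.tolist())
```

In this image2text example the text item whose code matches the query in every bit is ranked first; swapping the roles of the two feature arrays gives the text2image direction.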

Original language	English
Title of host publication	MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher	Association for Computing Machinery, Inc
Pages	1398-1406
Number of pages	9
ISBN (Electronic)	9781450349062
DOIs	https://doi.org/10.1145/3123266.3123355
State	Published - 23 Oct 2017
Event	25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States Duration: 23 Oct 2017 → 27 Oct 2017

Publication series

Name	MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

Conference

Conference	25th ACM International Conference on Multimedia, MM 2017
Country/Territory	United States
City	Mountain View
Period	23/10/17 → 27/10/17

Keywords

Binary reconstruction
Cross-modal hashing
Retrieval

Access to Document

10.1145/3123266.3123355

Cite this

@inproceedings{d0ef9692cc574c898016d2ce23dc754a,

title = "Deep binary reconstruction for cross-modal hashing",

keywords = "Binary reconstruction, Cross-modal hashing, Retrieval",

author = "Xuelong Li and Di Hu and Feiping Nie",

note = "Publisher Copyright: {\textcopyright} 2017 Copyright held by the owner/author(s).; 25th ACM International Conference on Multimedia, MM 2017 ; Conference date: 23-10-2017 Through 27-10-2017",

year = "2017",

month = oct,

day = "23",

doi = "10.1145/3123266.3123355",

language = "English",

series = "MM 2017 - Proceedings of the 2017 ACM Multimedia Conference",

publisher = "Association for Computing Machinery, Inc",

pages = "1398--1406",

booktitle = "MM 2017 - Proceedings of the 2017 ACM Multimedia Conference",

}

Li, X, Hu, D & Nie, F 2017, Deep binary reconstruction for cross-modal hashing. in MM 2017 - Proceedings of the 2017 ACM Multimedia Conference. MM 2017 - Proceedings of the 2017 ACM Multimedia Conference, Association for Computing Machinery, Inc, pp. 1398-1406, 25th ACM International Conference on Multimedia, MM 2017, Mountain View, United States, 23/10/17. https://doi.org/10.1145/3123266.3123355

Deep binary reconstruction for cross-modal hashing. / Li, Xuelong; Hu, Di; Nie, Feiping.
MM 2017 - Proceedings of the 2017 ACM Multimedia Conference. Association for Computing Machinery, Inc, 2017. p. 1398-1406 (MM 2017 - Proceedings of the 2017 ACM Multimedia Conference).

TY - GEN

T1 - Deep binary reconstruction for cross-modal hashing

AU - Li, Xuelong

AU - Hu, Di

AU - Nie, Feiping

PY - 2017/10/23

Y1 - 2017/10/23

AB - With the increasing demand of massive multimodal data storage and organization, cross-modal retrieval based on hashing technique has drawn much attention nowadays. It takes the binary codes of one modality as the query to retrieve the relevant hashing codes of another modality. However, the existing binary constraint makes it difficult to find the optimal cross-modal hashing function. Most approaches choose to relax the constraint and perform thresholding strategy on the real-value representation instead of directly solving the original objective. In this paper, we first provide a concrete analysis about the effectiveness of multimodal networks in preserving the inter- and intra-modal consistency. Based on the analysis, we provide a so-called Deep Binary Reconstruction (DBRC) network that can directly learn the binary hashing codes in an unsupervised fashion. The superiority comes from a proposed simple but efficient activation function, named as Adaptive Tanh (ATanh). The ATanh function can adaptively learn the binary codes and be trained via back-propagation. Extensive experiments on three benchmark datasets demonstrate that DBRC outperforms several state-of-the-art methods in both image2text and text2image retrieval task.

KW - Binary reconstruction

KW - Cross-modal hashing

KW - Retrieval

UR - http://www.scopus.com/inward/record.url?scp=85035228217&partnerID=8YFLogxK

U2 - 10.1145/3123266.3123355

DO - 10.1145/3123266.3123355

M3 - Conference contribution

AN - SCOPUS:85035228217

T3 - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

SP - 1398

EP - 1406

BT - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

PB - Association for Computing Machinery, Inc

T2 - 25th ACM International Conference on Multimedia, MM 2017

Y2 - 23 October 2017 through 27 October 2017

ER -
