Deep binary reconstruction for cross-modal hashing

Xuelong Li, Di Hu, Feiping Nie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

41 Citations (Scopus)

Abstract

With the increasing demand for massive multimodal data storage and organization, cross-modal retrieval based on hashing techniques has drawn much attention. It takes the binary codes of one modality as the query to retrieve the relevant hashing codes of another modality. However, the binary constraint makes it difficult to find the optimal cross-modal hashing function: most approaches relax the constraint and apply a thresholding strategy to the real-valued representation instead of directly solving the original objective. In this paper, we first provide a concrete analysis of the effectiveness of multimodal networks in preserving inter- and intra-modal consistency. Based on this analysis, we propose a Deep Binary Reconstruction (DBRC) network that directly learns the binary hashing codes in an unsupervised fashion. Its superiority comes from a proposed simple but efficient activation function, named Adaptive Tanh (ATanh), which adaptively learns the binary codes and can be trained via back-propagation. Extensive experiments on three benchmark datasets demonstrate that DBRC outperforms several state-of-the-art methods in both image2text and text2image retrieval tasks.
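The abstract does not give the exact form of ATanh. A minimal sketch, assuming it behaves like a tanh with a learnable slope trained jointly with the network (so that outputs approach binary values as training proceeds), might look like the following PyTorch module; the parameter name alpha, its initialization, and the sign-based binarization step are illustrative assumptions, not the paper's definitions.

```python
import torch
import torch.nn as nn

class ATanh(nn.Module):
    """Sketch of an adaptive tanh activation: tanh(alpha * x) with a
    learnable slope alpha. As alpha grows during training, the output
    approaches the sign function, so binarizing at retrieval time
    discards little information. Illustrative only; the paper's exact
    formulation and regularization may differ."""

    def __init__(self, init_alpha: float = 1.0):
        super().__init__()
        # alpha is updated via back-propagation along with the network weights
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.alpha * x)

# Usage: near-binary codes in (-1, 1); threshold with sign() for retrieval
codes = ATanh()(torch.randn(4, 64))
binary_codes = torch.sign(codes)
```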

Original language: English
Title of host publication: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 1398-1406
Number of pages: 9
ISBN (Electronic): 9781450349062
DOI
Publication status: Published - 23 Oct 2017
Event: 25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States
Duration: 23 Oct 2017 → 27 Oct 2017

Publication series

Name: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

Conference

Conference: 25th ACM International Conference on Multimedia, MM 2017
Country/Territory: United States
City: Mountain View
Period: 23/10/17 → 27/10/17
