Attribute-Guided Multiple Instance Hashing Network for Cross-Modal Zero-Shot Hashing

Lingyun Song; Xuequn Shang; Chen Yang; Mingxuan Sun

doi:10.1109/TMM.2022.3190222

School of Computer Science

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Cross-Modal Zero-Shot Hashing (CMZSH) is an important image retrieval technique, used for example in text-based image retrieval. Most existing CMZSH methods use semantic attributes as guidance to generate hash codes for both the images and texts of seen and unseen categories. However, these methods focus only on learning global attribute vectors and hash codes for images, which mixes information from complex semantics with background clutter and thus impedes retrieval performance. To solve this issue, we propose an Attribute-Guided Multiple Instance Hashing (AG-MIH) network for CMZSH, where each instance represents one image region. Instead of generating global image hash codes, AG-MIH learns instance-level hash codes based on instance attributes. To improve attribute learning for instances, AG-MIH exploits a novel 2-D Category-Attribute Relation (CAR) layer, which uses different matching templates to model the relationships between each instance and the attributes of different categories. Under the guidance of semantic attributes, AG-MIH learns hash codes for each visual instance and for texts through a Multi-stream Instance Hashing Refinement (MIHR) procedure. In MIHR, the pseudo supervision for the instance-level attributes and hash codes in each stream comes from its preceding stream. Empirical studies on benchmark datasets show that AG-MIH achieves state-of-the-art performance on both cross-modal and single-modal zero-shot image retrieval tasks.

Original language	English
Pages (from-to)	5305-5318
Number of pages	14
Journal	IEEE Transactions on Multimedia
Volume	25
DOIs	https://doi.org/10.1109/TMM.2022.3190222
State	Published - 2023

Keywords

Attribute
cross-modal hashing (CMH)
deep learning
information retrieval
zero-shot learning (ZSL)

Cite this

@article{7c5d4432baaa47c882f8a952a293e3f6,

title = "Attribute-Guided Multiple Instance Hashing Network for Cross-Modal Zero-Shot Hashing",

abstract = "Cross-Modal Zero-Shot Hashing (CMZSH) is an important image retrieval technique, used for example in text-based image retrieval. Most existing CMZSH methods use semantic attributes as guidance to generate hash codes for both the images and texts of seen and unseen categories. However, these methods focus only on learning global attribute vectors and hash codes for images, which mixes information from complex semantics with background clutter and thus impedes retrieval performance. To solve this issue, we propose an Attribute-Guided Multiple Instance Hashing (AG-MIH) network for CMZSH, where each instance represents one image region. Instead of generating global image hash codes, AG-MIH learns instance-level hash codes based on instance attributes. To improve attribute learning for instances, AG-MIH exploits a novel 2-D Category-Attribute Relation (CAR) layer, which uses different matching templates to model the relationships between each instance and the attributes of different categories. Under the guidance of semantic attributes, AG-MIH learns hash codes for each visual instance and for texts through a Multi-stream Instance Hashing Refinement (MIHR) procedure. In MIHR, the pseudo supervision for the instance-level attributes and hash codes in each stream comes from its preceding stream. Empirical studies on benchmark datasets show that AG-MIH achieves state-of-the-art performance on both cross-modal and single-modal zero-shot image retrieval tasks.",

keywords = "Attribute, cross-modal hashing (CMH), deep learning, information retrieval, zero-shot learning (ZSL)",

author = "Lingyun Song and Xuequn Shang and Chen Yang and Mingxuan Sun",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.",

year = "2023",

doi = "10.1109/TMM.2022.3190222",

language = "English",

volume = "25",

pages = "5305--5318",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Attribute-Guided Multiple Instance Hashing Network for Cross-Modal Zero-Shot Hashing

AU - Song, Lingyun

AU - Shang, Xuequn

AU - Yang, Chen

AU - Sun, Mingxuan

PY - 2023

Y1 - 2023

AB - Cross-Modal Zero-Shot Hashing (CMZSH) is an important image retrieval technique, used for example in text-based image retrieval. Most existing CMZSH methods use semantic attributes as guidance to generate hash codes for both the images and texts of seen and unseen categories. However, these methods focus only on learning global attribute vectors and hash codes for images, which mixes information from complex semantics with background clutter and thus impedes retrieval performance. To solve this issue, we propose an Attribute-Guided Multiple Instance Hashing (AG-MIH) network for CMZSH, where each instance represents one image region. Instead of generating global image hash codes, AG-MIH learns instance-level hash codes based on instance attributes. To improve attribute learning for instances, AG-MIH exploits a novel 2-D Category-Attribute Relation (CAR) layer, which uses different matching templates to model the relationships between each instance and the attributes of different categories. Under the guidance of semantic attributes, AG-MIH learns hash codes for each visual instance and for texts through a Multi-stream Instance Hashing Refinement (MIHR) procedure. In MIHR, the pseudo supervision for the instance-level attributes and hash codes in each stream comes from its preceding stream. Empirical studies on benchmark datasets show that AG-MIH achieves state-of-the-art performance on both cross-modal and single-modal zero-shot image retrieval tasks.

KW - Attribute

KW - cross-modal hashing (CMH)

KW - deep learning

KW - information retrieval

KW - zero-shot learning (ZSL)

UR - http://www.scopus.com/inward/record.url?scp=85134258588&partnerID=8YFLogxK

U2 - 10.1109/TMM.2022.3190222

DO - 10.1109/TMM.2022.3190222

M3 - Article

AN - SCOPUS:85134258588

SN - 1520-9210

VL - 25

SP - 5305

EP - 5318

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

ER -
