TY - JOUR
T1 - Attribute-Guided Multiple Instance Hashing Network for Cross-Modal Zero-Shot Hashing
AU - Song, Lingyun
AU - Shang, Xuequn
AU - Yang, Chen
AU - Sun, Mingxuan
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2023
Y1 - 2023
N2 - Cross-Modal Zero-Shot Hashing (CMZSH) is an important technique for image retrieval tasks such as text-based image retrieval. Most existing CMZSH methods use semantic attributes as guidance to generate hash codes for both the images and texts of seen and unseen categories. However, existing CMZSH methods focus only on learning global attribute vectors and hash codes for images, which mixes information about complex semantics with background clutter and thus impedes retrieval performance. To solve this issue, we propose an Attribute-Guided Multiple Instance Hashing (AG-MIH) network for CMZSH, where each instance represents one image region. Instead of generating global image hash codes, AG-MIH learns instance-level hash codes based on instance attributes. To improve attribute learning for instances, AG-MIH exploits a novel 2-D Category-Attribute Relation (CAR) layer, which uses different matching templates to model the relationships between each instance and the attributes of different categories. Under the guidance of semantic attributes, AG-MIH learns hash codes for each visual instance and text through a Multi-stream Instance Hashing Refinement (MIHR) procedure. In MIHR, the pseudo supervision for the instance-level attributes and hash codes in each stream comes from its preceding stream. Empirical studies on benchmark datasets show that AG-MIH achieves state-of-the-art performance on both cross-modal and single-modal zero-shot image retrieval tasks.
AB - Cross-Modal Zero-Shot Hashing (CMZSH) is an important technique for image retrieval tasks such as text-based image retrieval. Most existing CMZSH methods use semantic attributes as guidance to generate hash codes for both the images and texts of seen and unseen categories. However, existing CMZSH methods focus only on learning global attribute vectors and hash codes for images, which mixes information about complex semantics with background clutter and thus impedes retrieval performance. To solve this issue, we propose an Attribute-Guided Multiple Instance Hashing (AG-MIH) network for CMZSH, where each instance represents one image region. Instead of generating global image hash codes, AG-MIH learns instance-level hash codes based on instance attributes. To improve attribute learning for instances, AG-MIH exploits a novel 2-D Category-Attribute Relation (CAR) layer, which uses different matching templates to model the relationships between each instance and the attributes of different categories. Under the guidance of semantic attributes, AG-MIH learns hash codes for each visual instance and text through a Multi-stream Instance Hashing Refinement (MIHR) procedure. In MIHR, the pseudo supervision for the instance-level attributes and hash codes in each stream comes from its preceding stream. Empirical studies on benchmark datasets show that AG-MIH achieves state-of-the-art performance on both cross-modal and single-modal zero-shot image retrieval tasks.
KW - Attribute
KW - cross-modal hashing (CMH)
KW - deep learning
KW - information retrieval
KW - zero-shot learning (ZSL)
UR - http://www.scopus.com/inward/record.url?scp=85134258588&partnerID=8YFLogxK
U2 - 10.1109/TMM.2022.3190222
DO - 10.1109/TMM.2022.3190222
M3 - Article
AN - SCOPUS:85134258588
SN - 1520-9210
VL - 25
SP - 5305
EP - 5318
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -