Giving Text More Imagination Space for Image-text Matching

Xinfeng Dong, Longfei Han, Dingwen Zhang, Li Liu, Junwei Han, Huaxiang Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Image-text matching is a hot topic in multi-modal analysis. Existing image-text matching algorithms focus on bridging the heterogeneity gap and mapping features into a common space under a strong alignment assumption. However, these methods perform unsatisfactorily in the weak alignment scenario, in which the text carries more abstract information and the number of entities in the text is typically smaller than the number of objects in the image. To the best of our knowledge, this is the first work to address the image-text matching problem from the perspective of the information difference under weak alignment. In order to both narrow the cross-modal heterogeneity gap and balance the information discrepancy, we propose an imagination network that enriches the text modality on top of a pre-trained framework, which is helpful for image-text matching. The imagination network uses reinforcement learning to enhance the semantic information of the text modality, and an action refinement strategy is designed to constrain the freedom and divergence of the imagination. Experimental results show the superiority and generality of the proposed framework with two pre-trained models, CLIP and BLIP, on the two most frequently used datasets, MSCOCO and Flickr30K.
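The abstract describes the imagination network only at a high level. As a purely illustrative sketch of the general idea (enriching frozen text features with "imagined" semantic content and rewarding the choice with image-text similarity through a policy-gradient update), the toy code below is an assumption on our part: ImaginationPolicy, reinforce_step, the concept-bank size, and the cosine-similarity reward are hypothetical and do not reproduce the paper's architecture or its action refinement strategy.

# Hypothetical sketch (not the authors' released code): a toy "imagination"
# policy that enriches a text embedding with one extra concept vector and is
# trained with REINFORCE so the enriched text moves closer to its paired image.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, NUM_CONCEPTS = 512, 1000  # assumed sizes, not taken from the paper

class ImaginationPolicy(nn.Module):
    """Scores candidate concepts to 'imagine' for a given text embedding."""
    def __init__(self, emb_dim=EMB_DIM, num_concepts=NUM_CONCEPTS):
        super().__init__()
        self.concepts = nn.Embedding(num_concepts, emb_dim)  # learnable concept bank
        self.scorer = nn.Linear(emb_dim, num_concepts)       # action logits per concept

    def forward(self, text_emb):
        return torch.distributions.Categorical(logits=self.scorer(text_emb))

def reinforce_step(policy, optimizer, text_emb, image_emb):
    """One REINFORCE update: reward = cosine similarity of enriched text vs. paired image."""
    dist = policy(text_emb)
    action = dist.sample()                                     # index of the imagined concept
    enriched = F.normalize(text_emb + policy.concepts(action), dim=-1)
    reward = F.cosine_similarity(enriched, F.normalize(image_emb, dim=-1), dim=-1)
    # Policy-gradient term for the scorer plus a direct similarity term for the concept bank.
    loss = -(dist.log_prob(action) * reward.detach()).mean() - reward.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.mean().item()

# Usage with random stand-in features; in the paper the text and image features
# would come from frozen CLIP or BLIP encoders rather than random tensors.
policy = ImaginationPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
text_feats, image_feats = torch.randn(8, EMB_DIM), torch.randn(8, EMB_DIM)
print(reinforce_step(policy, optimizer, text_feats, image_feats))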

Original language: English
Host publication: MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 6359-6368
Number of pages: 10
ISBN (electronic): 9798400701085
DOI
Publication status: Published - 26 Oct 2023
Event: 31st ACM International Conference on Multimedia, MM 2023 - Ottawa, Canada
Duration: 29 Oct 2023 - 3 Nov 2023

Publication series

Name: MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia

Conference

Conference: 31st ACM International Conference on Multimedia, MM 2023
Country/Territory: Canada
City: Ottawa
Period: 29/10/23 - 3/11/23
