Improving Fusion of Region Features and Grid Features via Two-Step Interaction for Image-Text Retrieval

Dongqing Wu, Huihui Li, Cang Gu, Lei Guo, Hang Liu

科研成果: 书/报告/会议事项章节会议稿件同行评审

6 引用 (Scopus)

摘要

In recent years, region features extracted from object detection networks have been widely used in the image-text retrieval task. However, they lack rich background and contextual information, which makes it difficult to match words describing global concepts in sentences. Meanwhile, the region features also lose the details of objects in the image. Fortunately, these disadvantages of region features are the advantages of grid features. In this paper, we propose a novel framework, which fuses the region features and grid features through a two-step interaction strategy, thus extracting a more comprehensive image representation for image-text retrieval. Concretely, in the first step, a joint graph with spatial information constraints is constructed, where all region features and grid features are represented as graph nodes. By modeling the relationships using the joint graph, the information can be passed edge-wise. In the second step, we propose a Cross-attention Gated Fusion module, which further explores the complex interactions between region features and grid features, and then adaptively fuses different types of features. With these two steps, our model can fully realize the complementary advantages of region features and grid features. In addition, we propose a Multi-Attention Pooling module to better aggregate the fused region features and grid features. Extensive experiments on two public datasets, including Flickr30K and MS-COCO, demonstrate that our model achieves the state-of-the-art and pushes the performance of image-text retrieval to a new height.

源语言英语
主期刊名MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
出版商Association for Computing Machinery, Inc
5055-5064
页数10
ISBN(电子版)9781450392037
DOI
出版状态已出版 - 10 10月 2022
活动30th ACM International Conference on Multimedia, MM 2022 - Lisboa, 葡萄牙
期限: 10 10月 202214 10月 2022

出版系列

姓名MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

会议

会议30th ACM International Conference on Multimedia, MM 2022
国家/地区葡萄牙
Lisboa
时期10/10/2214/10/22

指纹

探究 'Improving Fusion of Region Features and Grid Features via Two-Step Interaction for Image-Text Retrieval' 的科研主题。它们共同构成独一无二的指纹。

引用此