Boosting Cross-Modal Retrieval with MVSE++ and Reciprocal Neighbors

Wei Wei, Mengmeng Jiang, Xiangnan Zhang, Heng Liu, Chunna Tian

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

In this paper, we propose to boost cross-modal retrieval by mutually aligning images and captions in terms of both features and relationships. First, we propose a multi-feature based visual-semantic embedding (MVSE++) space for retrieving candidates in the other modality. This space provides a more comprehensive representation of the visual content of objects and of the scene context in images, which makes it more likely that an accurate and detailed caption is found for a given image. However, a caption condenses the image content into a semantic description, so the cross-modal neighbor relationships starting from the visual side and from the semantic side are asymmetric. To retrieve a better cross-modal neighbor, we propose to re-rank the initially retrieved candidates according to the k nearest reciprocal neighbors in the MVSE++ space. The method is evaluated on the MSCOCO and Flickr30K benchmark datasets with standard metrics. We achieve high accuracy in caption retrieval and image retrieval at both R@1 and R@10.
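The re-ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `top_k` and `rerank_reciprocal`, the toy similarity matrix, and the simple rule "reciprocal neighbors first, original order otherwise" are all assumptions made for illustration. The abstract does not specify how reciprocity is turned into a final score.

```python
def top_k(scores, k):
    """Indices of the k largest scores, best first (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda j: (-scores[j], j))[:k]


def rerank_reciprocal(sim, query, k):
    """Re-rank the top-k captions retrieved for image `query`.

    sim[i][c] is an assumed similarity between image i and caption c in a
    joint embedding space. Caption c counts as a reciprocal neighbor of the
    image if the image is also among the k nearest images of c.
    """
    candidates = top_k(sim[query], k)
    reciprocal, others = [], []
    for c in candidates:
        column = [row[c] for row in sim]  # similarity of caption c to every image
        if query in top_k(column, k):
            reciprocal.append(c)
        else:
            others.append(c)
    # Reciprocal neighbors move to the front; order inside each group is kept.
    return reciprocal + others


# Toy data (hypothetical): 3 images x 3 captions.
sim = [
    [0.90, 0.80, 0.10],
    [0.95, 0.20, 0.30],
    [0.99, 0.10, 0.70],
]
reranked = rerank_reciprocal(sim, query=0, k=2)
print(reranked)
```

In this toy case caption 0 scores highest for image 0, but it is even closer to images 1 and 2, so image 0 is not among its two nearest images. Caption 1 is a mutual neighbor and is promoted ahead of it, which shows how the asymmetry between the two retrieval directions can change the ranking.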

Original language: English
Article number: 9085386
Pages (from-to): 84642-84651
Number of pages: 10
Journal: IEEE Access
Volume: 8
Publication status: Published - 2020
