Deep cross-modal retrieval for remote sensing image and audio

Guo Mao, Yuan Yuan, Lu Xiaoqiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

56 Citations (Scopus)

Abstract

Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detection. However, existing research on image retrieval, which falls mainly into two directions, text-based and content-based, cannot meet the need for rapid and convenient retrieval in some special applications and emergency scenarios. Text-based retrieval is limited by keyboard input, which is too inefficient for some urgent situations, while content-based retrieval requires an example image as a reference query, which is usually unavailable. Speech, as a direct, natural, and efficient mode of human-machine interaction, can make up for these shortcomings. Hence, this paper proposes a novel cross-modal retrieval method for remote sensing images and spoken audio. We first build a large-scale remote sensing image dataset with plenty of manually annotated spoken audio captions for the cross-modal retrieval task. Then a Deep Visual-Audio Network is designed to directly learn the correspondence between image and audio; the model integrates feature extraction and multi-modal learning into a single network. Experiments on the proposed dataset verify the effectiveness of our approach and show that speech-to-image retrieval is feasible.
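The abstract does not state how the learned image and audio representations are compared at query time or how the network is trained. The following is a minimal sketch, assuming the image and audio branches each output a fixed-length embedding vector: images are ranked by cosine similarity to a spoken query, retrieval is scored with Recall@K, and matched pairs are pulled together with a bidirectional margin ranking loss. The function names, the cosine measure, the margin of 0.2, and the loss itself are illustrative assumptions, not the paper's method.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def rank_images(audio_emb: Sequence[float],
                image_embs: Sequence[Sequence[float]]) -> List[int]:
    """Speech-to-image retrieval: image indices sorted from most to least similar."""
    scores = [cosine(audio_emb, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(audio_embs: Sequence[Sequence[float]],
                image_embs: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of audio queries whose paired image (same index) is in the top k."""
    hits = sum(1 for i, a in enumerate(audio_embs)
               if i in rank_images(a, image_embs)[:k])
    return hits / len(audio_embs)


def margin_ranking_loss(audio_embs: Sequence[Sequence[float]],
                        image_embs: Sequence[Sequence[float]],
                        margin: float = 0.2) -> float:
    """Hypothetical bidirectional hinge loss: each matched pair (i, i) should score
    at least `margin` higher than every mismatched pair, in both query directions."""
    n = len(audio_embs)
    loss = 0.0
    for i in range(n):
        pos = cosine(audio_embs[i], image_embs[i])
        for j in range(n):
            if j == i:
                continue
            # audio i as query, wrong image j as negative
            loss += max(0.0, margin - pos + cosine(audio_embs[i], image_embs[j]))
            # image i as query, wrong audio j as negative
            loss += max(0.0, margin - pos + cosine(audio_embs[j], image_embs[i]))
    return loss / n
```

For example, with toy 2-D embeddings `audio = [[1.0, 0.0], [0.0, 1.0]]` and `images = [[0.9, 0.1], [0.1, 0.9]]`, each spoken caption retrieves its own image first, so `recall_at_k(audio, images, 1)` is 1.0 and the loss is zero. Scoring both query directions in the loss is a common design choice for joint embedding models, since it keeps the space usable for image-to-speech as well as speech-to-image retrieval.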

Original language: English
Host publication title: 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538684795
DOI
Publication status: Published - 8 Oct 2018
Externally published: Yes
Event: 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018 - Beijing, China
Duration: 19 Aug 2018 → 20 Aug 2018

Publication series

Name: 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018

Conference

Conference: 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
Country/Territory: China
City: Beijing
Period: 19/08/18 → 20/08/18
