TY - GEN
T1 - Deep cross-modal retrieval for remote sensing image and audio
AU - Mao, Guo
AU - Yuan, Yuan
AU - Lu, Xiaoqiang
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/8
Y1 - 2018/10/8
N2 - Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detection. However, existing research on image retrieval, which mainly follows two directions, text-based and content-based, cannot meet the demands for speed and convenience in some special applications and emergency scenarios. Text-based retrieval is limited by keyboard input, which is inefficient in urgent situations, while content-based retrieval requires an example image as a reference, which usually does not exist. Speech, as a direct, natural, and efficient mode of human-machine interaction, can make up for these shortcomings. Hence, a novel cross-modal retrieval method for remote sensing images and spoken audio is proposed in this paper. We first build a large-scale remote sensing image dataset with abundant manually annotated spoken audio captions for the cross-modal retrieval task. Then a Deep Visual-Audio Network is designed to directly learn the correspondence between image and audio; this model integrates feature extraction and multi-modal learning into the same network. Experiments on the proposed dataset verify the effectiveness of our approach and show that speech-to-image retrieval is feasible.
KW - Convolutional neural network
KW - Cross-modal retrieval
KW - Remote sensing image
KW - Spoken audio
UR - http://www.scopus.com/inward/record.url?scp=85056498790&partnerID=8YFLogxK
U2 - 10.1109/PRRS.2018.8486338
DO - 10.1109/PRRS.2018.8486338
M3 - Conference contribution
AN - SCOPUS:85056498790
T3 - 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
BT - 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
Y2 - 19 August 2018 through 20 August 2018
ER -