HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation

Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

60 Citations (Scopus)

Abstract

Pretraining has been adopted in a few recent works for Vision-and-Language Navigation (VLN). However, previous pre-training methods for VLN either lack the ability to predict future actions or ignore the trajectory contexts, which are essential for a greedy navigation process. In this work, to promote the learning of spatio-temporal visual-textual correspondence as well as the agent's capability of decision making, we propose a novel history-and-order aware pre-training paradigm (HOP) with VLN-specific objectives that exploit the past observations and support future action prediction. Specifically, in addition to the commonly used Masked Language Modeling (MLM) and Trajectory-Instruction Matching (TIM), we design two proxy tasks to model temporal order information: Trajectory Order Modeling (TOM) and Group Order Modeling (GOM). Moreover, our navigation action prediction is also enhanced by introducing the task of Action Prediction with History (APH), which takes into account the historical visual perceptions. Extensive experimental results on four downstream VLN tasks (R2R, REVERIE, NDH, RxR) demonstrate the effectiveness of our proposed method compared against several state-of-the-art agents.

Original language: English
Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Publisher: IEEE Computer Society
Pages: 15397-15406
Number of pages: 10
ISBN (Electronic): 9781665469463
DOI
Publication status: Published - 2022
Event: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration: 19 Jun 2022 to 24 Jun 2022

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2022-June
ISSN (Print): 1063-6919

Conference

Conference: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/Territory: United States
City: New Orleans
Period: 19/06/22 to 24/06/22
