视觉-语言导航的研究进展与发展趋势

Kai Niu; Peng Wang

doi:10.3724/SP.J.1089.2022.19249

Translated title of the contribution: Survey on the Research Progress and Development Trend of Vision-and-Language Navigation

School of Computer Science

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Review article › peer-review

2 Scopus citations

Abstract

Vision-and-language navigation is an emerging research topic that has developed rapidly in recent years, and it is a representative task in the frontier field of vision-language interaction. The goal of the task is to navigate autonomously, guided by visual perception of the environment, according to language instructions given by humans. This paper reviews recent progress in vision-and-language navigation. First, the research content of the task is introduced, and its three main challenges are analyzed: cross-modal semantic alignment, semantic understanding and reasoning, and enhancement of generalization ability. Second, commonly used datasets and evaluation metrics are listed. Third, research progress is summarized in four categories (imitation learning, reinforcement learning, self-supervised learning, and other methods), and the results of typical solutions are compared and analyzed. Fourth, current research trends are discussed, mainly continuous-environment navigation, comprehension of advanced complex instructions, and common-sense reasoning. Finally, future directions such as 3D vision-and-language navigation, embodied question answering, and interactive question answering are discussed.

Original language	Chinese (Traditional)
Pages (from-to)	1815-1827
Number of pages	13
Journal	Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
Volume	34
Issue number	12
DOIs	https://doi.org/10.3724/SP.J.1089.2022.19249
State	Published - 1 Dec 2022

Cite this

@article{804b9cef680846a78b048299b9a06085,
title = "视觉-语言导航的研究进展与发展趋势",
abstract = "Vision-and-language navigation is an emerging research topic that has developed rapidly in recent years, and it is a representative task in the frontier field of vision-language interaction. The goal of the task is to navigate autonomously, guided by visual perception of the environment, according to language instructions given by humans. This paper reviews recent progress in vision-and-language navigation. First, the research content of the task is introduced, and its three main challenges are analyzed: cross-modal semantic alignment, semantic understanding and reasoning, and enhancement of generalization ability. Second, commonly used datasets and evaluation metrics are listed. Third, research progress is summarized in four categories (imitation learning, reinforcement learning, self-supervised learning, and other methods), and the results of typical solutions are compared and analyzed. Fourth, current research trends are discussed, mainly continuous-environment navigation, comprehension of advanced complex instructions, and common-sense reasoning. Finally, future directions such as 3D vision-and-language navigation, embodied question answering, and interactive question answering are discussed.",
keywords = "action prediction, cross-modal semantic alignments, vision-and-language navigation, vision-language interaction",
author = "Kai Niu and Peng Wang",
year = "2022",
month = dec,
day = "1",
doi = "10.3724/SP.J.1089.2022.19249",
language = "Chinese (Traditional)",
volume = "34",
pages = "1815--1827",
journal = "Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics",
issn = "1003-9775",
publisher = "Institute of Computing Technology",
number = "12",
}

TY  - JOUR
T1  - 视觉-语言导航的研究进展与发展趋势
AU  - Niu, Kai
AU  - Wang, Peng
PY  - 2022/12/1
Y1  - 2022/12/1
AB  - Vision-and-language navigation is an emerging research topic that has developed rapidly in recent years, and it is a representative task in the frontier field of vision-language interaction. The goal of the task is to navigate autonomously, guided by visual perception of the environment, according to language instructions given by humans. This paper reviews recent progress in vision-and-language navigation. First, the research content of the task is introduced, and its three main challenges are analyzed: cross-modal semantic alignment, semantic understanding and reasoning, and enhancement of generalization ability. Second, commonly used datasets and evaluation metrics are listed. Third, research progress is summarized in four categories (imitation learning, reinforcement learning, self-supervised learning, and other methods), and the results of typical solutions are compared and analyzed. Fourth, current research trends are discussed, mainly continuous-environment navigation, comprehension of advanced complex instructions, and common-sense reasoning. Finally, future directions such as 3D vision-and-language navigation, embodied question answering, and interactive question answering are discussed.
KW  - action prediction
KW  - cross-modal semantic alignments
KW  - vision-and-language navigation
KW  - vision-language interaction
UR  - http://www.scopus.com/inward/record.url?scp=85147545823&partnerID=8YFLogxK
U2  - 10.3724/SP.J.1089.2022.19249
DO  - 10.3724/SP.J.1089.2022.19249
M3  - Review article
AN  - SCOPUS:85147545823
SN  - 1003-9775
VL  - 34
SP  - 1815
EP  - 1827
JO  - Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
JF  - Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
IS  - 12
ER  - 
