视觉-语言导航的研究进展与发展趋势

Translated title of the contribution: Survey on the Research Progress and Development Trend of Vision-and-Language Navigation

Kai Niu, Peng Wang

Research output: Contribution to journal > Review article > peer-review

2 Scopus citations

Abstract

Vision-and-language navigation is a rapidly developing research topic that has emerged in recent years, and it is one of the representative tasks in the frontier field of vision-language interaction. The goal of the task is to achieve autonomous navigation that follows language instructions given by humans, based on visual perception of the environment. This paper reviews recent progress in vision-and-language navigation. First, the research content of the task is introduced, and its three main problems and challenges are analyzed: cross-modal semantic alignment, semantic understanding and reasoning, and generalization ability enhancement. Second, commonly used datasets and evaluation metrics are listed. Third, research progress on the task is summarized from four aspects (imitation learning, reinforcement learning, self-supervised learning, and other methods), and the results of typical solutions are compared and analyzed. Fourth, current research trends are discussed, mainly continuous environment navigation, comprehension of advanced complex instructions, and common-sense reasoning. Finally, future development directions such as 3D vision-and-language navigation, embodied question answering, and interactive question answering are discussed.

Original language: Chinese (Traditional)
Pages (from-to): 1815-1827
Number of pages: 13
Journal: Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
Volume: 34
Issue number: 12
State: Published - 1 Dec 2022
