One for all: One-stage referring expression comprehension with dynamic reasoning

Zhipeng Zhang; Zhimin Wei; Zhongzhen Huang; Rui Niu; Peng Wang

doi:10.1016/j.neucom.2022.10.022

School of Computer Science

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Referring Expression Comprehension (REC) is one of the most important tasks in visual reasoning that requires a model to detect the target object referred by a natural language expression. Among the proposed pipelines, the one-stage Referring Expression Comprehension (OSREC) has become the dominant trend since it merges the region proposal and selection stages. Many state-of-the-art OSREC models adopt a multi-hop reasoning strategy because a sequence of objects is frequently mentioned in a single expression which needs multi-hop reasoning to analyze the semantic relation. However, one unsolved issue of these models is that the number of reasoning steps needs to be pre-defined and fixed before inference, ignoring the varying complexity of expressions. In this paper, we propose a Dynamic Multi-step Reasoning Network, which allows the reasoning steps to be dynamically adjusted based on the reasoning state and expression complexity. Specifically, we adopt a Transformer module to memorize & process the reasoning state and a Reinforcement Learning strategy to dynamically infer the reasoning steps. The work achieves the state-of-the-art performance or significant improvements on several REC datasets, ranging from RefCOCO (+, g) with short expressions, to Ref-Reasoning, a dataset with long and complex compositional expressions.

Original language	English
Pages (from-to)	523-532
Number of pages	10
Journal	Neurocomputing
Volume	518
DOIs	https://doi.org/10.1016/j.neucom.2022.10.022
State	Published - 21 Jan 2023

Keywords

Dynamic reasoning
Referring expression comprehension
Reinforcement learning

Cite this

@article{e58ea2a05fd4404596fbb5d317170d4a,

title = "One for all: One-stage referring expression comprehension with dynamic reasoning",

abstract = "Referring Expression Comprehension (REC) is one of the most important tasks in visual reasoning that requires a model to detect the target object referred by a natural language expression. Among the proposed pipelines, the one-stage Referring Expression Comprehension (OSREC) has become the dominant trend since it merges the region proposal and selection stages. Many state-of-the-art OSREC models adopt a multi-hop reasoning strategy because a sequence of objects is frequently mentioned in a single expression which needs multi-hop reasoning to analyze the semantic relation. However, one unsolved issue of these models is that the number of reasoning steps needs to be pre-defined and fixed before inference, ignoring the varying complexity of expressions. In this paper, we propose a Dynamic Multi-step Reasoning Network, which allows the reasoning steps to be dynamically adjusted based on the reasoning state and expression complexity. Specifically, we adopt a Transformer module to memorize & process the reasoning state and a Reinforcement Learning strategy to dynamically infer the reasoning steps. The work achieves the state-of-the-art performance or significant improvements on several REC datasets, ranging from RefCOCO (+, g) with short expressions, to Ref-Reasoning, a dataset with long and complex compositional expressions.",

keywords = "Dynamic reasoning, Referring expression comprehension, Reinforcement learning",

author = "Zhipeng Zhang and Zhimin Wei and Zhongzhen Huang and Rui Niu and Peng Wang",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier B.V.",

year = "2023",

month = jan,

day = "21",

doi = "10.1016/j.neucom.2022.10.022",

language = "English",

volume = "518",

pages = "523--532",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - One for all

T2 - One-stage referring expression comprehension with dynamic reasoning

AU - Zhang, Zhipeng

AU - Wei, Zhimin

AU - Huang, Zhongzhen

AU - Niu, Rui

AU - Wang, Peng

PY - 2023/1/21

Y1 - 2023/1/21

N2 - Referring Expression Comprehension (REC) is one of the most important tasks in visual reasoning that requires a model to detect the target object referred by a natural language expression. Among the proposed pipelines, the one-stage Referring Expression Comprehension (OSREC) has become the dominant trend since it merges the region proposal and selection stages. Many state-of-the-art OSREC models adopt a multi-hop reasoning strategy because a sequence of objects is frequently mentioned in a single expression which needs multi-hop reasoning to analyze the semantic relation. However, one unsolved issue of these models is that the number of reasoning steps needs to be pre-defined and fixed before inference, ignoring the varying complexity of expressions. In this paper, we propose a Dynamic Multi-step Reasoning Network, which allows the reasoning steps to be dynamically adjusted based on the reasoning state and expression complexity. Specifically, we adopt a Transformer module to memorize & process the reasoning state and a Reinforcement Learning strategy to dynamically infer the reasoning steps. The work achieves the state-of-the-art performance or significant improvements on several REC datasets, ranging from RefCOCO (+, g) with short expressions, to Ref-Reasoning, a dataset with long and complex compositional expressions.

KW - Dynamic reasoning

KW - Referring expression comprehension

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85142151193&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2022.10.022

DO - 10.1016/j.neucom.2022.10.022

M3 - Article

AN - SCOPUS:85142151193

SN - 0925-2312

VL - 518

SP - 523

EP - 532

JO - Neurocomputing

JF - Neurocomputing

ER -
