Localizing state space for visual reinforcement learning in noisy environments

Jing Cheng, Jingchen Li, Haobin Shi, Tao Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

Robust policies are a central goal of the visual reinforcement learning community. In practical applications, noise in the environment increases the variance of a reinforcement learning agent's perception. This work introduces a non-differentiable module into deep reinforcement learning that localizes the state space for the agent, which greatly reduces the impact of noise and makes the learned policy implicitly explainable. The proposed model uses a hard attention module for localization, and an additional reinforcement learning process is built to update this module. We analyze the relationship between the non-differentiable module and the agent by treating the whole training procedure as a hierarchical multi-agent reinforcement learning model, with centralized evaluation ensuring that the policies converge. Moreover, to couple the localization policy with the behavior policy, we modify the evaluation processes so that the two coordinate more directly. The proposed method enables the agent to localize its observation or state in an explainable way and to learn more advanced and robust policies by ignoring irrelevant data or changes in noisy environments. That is, it strengthens the disturbance rejection ability of reinforcement learning. Experiments in simulation environments and on a robot arm suggest that the localization module can be embedded into existing reinforcement learning models to improve them in many respects.

Original language: English
Article number: 110998
Journal: Engineering Applications of Artificial Intelligence
Volume: 156
State: Published - 15 Sep 2025

Keywords

  • Deep reinforcement learning
  • Explainable reinforcement learning
  • Reinforcement learning
