TY - JOUR
T1 - Localizing state space for visual reinforcement learning in noisy environments
AU - Cheng, Jing
AU - Li, Jingchen
AU - Shi, Haobin
AU - Zhang, Tao
N1 - Publisher Copyright:
© 2025
PY - 2025/9/15
Y1 - 2025/9/15
N2 - Learning robust policies is a central goal of the visual reinforcement learning community. In practical applications, noise in the environment leads to larger variance in a reinforcement learning agent's perception. This work introduces a non-differentiable module into deep reinforcement learning to localize the state space for agents, by which the impact of noise is greatly reduced and the learned policy can be explained implicitly. The proposed model leverages a hard attention module for localization, while an additional reinforcement learning process is built to update the localization module. We analyze the relationship between the non-differentiable module and the agent, regarding the whole training process as a hierarchical multi-agent reinforcement learning model and ensuring the convergence of policies through centralized evaluation. Moreover, to couple the localization policy with the behavior policy, we modify the evaluation process, achieving more direct coordination between them. The proposed method enables the agent to localize its observation or state in an explainable way, learning more advanced and robust policies by ignoring irrelevant data or changes in noisy environments. That is, it enhances reinforcement learning's ability to reject disturbances. Several experiments in simulation environments and on a robot arm suggest that our localization module can be embedded into existing reinforcement learning models to enhance them in many respects.
AB - Learning robust policies is a central goal of the visual reinforcement learning community. In practical applications, noise in the environment leads to larger variance in a reinforcement learning agent's perception. This work introduces a non-differentiable module into deep reinforcement learning to localize the state space for agents, by which the impact of noise is greatly reduced and the learned policy can be explained implicitly. The proposed model leverages a hard attention module for localization, while an additional reinforcement learning process is built to update the localization module. We analyze the relationship between the non-differentiable module and the agent, regarding the whole training process as a hierarchical multi-agent reinforcement learning model and ensuring the convergence of policies through centralized evaluation. Moreover, to couple the localization policy with the behavior policy, we modify the evaluation process, achieving more direct coordination between them. The proposed method enables the agent to localize its observation or state in an explainable way, learning more advanced and robust policies by ignoring irrelevant data or changes in noisy environments. That is, it enhances reinforcement learning's ability to reject disturbances. Several experiments in simulation environments and on a robot arm suggest that our localization module can be embedded into existing reinforcement learning models to enhance them in many respects.
KW - Deep reinforcement learning
KW - Explainable reinforcement learning
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=105005276343&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2025.110998
DO - 10.1016/j.engappai.2025.110998
M3 - Article
AN - SCOPUS:105005276343
SN - 0952-1976
VL - 156
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 110998
ER -