A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Zijian Hu, Kaifang Wan, Xiaoguang Gao, Yiwei Zhai

Research output: Contribution to journal › Article › peer-review


Abstract

In deep reinforcement learning, the network often converges slowly and easily falls into locally optimal solutions. For environments with reward saltation, we propose a magnify saltatory reward (MSR) algorithm with variable parameters, designed from the perspective of sample usage. MSR dynamically adjusts the rewards of experiences with reward saltation in the experience pool, thereby increasing an agent's utilization of these experiences. We conducted experiments in a simulated obstacle-avoidance search environment for an unmanned aerial vehicle and compared the experimental results of deep Q-network (DQN), double DQN, and dueling DQN after adding MSR. The results demonstrate that, with MSR, these algorithms converge faster and obtain the globally optimal solution more easily.
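The abstract does not give the MSR update rule itself, but the core idea it describes, rescaling the rewards of saltatory transitions stored in the experience pool, can be illustrated with a short sketch. Everything below is a hypothetical reconstruction for illustration only: the MSRReplayBuffer class, the saltation_threshold jump test, and the linearly annealed magnification factor are assumptions, not the authors' published formulation.

```python
import random
from collections import deque


class MSRReplayBuffer:
    """Replay buffer with a hypothetical magnify-saltatory-reward (MSR) step.

    The paper's abstract only states that MSR dynamically adjusts the rewards
    of experiences whose reward "saltates" (jumps sharply); the jump detector
    and the magnification schedule here are illustrative assumptions.
    """

    def __init__(self, capacity=100_000, saltation_threshold=1.0,
                 magnify_start=2.0, magnify_end=1.0, decay_steps=50_000):
        self.buffer = deque(maxlen=capacity)
        self.saltation_threshold = saltation_threshold  # assumed jump detector
        self.magnify_start = magnify_start              # assumed initial factor
        self.magnify_end = magnify_end                  # factor anneals toward 1
        self.decay_steps = decay_steps
        self.step = 0

    def _magnify_factor(self):
        # Adjustable parameter: linearly anneal the magnification so the
        # reshaped rewards fade back toward the true rewards during training.
        frac = min(self.step / self.decay_steps, 1.0)
        return self.magnify_start + frac * (self.magnify_end - self.magnify_start)

    def push(self, state, action, reward, next_state, done, prev_reward=0.0):
        # Treat a large jump relative to the previous reward as a saltation
        # and magnify that transition's reward before storing it.
        if abs(reward - prev_reward) > self.saltation_threshold:
            reward = reward * self._magnify_factor()
        self.buffer.append((state, action, reward, next_state, done))
        self.step += 1

    def sample(self, batch_size):
        # Uniform sampling, as in vanilla DQN; MSR changes what is stored,
        # not how it is drawn.
        return random.sample(self.buffer, batch_size)
```

Under these assumptions, the buffer drops in wherever a standard DQN, double DQN, or dueling DQN replay memory would be used; only the push step differs, which is why the abstract frames MSR as a sample-usage modification rather than a change to the network or loss.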

Original language: English
Article number: 7619483
Journal: Mathematical Problems in Engineering
Volume: 2019
DOIs
State: Published - 2019
