A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Zijian Hu, Kaifang Wan, Xiaoguang Gao, Yiwei Zhai

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)

Abstract

In deep reinforcement learning, network convergence is often slow and training easily settles into local optima. For environments with reward saltation, we propose a magnify saltatory reward (MSR) algorithm with variable parameters, approached from the perspective of sample usage. MSR dynamically adjusts the rewards of experiences with reward saltation in the experience pool, thereby increasing the agent's utilization of these experiences. We conducted experiments in a simulated obstacle-avoidance search environment for an unmanned aerial vehicle and compared the results of deep Q-network (DQN), double DQN, and dueling DQN after adding MSR. The experimental results demonstrate that, with MSR, the algorithms converge faster and obtain the global optimal solution more easily.
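The abstract describes MSR only at a high level. As a rough illustration of the idea, the Python sketch below shows one way a replay buffer could magnify saltatory rewards before they are replayed. The class name, the saltation threshold, the magnification factor, and the annealing rule are all assumptions made for illustration, not the paper's exact formulation.

```python
import random
from collections import deque


class MSRReplayBuffer:
    """Sketch of an experience pool that magnifies 'saltatory' rewards.

    Transitions whose reward magnitude exceeds a threshold are treated as
    saltation experiences; their rewards are scaled by an adjustable factor
    so the agent makes greater use of them during replay (assumed scheme).
    """

    def __init__(self, capacity=100_000, saltation_threshold=1.0, magnify_factor=2.0):
        self.buffer = deque(maxlen=capacity)
        self.saltation_threshold = saltation_threshold  # assumed hyperparameter
        self.magnify_factor = magnify_factor            # adjustable parameter

    def push(self, state, action, reward, next_state, done):
        # Magnify rewards that jump past the saltation threshold.
        if abs(reward) >= self.saltation_threshold:
            reward = reward * self.magnify_factor
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, as in a standard DQN replay buffer.
        return random.sample(self.buffer, batch_size)

    def anneal(self, decay=0.99, floor=1.0):
        # Optionally decay the magnification toward 1 as training stabilizes.
        self.magnify_factor = max(floor, self.magnify_factor * decay)
```

In this sketch the adjusted reward is stored directly in the buffer; an alternative design would keep raw rewards and apply the magnification at sampling time, which makes the adjustable parameter easier to change during training.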

Original language: English
Article number: 7619483
Journal: Mathematical Problems in Engineering
Volume: 2019
DOI
Publication status: Published - 2019
