Playing games with reinforcement learning via perceiving orientation and exploring diversity

Dong Zhang, Le Yang, Haobin Shi, Fangqing Mou, Mengkai Hu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Reinforcement learning can guide agents to perform optimally in a variety of complex environments. Although reinforcement learning has brought breakthroughs in many domains, it is constrained by two bottlenecks: extremely delayed reward signals and the trade-off between diversity and speed. In this paper, we propose a novel framework to alleviate these two bottlenecks. For the delayed reward, we introduce a new term, the orientation perception term, to calculate the reward for each state. For a series of actions that successfully leads to the target state, this term treats each state differently and assigns a reward to every state on the pathway, rather than rewarding only the target state. This mechanism allows the learning algorithm to perceive orientation information by distinguishing between states. For the trade-off between diversity and speed, we integrate curriculum learning into the exploration process and propose the diversity exploration scheme. In the beginning, this scheme tends to explore unexecuted actions so as to discover the optimal action series. As learning proceeds, the scheme gradually relies more on the acquired knowledge and reduces the probability of random actions. This randomness-to-certainty diversity exploration scheme guides learning toward a proper balance between strategy diversity and convergence speed. We name the complete framework OpDe Reinforcement Learning and prove that the algorithm converges. Experiments on a standard platform demonstrate the effectiveness of the complete framework.
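
The abstract gives no formulas, so the following Python sketch only illustrates the two ideas in spirit and should not be read as the authors' method. The linear annealing schedule, the geometric spreading of the goal reward along a successful path, and all names (`exploration_probability`, `choose_action`, `path_rewards`, `eps_start`, `eps_end`, `decay`) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def exploration_probability(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Anneal the random-action probability from eps_start to eps_end.

    Assumption: a linear schedule; the paper's actual schedule is not
    stated in the abstract.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def choose_action(q_values, visit_counts, state, actions, eps, rng=random):
    """Pick an action for `state`.

    With probability eps, explore, preferring actions never tried in this
    state; otherwise exploit the current value estimates.
    """
    if rng.random() < eps:
        untried = [a for a in actions if visit_counts[(state, a)] == 0]
        return rng.choice(untried if untried else list(actions))
    return max(actions, key=lambda a: q_values[(state, a)])


def path_rewards(path_length, goal_reward=1.0, decay=0.9):
    """Assign a reward to every state on a successful path.

    Assumption: states closer to the target receive geometrically larger
    rewards. Index 0 is the first state; the last index is the target.
    """
    return [goal_reward * decay ** (path_length - 1 - i)
            for i in range(path_length)]


if __name__ == "__main__":
    q = defaultdict(float)
    visits = defaultdict(int)
    actions = ["left", "right"]
    eps = exploration_probability(step=0, total_steps=1000)
    action = choose_action(q, visits, state=0, actions=actions, eps=eps)
    visits[(0, action)] += 1
    print(action, path_rewards(4))
```

Early in training `eps` is close to 1, so the agent mostly tries actions it has not yet executed; as `step` approaches `total_steps` it acts almost entirely on its learned values. The per-state rewards returned by `path_rewards` grow toward the target, which gives the learner a sense of direction instead of a single delayed signal.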

Original language: English
Title of host publication: Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 30-34
Number of pages: 5
ISBN (Electronic): 9781538619773
DOI
Publication status: Published - 2017
Event: 5th International Conference on Progress in Informatics and Computing, PIC 2017 - Nanjing, China
Duration: 15 Dec 2017 to 17 Dec 2017

Publication series

Name: Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017

Conference

Conference: 5th International Conference on Progress in Informatics and Computing, PIC 2017
Country/Territory: China
City: Nanjing
Period: 15/12/17 to 17/12/17
