Playing games with reinforcement learning via perceiving orientation and exploring diversity

Dong Zhang, Le Yang, Haobin Shi, Fangqing Mou, Mengkai Hu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Reinforcement learning can guide agents to perform optimally in various complex environments. Although reinforcement learning has brought breakthroughs in many domains, it is constrained by two bottlenecks: extremely delayed reward signals and the trade-off between diversity and speed. In this paper, we propose a novel framework to alleviate these two bottlenecks. For the delayed reward, we introduce a new term, named the orientation perception term, to calculate the reward for each state. For a series of actions that successfully leads to the target state, this term differentiates among the states and assigns a reward to every state on the pathway, rather than rewarding only the target state. This mechanism allows the learning algorithm to perceive orientation information by distinguishing different states. For the trade-off between diversity and speed, we integrate curriculum learning into the exploration process and propose the diversity exploration scheme. In the beginning, this scheme tends to explore unexecuted actions so as to discover the optimal action series. As learning proceeds, the scheme gradually relies more on the acquired knowledge and reduces the probability of random actions. This randomness-to-certainty diversity exploration scheme guides learning toward a proper balance between strategy diversity and convergence speed. We name the complete framework OpDe Reinforcement Learning and prove the convergence of the algorithm. Experiments on a standard platform demonstrate the effectiveness of the complete framework.
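
The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' OpDe algorithm: the abstract gives no formulas, so the geometric decay in `path_rewards`, the linear schedule in `exploration_rate`, and all function and parameter names are assumptions made purely for illustration.

```python
import random


def path_rewards(num_states, target_reward=1.0, decay=0.9):
    """Spread a terminal reward back over every state on a successful path.

    Assumed form (not from the paper): the state at index i, where
    index num_states - 1 is the target, receives
    target_reward * decay ** (steps remaining to the target),
    so states closer to the target are rewarded more.
    """
    return [target_reward * decay ** (num_states - 1 - i) for i in range(num_states)]


def exploration_rate(step, total_steps, start=1.0, end=0.05):
    """Assumed schedule: linearly anneal the random-action probability."""
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * fraction


def choose_action(q_values, tried, step, total_steps, rng=random):
    """Pick an action index for one state.

    Early in training the choice is mostly random and favours actions
    that have not been executed yet; later it is mostly greedy with
    respect to the learned values in q_values.
    """
    if rng.random() < exploration_rate(step, total_steps):
        untried = [a for a, was_tried in enumerate(tried) if not was_tried]
        candidates = untried if untried else list(range(len(q_values)))
        return rng.choice(candidates)
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Under these assumptions, every state on a successful path receives a distinct reward that grows toward the target, and the agent moves from near-random selection of untried actions to greedy selection as `step` approaches `total_steps`.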

Original language: English
Title of host publication: Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 30-34
Number of pages: 5
ISBN (Electronic): 9781538619773
State: Published - 2017
Event: 5th International Conference on Progress in Informatics and Computing, PIC 2017 - Nanjing, China
Duration: 15 Dec 2017 to 17 Dec 2017

Publication series

Name: Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017

Conference

Conference: 5th International Conference on Progress in Informatics and Computing, PIC 2017
Country/Territory: China
City: Nanjing
Period: 15/12/17 to 17/12/17

Keywords

  • Curriculum learning
  • Delayed reward
  • Diversity exploration
  • Orientation perception
  • Reinforcement learning
