Playing games with reinforcement learning via perceiving orientation and exploring diversity

Dong Zhang; Le Yang; Haobin Shi; Fangqing Mou; Mengkai Hu

doi:10.1109/PIC.2017.8359509

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Reinforcement learning can guide agents to perform optimally in a variety of complex environments. Although reinforcement learning has brought breakthroughs in many domains, it is constrained by two bottlenecks: extremely delayed reward signals and the trade-off between diversity and speed. In this paper, we propose a novel framework to alleviate these two bottlenecks. For the delayed reward, we introduce a new term, named the orientation perception term, to calculate the reward for each state. For a series of actions that successfully leads to the target state, this term treats each state differently and assigns a reward to every state on the pathway, rather than rewarding only the target state. This mechanism allows the learning algorithm to perceive orientation information by distinguishing different states. For the trade-off between diversity and speed, we integrate curriculum learning into the exploration process and propose the diversity exploration scheme. In the beginning, this scheme favors exploring unexecuted actions so as to discover the optimal action series. As learning proceeds, the scheme gradually relies more on the acquired knowledge and reduces the probability of random actions. This randomness-to-certainty diversity exploration scheme guides learning toward a proper balance between strategy diversity and convergence speed. We name the complete framework OpDe Reinforcement Learning and prove the convergence of the algorithm. Experiments on a standard platform demonstrate the effectiveness of the complete framework.
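The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's formulation: the abstract gives no equations, so the geometric weighting in `path_rewards`, the linear schedule in `exploration_rate`, and all function and parameter names are assumptions chosen only to show the general shape of rewarding every state on a successful path and of moving from random to knowledge-driven action selection.

```python
import random


def path_rewards(path, goal_reward=1.0, decay=0.9):
    """Spread a terminal reward back over every state on a successful path.

    Assumed weighting (not from the paper): a state k steps before the
    goal receives goal_reward * decay**k, so states nearer the goal earn
    more and the learner can tell them apart.
    """
    last = len(path) - 1
    # If a state repeats, the later (closer to goal) value is kept.
    return {state: goal_reward * decay ** (last - i)
            for i, state in enumerate(path)}


def exploration_rate(step, total_steps, start=1.0, end=0.05):
    """Linearly anneal the random-action probability from start to end.

    The linear schedule is an assumption; any decreasing schedule fits
    the abstract's description.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def choose_action(q_row, tried, epsilon, rng):
    """With probability epsilon explore, otherwise exploit.

    q_row: list of action values for the current state.
    tried: set of action indices already executed in this state.
    Exploration prefers actions not yet executed in this state.
    """
    actions = range(len(q_row))
    if rng.random() < epsilon:
        untried = [a for a in actions if a not in tried]
        return rng.choice(untried or list(actions))
    return max(actions, key=lambda a: q_row[a])


if __name__ == "__main__":
    rng = random.Random(0)
    print(path_rewards(["s0", "s1", "s2"]))
    for step in (0, 50, 100):
        eps = exploration_rate(step, 100)
        print(step, round(eps, 3), choose_action([0.1, 0.9, 0.3], {1}, eps, rng))
```

Early on, `exploration_rate` is close to 1, so `choose_action` mostly samples actions that have not been executed yet; later it mostly returns the highest-valued action, which mirrors the randomness-to-certainty behaviour described above.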

Original language	English
Title of host publication	Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	30-34
Number of pages	5
ISBN (Electronic)	9781538619773
DOIs	https://doi.org/10.1109/PIC.2017.8359509
State	Published - 2017
Event	5th International Conference on Progress in Informatics and Computing, PIC 2017 - Nanjing, China; Duration: 15 Dec 2017 → 17 Dec 2017

Publication series

Name	Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017

Conference

Conference	5th International Conference on Progress in Informatics and Computing, PIC 2017
Country/Territory	China
City	Nanjing
Period	15/12/17 → 17/12/17

Keywords

Curriculum learning
Delayed reward
Diversity exploration
Orientation perception
Reinforcement learning

Access to Document

10.1109/PIC.2017.8359509

Cite this

Zhang, D., Yang, L., Shi, H., Mou, F., & Hu, M. (2017). Playing games with reinforcement learning via perceiving orientation and exploring diversity. In Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017 (pp. 30-34). (Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PIC.2017.8359509

Zhang, Dong ; Yang, Le ; Shi, Haobin et al. / Playing games with reinforcement learning via perceiving orientation and exploring diversity. Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 30-34 (Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017).

@inproceedings{ae89353cadf847da84b00918704363f3,

title = "Playing games with reinforcement learning via perceiving orientation and exploring diversity",

abstract = "Reinforcement learning can guide agents to perform optimally in a variety of complex environments. Although reinforcement learning has brought breakthroughs in many domains, it is constrained by two bottlenecks: extremely delayed reward signals and the trade-off between diversity and speed. In this paper, we propose a novel framework to alleviate these two bottlenecks. For the delayed reward, we introduce a new term, named the orientation perception term, to calculate the reward for each state. For a series of actions that successfully leads to the target state, this term treats each state differently and assigns a reward to every state on the pathway, rather than rewarding only the target state. This mechanism allows the learning algorithm to perceive orientation information by distinguishing different states. For the trade-off between diversity and speed, we integrate curriculum learning into the exploration process and propose the diversity exploration scheme. In the beginning, this scheme favors exploring unexecuted actions so as to discover the optimal action series. As learning proceeds, the scheme gradually relies more on the acquired knowledge and reduces the probability of random actions. This randomness-to-certainty diversity exploration scheme guides learning toward a proper balance between strategy diversity and convergence speed. We name the complete framework OpDe Reinforcement Learning and prove the convergence of the algorithm. Experiments on a standard platform demonstrate the effectiveness of the complete framework.",

keywords = "Curriculum learning, Delayed reward, Diversity exploration, Orientation perception, Reinforcement learning",

author = "Dong Zhang and Le Yang and Haobin Shi and Fangqing Mou and Mengkai Hu",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 5th International Conference on Progress in Informatics and Computing, PIC 2017 ; Conference date: 15-12-2017 Through 17-12-2017",

year = "2017",

doi = "10.1109/PIC.2017.8359509",

language = "English",

series = "Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "30--34",

booktitle = "Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017",

}

Zhang, D, Yang, L, Shi, H, Mou, F & Hu, M 2017, Playing games with reinforcement learning via perceiving orientation and exploring diversity. in Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017. Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017, Institute of Electrical and Electronics Engineers Inc., pp. 30-34, 5th International Conference on Progress in Informatics and Computing, PIC 2017, Nanjing, China, 15/12/17. https://doi.org/10.1109/PIC.2017.8359509

TY - GEN

T1 - Playing games with reinforcement learning via perceiving orientation and exploring diversity

AU - Zhang, Dong

AU - Yang, Le

AU - Shi, Haobin

AU - Mou, Fangqing

AU - Hu, Mengkai

PY - 2017

Y1 - 2017

N2 - Reinforcement learning can guide agents to perform optimally in a variety of complex environments. Although reinforcement learning has brought breakthroughs in many domains, it is constrained by two bottlenecks: extremely delayed reward signals and the trade-off between diversity and speed. In this paper, we propose a novel framework to alleviate these two bottlenecks. For the delayed reward, we introduce a new term, named the orientation perception term, to calculate the reward for each state. For a series of actions that successfully leads to the target state, this term treats each state differently and assigns a reward to every state on the pathway, rather than rewarding only the target state. This mechanism allows the learning algorithm to perceive orientation information by distinguishing different states. For the trade-off between diversity and speed, we integrate curriculum learning into the exploration process and propose the diversity exploration scheme. In the beginning, this scheme favors exploring unexecuted actions so as to discover the optimal action series. As learning proceeds, the scheme gradually relies more on the acquired knowledge and reduces the probability of random actions. This randomness-to-certainty diversity exploration scheme guides learning toward a proper balance between strategy diversity and convergence speed. We name the complete framework OpDe Reinforcement Learning and prove the convergence of the algorithm. Experiments on a standard platform demonstrate the effectiveness of the complete framework.

KW - Curriculum learning

KW - Delayed reward

KW - Diversity exploration

KW - Orientation perception

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85048167962&partnerID=8YFLogxK

U2 - 10.1109/PIC.2017.8359509

DO - 10.1109/PIC.2017.8359509

M3 - Conference contribution

AN - SCOPUS:85048167962

T3 - Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017

SP - 30

EP - 34

BT - Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 5th International Conference on Progress in Informatics and Computing, PIC 2017

Y2 - 15 December 2017 through 17 December 2017

ER -

Zhang D, Yang L, Shi H, Mou F, Hu M. Playing games with reinforcement learning via perceiving orientation and exploring diversity. In Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 30-34. (Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017). doi: 10.1109/PIC.2017.8359509
