Playing games with reinforcement learning via perceiving orientation and exploring diversity

Dong Zhang; Le Yang; Haobin Shi; Fangqing Mou; Mengkai Hu

doi:10.1109/PIC.2017.8359509

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Reinforcement learning can guide agents to perform optimally in a variety of complex environments. Although reinforcement learning has brought breakthroughs in many domains, it is constrained by two bottlenecks: extremely delayed reward signals and the trade-off between diversity and speed. In this paper, we propose a novel framework to alleviate these two bottlenecks. For the delayed reward, we introduce a new term, named the orientation perception term, to calculate the reward for each state. For a series of actions that successfully leads to the target state, this term treats each state differently and assigns a reward to every state on the pathway, rather than rewarding only the target state. This mechanism allows the learning algorithm to perceive orientation information by distinguishing between different states. For the trade-off between diversity and speed, we integrate curriculum learning into the exploration process and propose the diversity exploration scheme. In the beginning, this scheme tends to explore unexecuted actions so as to discover the optimal action series. As learning progresses, the scheme gradually relies more on the acquired knowledge and reduces the probability of random exploration. This randomness-to-certainty diversity exploration scheme guides learning toward a proper balance between strategy diversity and convergence speed. We name the complete framework OpDe Reinforcement Learning and prove that the algorithm converges. Experiments on a standard platform demonstrate the effectiveness of the complete framework.
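The two ideas in the abstract can be illustrated with a rough Python sketch. This is not the paper's formulation: the abstract gives no formulas, so the linear annealing schedule, the geometric spreading of reward along a path, and every name and default value below (`exploration_probability`, `choose_action`, `path_rewards`, `start`, `end`, `decay`) are assumptions made purely for illustration.

```python
import random


def exploration_probability(step, total_steps, start=1.0, end=0.05):
    """Anneal the chance of exploring from `start` to `end` (assumed linear schedule)."""
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * fraction


def choose_action(q_values, visit_counts, epsilon, rng=random):
    """Explore with probability epsilon, preferring actions not yet tried in this state."""
    actions = list(range(len(q_values)))
    if rng.random() < epsilon:
        untried = [a for a in actions if visit_counts[a] == 0]
        # Fall back to a uniformly random action once everything has been tried.
        return rng.choice(untried or actions)
    # Otherwise exploit the current value estimates.
    return max(actions, key=lambda a: q_values[a])


def path_rewards(path_length, goal_reward=1.0, decay=0.9):
    """Give every state on a successful path a reward; states nearer the goal get more.

    The geometric decay is an assumption, not the paper's orientation perception term.
    """
    return [goal_reward * decay ** (path_length - 1 - i) for i in range(path_length)]
```

Early in training `exploration_probability` is close to `start`, so `choose_action` mostly picks untried actions. Later it approaches `end`, and the agent mostly follows its learned values. `path_rewards` shows only the general idea of rewarding every state on a successful path instead of the target state alone.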

Original language	English
Title of host publication	Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	30-34
Number of pages	5
ISBN (Electronic)	9781538619773
DOI	https://doi.org/10.1109/PIC.2017.8359509
Publication status	Published - 2017
Event	5th International Conference on Progress in Informatics and Computing, PIC 2017 - Nanjing, China; Duration: 15 Dec 2017 → 17 Dec 2017

Publication series

Name	Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017

Conference

Conference	5th International Conference on Progress in Informatics and Computing, PIC 2017
Country/Territory	China
City	Nanjing
Period	15/12/17 → 17/12/17

Cite this

Zhang, D., Yang, L., Shi, H., Mou, F., & Hu, M. (2017). Playing games with reinforcement learning via perceiving orientation and exploring diversity. In Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017 (pp. 30-34). (Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PIC.2017.8359509

Zhang, Dong ; Yang, Le ; Shi, Haobin et al. / Playing games with reinforcement learning via perceiving orientation and exploring diversity. Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 30-34 (Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017).

@inproceedings{ae89353cadf847da84b00918704363f3,

title = "Playing games with reinforcement learning via perceiving orientation and exploring diversity",

keywords = "Curriculum learning, Delayed reward, Diversity exploration, Orientation perception, Reinforcement learning",

author = "Dong Zhang and Le Yang and Haobin Shi and Fangqing Mou and Mengkai Hu",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 5th International Conference on Progress in Informatics and Computing, PIC 2017 ; Conference date: 15-12-2017 Through 17-12-2017",

year = "2017",

doi = "10.1109/PIC.2017.8359509",

language = "English",

series = "Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "30--34",

booktitle = "Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017",

}

Zhang, D, Yang, L, Shi, H, Mou, F & Hu, M 2017, Playing games with reinforcement learning via perceiving orientation and exploring diversity. in Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017. Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017, Institute of Electrical and Electronics Engineers Inc., pp. 30-34, 5th International Conference on Progress in Informatics and Computing, PIC 2017, Nanjing, China, 15/12/17. https://doi.org/10.1109/PIC.2017.8359509

Playing games with reinforcement learning via perceiving orientation and exploring diversity. / Zhang, Dong; Yang, Le; Shi, Haobin et al.
Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 30-34 (Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017).

TY - GEN

T1 - Playing games with reinforcement learning via perceiving orientation and exploring diversity

AU - Zhang, Dong

AU - Yang, Le

AU - Shi, Haobin

AU - Mou, Fangqing

AU - Hu, Mengkai

PY - 2017

Y1 - 2017

KW - Curriculum learning

KW - Delayed reward

KW - Diversity exploration

KW - Orientation perception

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85048167962&partnerID=8YFLogxK

U2 - 10.1109/PIC.2017.8359509

DO - 10.1109/PIC.2017.8359509

M3 - Conference contribution

AN - SCOPUS:85048167962

T3 - Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017

SP - 30

EP - 34

BT - Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 5th International Conference on Progress in Informatics and Computing, PIC 2017

Y2 - 15 December 2017 through 17 December 2017

ER -

Zhang D, Yang L, Shi H, Mou F, Hu M. Playing games with reinforcement learning via perceiving orientation and exploring diversity. In Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 30-34. (Proceedings of 2017 International Conference on Progress in Informatics and Computing, PIC 2017). doi: 10.1109/PIC.2017.8359509
