Behavior fusion for deep reinforcement learning

Haobin Shi, Meng Xu, Kao-Shing Hwang, Bo-Yin Cai

Research output: Contribution to journal › Article › peer-review

Abstract

For a deep reinforcement learning (DRL) system, it is difficult to design a reward function for a complex task, so this paper proposes a behavior-fusion framework for the actor–critic architecture that learns the policy from an advantage function composed of two value functions. First, the proposed method decomposes a complex task into several sub-tasks and merges the policies trained for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a new policy from scratch. Each sub-task is trained individually by an actor–critic algorithm using a simple reward function, and these pre-trained sub-task policies serve as building blocks for rapidly assembling a prototype of the complicated task. Second, the proposed method integrates the modules into the policy-gradient calculation by accumulating their returns, which reduces the variance of the gradient estimate. Third, two alternative methods for acquiring the integrated returns of the complicated task are proposed. The Atari 2600 Pong game and a wafer-probing task are used to validate the performance of the proposed methods against a method using a gate network.
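
As a concrete illustration of the fusion step described above, the sketch below sums the temporal-difference advantages of two independently trained sub-task critics and uses the result to weight the policy gradient of a shared actor. This is a minimal sketch in PyTorch, not the paper's code: the network shapes, the names Actor, SubCritic, and fused_advantage, and the equal-weight sum over sub-task advantages are all illustrative assumptions, and neither of the paper's two return-integration methods is reproduced here.

    # Minimal behavior-fusion sketch: two pre-trained sub-task critics
    # drive one shared actor. Names and shapes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Shared policy network producing a categorical action distribution."""
        def __init__(self, obs_dim, n_actions):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                     nn.Linear(64, n_actions))

        def forward(self, obs):
            return torch.distributions.Categorical(logits=self.net(obs))

    class SubCritic(nn.Module):
        """Value function trained beforehand on one sub-task's simple reward."""
        def __init__(self, obs_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                     nn.Linear(64, 1))

        def forward(self, obs):
            return self.net(obs).squeeze(-1)

    def fused_advantage(critics, rewards, obs, next_obs, gamma=0.99):
        """Equal-weight fusion: A(s) = sum_i [ r_i + gamma * V_i(s') - V_i(s) ]."""
        adv = torch.zeros(obs.shape[0])
        with torch.no_grad():
            for critic, r in zip(critics, rewards):
                adv += r + gamma * critic(next_obs) - critic(obs)
        return adv

    # One policy-gradient update on a toy batch.
    obs_dim, n_actions, batch = 8, 4, 32
    actor = Actor(obs_dim, n_actions)
    critics = [SubCritic(obs_dim), SubCritic(obs_dim)]  # stand-ins for pre-trained critics
    opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

    obs = torch.randn(batch, obs_dim)
    next_obs = torch.randn(batch, obs_dim)
    rewards = [torch.randn(batch), torch.randn(batch)]  # one reward signal per sub-task

    dist = actor(obs)
    actions = dist.sample()
    adv = fused_advantage(critics, rewards, obs, next_obs)
    loss = -(dist.log_prob(actions) * adv).mean()       # REINFORCE-style actor loss
    opt.zero_grad()
    loss.backward()
    opt.step()

A weighted sum, or either of the paper's integrated-return schemes, would slot in at fused_advantage without changing the actor update.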

Original language: English
Pages (from-to): 434-444
Number of pages: 11
Journal: ISA Transactions
Volume: 98
State: Published - Mar 2020

Keywords

  • Actor–critic
  • Behavior fusion
  • Complex task
  • Deep reinforcement learning
  • Policy gradient
