TY - JOUR
T1 - A behavior fusion method based on inverse reinforcement learning
AU - Shi, Haobin
AU - Li, Jingchen
AU - Chen, Shicong
AU - Hwang, Kao Shing
N1 - Publisher Copyright:
© 2022
PY - 2022/9
Y1 - 2022/9
N2 - Inverse reinforcement learning (IRL) is often used in deep reinforcement learning systems for tasks whose reward functions are difficult to design manually. When the task is too complicated, however, the expert trajectories collected manually often reflect different preferences, resulting in a relatively large variance in the learned reward function. To address this problem, this study proposes a behavior fusion method based on adversarial IRL. We decompose a complex task into several simple subtasks according to the different preferences. After decoupling the task, we exploit the inherent relationship between IRL and the generative adversarial network (GAN): the discriminator network fits the reward function and the generator network fits the policy, so that the reward function and the policy are learned respectively. Moreover, we improve the adversarial IRL model by assigning one discriminator to each subtask and provide a more efficient update for the whole structure. The behavior fusion in this work acts as a weighting network over the reward functions of the different subtasks. The proposed method is evaluated on the Atari Enduro racing game against baseline methods, and we implement a wafer inspection experiment for further discussion. The experimental results show that our method learns more advanced policies in complicated tasks and that the training process is more stable.
AB - Inverse reinforcement learning (IRL) is often used in deep reinforcement learning systems for tasks whose reward functions are difficult to design manually. When the task is too complicated, however, the expert trajectories collected manually often reflect different preferences, resulting in a relatively large variance in the learned reward function. To address this problem, this study proposes a behavior fusion method based on adversarial IRL. We decompose a complex task into several simple subtasks according to the different preferences. After decoupling the task, we exploit the inherent relationship between IRL and the generative adversarial network (GAN): the discriminator network fits the reward function and the generator network fits the policy, so that the reward function and the policy are learned respectively. Moreover, we improve the adversarial IRL model by assigning one discriminator to each subtask and provide a more efficient update for the whole structure. The behavior fusion in this work acts as a weighting network over the reward functions of the different subtasks. The proposed method is evaluated on the Atari Enduro racing game against baseline methods, and we implement a wafer inspection experiment for further discussion. The experimental results show that our method learns more advanced policies in complicated tasks and that the training process is more stable.
KW - Behavior fusion
KW - Generative adversarial network
KW - Inverse reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85135715724&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2022.07.100
DO - 10.1016/j.ins.2022.07.100
M3 - Article
AN - SCOPUS:85135715724
SN - 0020-0255
VL - 609
SP - 429
EP - 444
JO - Information Sciences
JF - Information Sciences
ER -