TY - JOUR
T1 - A behavior fusion method based on inverse reinforcement learning
AU - Shi, Haobin
AU - Li, Jingchen
AU - Chen, Shicong
AU - Hwang, Kao Shing
N1 - Publisher Copyright:
© 2022
PY - 2022/9
Y1 - 2022/9
N2 - Inverse reinforcement learning (IRL) is often used in deep reinforcement learning systems for tasks whose reward functions are difficult to design manually. When the task is too complicated, however, the expert trajectories collected manually often reflect different preferences, resulting in a relatively large variance in the learned reward function. To address this problem, this study proposes a behavior fusion method based on adversarial IRL. We decompose a complex task into several simple subtasks according to the different preferences. After decoupling the task, we exploit the inherent relationship between IRL and the generative adversarial network (GAN): the discriminator network fits the reward function and the generator network fits the policy, so that the reward function and the policy are learned respectively. Moreover, we improve the adversarial IRL model by assigning one discriminator to each subtask and provide a more efficient update for the whole structure. The behavior fusion in this work acts as a weighting network over the reward functions of the different subtasks. The proposed method is evaluated on the Atari Enduro racing game against baseline methods, and we implement a wafer inspection experiment for further discussion. The experimental results show that our method learns more advanced policies in complicated tasks and that the training process is more stable.
AB - Inverse reinforcement learning (IRL) is often used in deep reinforcement learning systems for tasks whose reward functions are difficult to design manually. When the task is too complicated, however, the expert trajectories collected manually often reflect different preferences, resulting in a relatively large variance in the learned reward function. To address this problem, this study proposes a behavior fusion method based on adversarial IRL. We decompose a complex task into several simple subtasks according to the different preferences. After decoupling the task, we exploit the inherent relationship between IRL and the generative adversarial network (GAN): the discriminator network fits the reward function and the generator network fits the policy, so that the reward function and the policy are learned respectively. Moreover, we improve the adversarial IRL model by assigning one discriminator to each subtask and provide a more efficient update for the whole structure. The behavior fusion in this work acts as a weighting network over the reward functions of the different subtasks. The proposed method is evaluated on the Atari Enduro racing game against baseline methods, and we implement a wafer inspection experiment for further discussion. The experimental results show that our method learns more advanced policies in complicated tasks and that the training process is more stable.
KW - Behavior fusion
KW - Generative adversarial network
KW - Inverse reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85135715724&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2022.07.100
DO - 10.1016/j.ins.2022.07.100
M3 - Article
AN - SCOPUS:85135715724
SN - 0020-0255
VL - 609
SP - 429
EP - 444
JO - Information Sciences
JF - Information Sciences
ER -