TY - JOUR
T1 - An ensemble method for inverse reinforcement learning
AU - Lin, Jin Ling
AU - Hwang, Kao Shing
AU - Shi, Haobin
AU - Pan, Wei
N1 - Publisher Copyright:
© 2019
PY - 2020/2
Y1 - 2020/2
N2 - In inverse reinforcement learning (IRL), a reward function is learnt to generalize experts’ behavior. This paper proposes a model-free IRL algorithm based on an ensemble method, in which the reward function is regarded as a parametric function of expected features and the parameters are updated by a weak classification method. The IRL problem is formulated as a boosting classification problem, akin to the well-known AdaBoost algorithm, over the feature expectations of the experts’ demonstrations and of the trajectory induced by the agent's current policy. The proposed approach treats each feature expectation as an attractor or an expeller, depending on the sign of the residual between the state trajectory of the expert's demonstration and the one induced by reinforcement learning with the currently approximated reward function, so as to tackle the central challenges of IRL: accurate inference, generalizability, and correctness of prior knowledge. The proposed method is then further applied to approximate an abstract reward function from observations of more complex behavior composed of several basic actions. Simulation results in a labyrinth validate the proposed algorithm. Furthermore, behaviors composed of a set of primitive actions on a robot soccer field are examined to demonstrate the applicability of the proposed method.
AB - In inverse reinforcement learning (IRL), a reward function is learnt to generalize experts’ behavior. This paper proposes a model-free IRL algorithm based on an ensemble method, in which the reward function is regarded as a parametric function of expected features and the parameters are updated by a weak classification method. The IRL problem is formulated as a boosting classification problem, akin to the well-known AdaBoost algorithm, over the feature expectations of the experts’ demonstrations and of the trajectory induced by the agent's current policy. The proposed approach treats each feature expectation as an attractor or an expeller, depending on the sign of the residual between the state trajectory of the expert's demonstration and the one induced by reinforcement learning with the currently approximated reward function, so as to tackle the central challenges of IRL: accurate inference, generalizability, and correctness of prior knowledge. The proposed method is then further applied to approximate an abstract reward function from observations of more complex behavior composed of several basic actions. Simulation results in a labyrinth validate the proposed algorithm. Furthermore, behaviors composed of a set of primitive actions on a robot soccer field are examined to demonstrate the applicability of the proposed method.
KW - Apprentice learning
KW - Boosting classifier
KW - Inverse reinforcement learning
KW - Q-learning
UR - http://www.scopus.com/inward/record.url?scp=85073051334&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2019.09.066
DO - 10.1016/j.ins.2019.09.066
M3 - Article
AN - SCOPUS:85073051334
SN - 0020-0255
VL - 512
SP - 518
EP - 532
JO - Information Sciences
JF - Information Sciences
ER -