TY - JOUR
T1 - On the Value of Myopic Behavior in Policy Reuse
AU - Bai, Chenjia
AU - Xu, Kang
AU - Qiu, Shuang
AU - He, Haoran
AU - Zhao, Bin
AU - Wang, Zhen
AU - Li, Wei
AU - Li, Xuelong
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.
AB - Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.
KW - Reinforcement learning
KW - policy generalization
KW - policy reuse
UR - https://www.scopus.com/pages/publications/105003380853
U2 - 10.1109/TPAMI.2025.3560628
DO - 10.1109/TPAMI.2025.3560628
M3 - Article
AN - SCOPUS:105003380853
SN - 0162-8828
VL - 47
SP - 6647
EP - 6659
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 8
ER -