On the Value of Myopic Behavior in Policy Reuse

Chenjia Bai; Kang Xu; Shuang Qiu; Haoran He; Bin Zhao; Zhen Wang; Wei Li; Xuelong Li

doi:10.1109/TPAMI.2025.3560628

Research output: Contribution to journal › Article › Peer-review

Abstract

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.
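The abstract describes SMEC only at a high level. The Python sketch below is a rough, hypothetical illustration of the general idea of reusing prior policies through value-based selection. It is not SMEC's hybrid value function architecture, whose details are not given in this record. In the sketch, the task policy and several prior policies each propose an action, and the proposal that a critic scores highest is executed. The names `select_action`, `task_policy`, `prior_policies`, and `q_fn` are assumptions made for this example. Given the abstract's emphasis on short-term behaviors, `q_fn` could plausibly be a short-horizon value estimate, but that choice is also an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases for this sketch only (not from the paper).
State = Tuple[float, ...]
Action = float
Policy = Callable[[State], Action]
QFunction = Callable[[State, Action], float]


def select_action(
    state: State,
    task_policy: Policy,
    prior_policies: Sequence[Policy],
    q_fn: QFunction,
) -> Tuple[Action, int]:
    """Return the highest-scoring candidate action and the index of its source.

    Index 0 is the task policy; index i >= 1 is prior_policies[i - 1].
    Ties are broken in favor of the lower index (the task policy first).
    This is a generic critic-guided selection rule, assumed for illustration.
    """
    # Each policy proposes one action for the current state.
    candidates: List[Action] = [task_policy(state)]
    candidates.extend(policy(state) for policy in prior_policies)
    # Score every proposal with the same (hypothetical) critic.
    scores = [q_fn(state, action) for action in candidates]
    # max() returns the first maximal index, so ties favor the task policy.
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], best


if __name__ == "__main__":
    # Toy 1-D task: the critic prefers actions that move x toward 0.
    def toy_q(state: State, action: Action) -> float:
        return -abs(state[0] + action)

    action, source = select_action(
        state=(2.0,),
        task_policy=lambda s: 0.0,
        prior_policies=[lambda s: -1.0, lambda s: 1.0],
        q_fn=toy_q,
    )
    print(action, source)  # prints "-1.0 1"
```

In the toy run, the first prior policy's proposal wins because it moves the state closest to the goal. A fixed argmax rule like this is the simplest possible aggregation and ignores any adaptive weighting between short-term and long-term behaviors.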

Original language	English
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI	https://doi.org/10.1109/TPAMI.2025.3560628
Publication status	Accepted/In press - 2025

Cite this

@article{4025d92c6f104e62879276d0a63b619b,

title = "On the Value of Myopic Behavior in Policy Reuse",

abstract = "Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.",

keywords = "Policy Generalization, Policy Reuse, Reinforcement Learning",

author = "Chenjia Bai and Kang Xu and Shuang Qiu and Haoran He and Bin Zhao and Zhen Wang and Wei Li and Xuelong Li",

year = "2025",

doi = "10.1109/TPAMI.2025.3560628",

language = "English",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - On the Value of Myopic Behavior in Policy Reuse

AU - Bai, Chenjia

AU - Xu, Kang

AU - Qiu, Shuang

AU - He, Haoran

AU - Zhao, Bin

AU - Wang, Zhen

AU - Li, Wei

AU - Li, Xuelong

PY - 2025

Y1 - 2025

N2 - Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.

AB - Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.

KW - Policy Generalization

KW - Policy Reuse

KW - Reinforcement Learning

UR - http://www.scopus.com/inward/record.url?scp=105003380853&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2025.3560628

DO - 10.1109/TPAMI.2025.3560628

M3 - Article

AN - SCOPUS:105003380853

SN - 0162-8828

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

ER -