TY - GEN
T1 - PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations
T2 - 31st International Joint Conference on Artificial Intelligence, IJCAI 2022
AU - Sang, Tong
AU - Tang, Hongyao
AU - Ma, Yi
AU - Hao, Jianye
AU - Zheng, Yan
AU - Meng, Zhaopeng
AU - Li, Boyan
AU - Wang, Zhen
N1 - Publisher Copyright:
© 2022 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2022
Y1 - 2022
AB - Deep Reinforcement Learning (DRL) has been a promising solution to many complex decision-making problems. Nevertheless, its notorious weakness in generalization across environments prevents the widespread application of DRL agents in real-world scenarios. Although advances have been made recently, most prior works assume sufficient online interaction with the training environments, which can be costly in practical cases. To this end, we focus on an offline-training-online-adaptation setting, in which the agent first learns from offline experiences collected in environments with different dynamics and then performs online policy adaptation in environments with new dynamics. In this paper, we propose Policy Adaptation with Decoupled Representations (PAnDR) for fast policy adaptation. In the offline training phase, the environment representation and policy representation are learned through contrastive learning and policy recovery, respectively. The representations are further refined by mutual information optimization to make them more decoupled and complete. With the learned representations, a Policy-Dynamics Value Function (PDVF) [Raileanu et al., 2020] network is trained to approximate the values of different combinations of policies and environments from offline experiences. In the online adaptation phase, with the environment context inferred from a few experiences collected in the new environments, the policy is optimized by gradient ascent with respect to the PDVF. Our experiments show that PAnDR outperforms existing algorithms in several representative policy adaptation problems.
UR - http://www.scopus.com/inward/record.url?scp=85137879187&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2022/474
DO - 10.24963/ijcai.2022/474
M3 - Conference contribution
AN - SCOPUS:85137879187
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 3416
EP - 3422
BT - Proceedings of the 31st International Joint Conference on Artificial Intelligence, IJCAI 2022
A2 - De Raedt, Luc
PB - International Joint Conferences on Artificial Intelligence
Y2 - 23 July 2022 through 29 July 2022
ER -