SCRL: Self-supervised Continual Reinforcement Learning for Domain Adaptation

Yuyang Fang; Bin Guo; Jiaqi Liu; Kaixing Zhao; Yasan Ding; Na Wang; Zhiwen Yu

doi:10.1109/AIoTSys58602.2023.00030

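School of Computer Science

Northwestern Polytechnical University, Xi'an

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract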

Deep reinforcement learning outperforms humans in many tasks. In practice, however, the scenarios in which agents are deployed often change, and model performance may degrade as a result, leading to unreasonable decisions in new scenarios. It is therefore important for a model to be able to evolve in new scenarios. Recent work has studied continual training of models through reward signals or self-supervised learning, but reward signals in the target domain are generally difficult to obtain, and the performance of self-supervised evolution remains limited. In addition, adapting to the target domain usually results in forgetting the source domain. To overcome these problems, we propose a self-supervised continual reinforcement learning method, called SCRL, which combines reinforcement learning tasks with self-supervised auxiliary tasks and weight regularization. Specifically, in the source domain, we first jointly train the reinforcement learning and self-supervised learning objectives, which allows them to share the same encoder. The Fisher information matrix is then calculated to record the importance of the model parameters to the source domain. In the target domain, the encoder is used for the reinforcement learning task and is updated by self-supervised learning under the control of the Fisher regularizer. Our extensive experiments on four continuous control tasks from the DeepMind Control suite show that, in the absence of reward signals, the proposed SCRL can effectively adapt the model to the target domain without catastrophically forgetting the source domain.
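
The Fisher-regularized update described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the form of the regularizer, so the sketch assumes a diagonal Fisher estimate and a quadratic penalty in the style of elastic weight consolidation. The function names `diagonal_fisher`, `fisher_penalty`, and `regularized_ssl_step`, the weight `lam`, and the learning rate `lr` are all hypothetical. The self-supervised gradient is passed in by the caller rather than computed.

```python
import numpy as np


def diagonal_fisher(per_sample_grads):
    """Estimate the diagonal of the Fisher information matrix.

    per_sample_grads has shape (n_samples, n_params) and holds one
    source-domain gradient per sample (assumed to be precomputed).
    """
    grads = np.asarray(per_sample_grads, dtype=float)
    return np.mean(grads ** 2, axis=0)


def fisher_penalty(theta, theta_source, fisher_diag, lam=1.0):
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_source_i)^2."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_source, dtype=float)
    return 0.5 * lam * float(np.sum(fisher_diag * diff ** 2))


def regularized_ssl_step(theta, ssl_grad, theta_source, fisher_diag, lam=1.0, lr=0.1):
    """One gradient step on (self-supervised loss + Fisher penalty).

    ssl_grad is the gradient of a self-supervised loss on target-domain data,
    supplied by the caller; no reward signal is used here.
    """
    theta = np.asarray(theta, dtype=float)
    theta_source = np.asarray(theta_source, dtype=float)
    penalty_grad = lam * fisher_diag * (theta - theta_source)
    return theta - lr * (np.asarray(ssl_grad, dtype=float) + penalty_grad)


if __name__ == "__main__":
    # Toy values: the first parameter mattered on the source domain, the second did not.
    fisher = diagonal_fisher([[1.0, 0.0], [3.0, 0.0]])
    theta_src = np.array([0.0, 0.0])
    theta = np.array([1.0, 1.0])
    print("penalty:", fisher_penalty(theta, theta_src, fisher))
    print("updated:", regularized_ssl_step(theta, [1.0, 1.0], theta_src, fisher))
```

In this toy setup the parameter with a large Fisher value is pulled back toward its source-domain value, while the parameter with zero Fisher value follows the self-supervised gradient alone.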

Original language	English
Title of host publication	Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	55-63
Number of pages	9
ISBN (Electronic)	9798350312270
DOI	https://doi.org/10.1109/AIoTSys58602.2023.00030
Publication status	Published - 2023
Event	2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023 - Xi'an, China; Duration: 19 Oct 2023 → 22 Oct 2023

Publication series

Name	Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023

Conference

Conference	2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023
Country/Territory	China
City	Xi'an
Period	19/10/23 → 22/10/23

Cite this

Fang, Y., Guo, B., Liu, J., Zhao, K., Ding, Y., Wang, N., & Yu, Z. (2023). SCRL: Self-supervised Continual Reinforcement Learning for Domain Adaptation. In Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023 (pp. 55-63). (Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/AIoTSys58602.2023.00030

Fang, Yuyang ; Guo, Bin ; Liu, Jiaqi et al. / SCRL : Self-supervised Continual Reinforcement Learning for Domain Adaptation. Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 55-63 (Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023).

@inproceedings{20289fba98bb484f89c02ab0528f32c6,

title = "SCRL: Self-supervised Continual Reinforcement Learning for Domain Adaptation",

abstract = "Deep reinforcement learning outperforms humans in many tasks. However, in reality, the scenarios of agent deployment often change, and the performance of models may therefore degrade, resulting in unreasonable decisions in new scenarios. Therefore, it is important for the model to own the evolution ability in new scenarios. Although recent attempts have studied the continual training of models through reward signals or self-supervised learning, on the one hand, previously used reward signals in the target domain are generally difficult to obtain and on the other hand, the performance of self-supervised evolution is always limited. In addition, adapting to the target domain usually results in forgetting the source domain. To overcome the above problems, in this paper, we propose a self-supervised continual reinforcement learning method, called SCRL, which combines reinforcement learning tasks with self-supervised learning auxiliary tasks and weight regularizes. Specifically, in the source domain, we first jointly train different objectives of reinforcement and self-supervised learning, which enables to share the same encoder. The Fisher information matrix is then calculated to record the importance of the model parameters to the source domain. In the target domain, the encoder is used for the reinforcement learning task and is updated by self-supervised learning under the control of the Fisher regularizer. Our extensive experiments on four continuous control tasks from the DeepMind Control suite showed that in the absence of reward signals, the proposed SCRL can effectively adapt the model to the target domain without catastrophically forgetting the source domain.",

keywords = "continual learning, domain adaptation, Fisher regularizer, reinforcement learning, self-supervised learning",

author = "Yuyang Fang and Bin Guo and Jiaqi Liu and Kaixing Zhao and Yasan Ding and Na Wang and Zhiwen Yu",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023 ; Conference date: 19-10-2023 Through 22-10-2023",

year = "2023",

doi = "10.1109/AIoTSys58602.2023.00030",

language = "English",

series = "Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "55--63",

booktitle = "Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023",

}

Fang, Y, Guo, B, Liu, J, Zhao, K, Ding, Y, Wang, N & Yu, Z 2023, SCRL: Self-supervised Continual Reinforcement Learning for Domain Adaptation. in Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023. Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023, Institute of Electrical and Electronics Engineers Inc., pp. 55-63, 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023, Xi'an, China, 19/10/23. https://doi.org/10.1109/AIoTSys58602.2023.00030

SCRL: Self-supervised Continual Reinforcement Learning for Domain Adaptation. / Fang, Yuyang; Guo, Bin; Liu, Jiaqi et al.
Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 55-63 (Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023).

TY - GEN

T1 - SCRL

T2 - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023

AU - Fang, Yuyang

AU - Guo, Bin

AU - Liu, Jiaqi

AU - Zhao, Kaixing

AU - Ding, Yasan

AU - Wang, Na

AU - Yu, Zhiwen

PY - 2023

Y1 - 2023

N2 - Deep reinforcement learning outperforms humans in many tasks. However, in reality, the scenarios of agent deployment often change, and the performance of models may therefore degrade, resulting in unreasonable decisions in new scenarios. Therefore, it is important for the model to own the evolution ability in new scenarios. Although recent attempts have studied the continual training of models through reward signals or self-supervised learning, on the one hand, previously used reward signals in the target domain are generally difficult to obtain and on the other hand, the performance of self-supervised evolution is always limited. In addition, adapting to the target domain usually results in forgetting the source domain. To overcome the above problems, in this paper, we propose a self-supervised continual reinforcement learning method, called SCRL, which combines reinforcement learning tasks with self-supervised learning auxiliary tasks and weight regularizes. Specifically, in the source domain, we first jointly train different objectives of reinforcement and self-supervised learning, which enables to share the same encoder. The Fisher information matrix is then calculated to record the importance of the model parameters to the source domain. In the target domain, the encoder is used for the reinforcement learning task and is updated by self-supervised learning under the control of the Fisher regularizer. Our extensive experiments on four continuous control tasks from the DeepMind Control suite showed that in the absence of reward signals, the proposed SCRL can effectively adapt the model to the target domain without catastrophically forgetting the source domain.

AB - Deep reinforcement learning outperforms humans in many tasks. However, in reality, the scenarios of agent deployment often change, and the performance of models may therefore degrade, resulting in unreasonable decisions in new scenarios. Therefore, it is important for the model to own the evolution ability in new scenarios. Although recent attempts have studied the continual training of models through reward signals or self-supervised learning, on the one hand, previously used reward signals in the target domain are generally difficult to obtain and on the other hand, the performance of self-supervised evolution is always limited. In addition, adapting to the target domain usually results in forgetting the source domain. To overcome the above problems, in this paper, we propose a self-supervised continual reinforcement learning method, called SCRL, which combines reinforcement learning tasks with self-supervised learning auxiliary tasks and weight regularizes. Specifically, in the source domain, we first jointly train different objectives of reinforcement and self-supervised learning, which enables to share the same encoder. The Fisher information matrix is then calculated to record the importance of the model parameters to the source domain. In the target domain, the encoder is used for the reinforcement learning task and is updated by self-supervised learning under the control of the Fisher regularizer. Our extensive experiments on four continuous control tasks from the DeepMind Control suite showed that in the absence of reward signals, the proposed SCRL can effectively adapt the model to the target domain without catastrophically forgetting the source domain.

KW - continual learning

KW - domain adaptation

KW - Fisher regularizer

KW - reinforcement learning

KW - self-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85188249202&partnerID=8YFLogxK

U2 - 10.1109/AIoTSys58602.2023.00030

DO - 10.1109/AIoTSys58602.2023.00030

M3 - Conference contribution

AN - SCOPUS:85188249202

T3 - Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023

SP - 55

EP - 63

BT - Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 19 October 2023 through 22 October 2023

ER -

Fang Y, Guo B, Liu J, Zhao K, Ding Y, Wang N et al. SCRL: Self-supervised Continual Reinforcement Learning for Domain Adaptation. In Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 55-63. (Proceedings - 2023 International Conference on Artificial Intelligence of Things and Systems, AIoTSys 2023). doi: 10.1109/AIoTSys58602.2023.00030
