Collaborative Guidance Algorithm Based on Offline Pre-training and Online Reinforcement Learning

Zhenrui Lv; Yifan Hu; Zijing Tian; Bin Fu; Hongguang Ren; Wenxing Fu

doi:10.1007/978-981-96-3568-9_42

Unmanned System Research Institute

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To address the small-angle assumption common in existing collaborative guidance laws and their neglect of high-order terms in the time-to-go (remaining time) expansion, this paper proposes a guidance law structure that combines a traditional guidance law with a collaborative correction term, and trains the correction term with reinforcement learning. The paper also constructs a guided pre-training algorithm based on offline reinforcement learning, combined with the twin delayed deep deterministic policy gradient (TD3) algorithm. Through delayed updates and critic comparison, training iterations are fast and efficient, effectively addressing the overestimation of action values during reinforcement learning. Simulation results show that the collaborative guidance law trained with the designed framework offers wider applicability and higher time collaboration accuracy.
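
The delayed updates and critic comparison mentioned in the abstract match two standard mechanisms of the twin delayed deep deterministic policy gradient (TD3) algorithm: taking the smaller of two critic estimates when forming the learning target, and updating the actor less often than the critics. The sketch below illustrates those two mechanisms only, assuming the standard TD3 formulation. It is not the authors' implementation, and the function names, the discount factor, and the delay of 2 are illustrative assumptions.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target (standard TD3, assumed here).

    Taking the minimum of the two target-critic estimates counteracts
    the overestimation that a single critic tends to accumulate.
    `done` is 1.0 for a terminal transition and 0.0 otherwise.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed update schedule (hypothetical helper).

    The critics are updated at every step; the actor and the target
    networks are updated only once every `policy_delay` steps.
    """
    return step % policy_delay == 0


if __name__ == "__main__":
    # Two critics disagree; the target uses the smaller estimate.
    y = td3_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=3.0, gamma=0.5)
    print("target:", y)
    for step in range(1, 5):
        print(step, "actor update" if should_update_actor(step) else "critics only")
```

In the guidance setting described by the abstract, the action would be the collaborative correction term added to the traditional guidance command; that mapping is not shown here because the abstract does not specify the state, action, or reward definitions.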

Original language	English
Title of host publication	Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems, 4th ICAUS 2024
Editors	Lianqing Liu, Yifeng Niu, Wenxing Fu, Yi Qu
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	443-453
Number of pages	11
ISBN (Print)	9789819635672
DOIs	https://doi.org/10.1007/978-981-96-3568-9_42
State	Published - 2025
Event	4th International Conference on Autonomous Unmanned Systems, ICAUS 2024 - Shenyang, China; Duration: 19 Sep 2024 → 21 Sep 2024

Publication series

Name	Lecture Notes in Electrical Engineering
Volume	1377 LNEE
ISSN (Print)	1876-1100
ISSN (Electronic)	1876-1119

Conference

Conference	4th International Conference on Autonomous Unmanned Systems, ICAUS 2024
Country/Territory	China
City	Shenyang
Period	19/09/24 → 21/09/24

Keywords

Collaborative guidance
Reinforcement learning
Time collaboration

Cite this

Lv, Z., Hu, Y., Tian, Z., Fu, B., Ren, H., & Fu, W. (2025). Collaborative Guidance Algorithm Based on Offline Pre-training and Online Reinforcement Learning. In L. Liu, Y. Niu, W. Fu, & Y. Qu (Eds.), Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems, 4th ICAUS 2024 (pp. 443-453). (Lecture Notes in Electrical Engineering; Vol. 1377 LNEE). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-96-3568-9_42

Lv, Zhenrui ; Hu, Yifan ; Tian, Zijing et al. / Collaborative Guidance Algorithm Based on Offline Pre-training and Online Reinforcement Learning. Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems, 4th ICAUS 2024. editor / Lianqing Liu ; Yifeng Niu ; Wenxing Fu ; Yi Qu. Springer Science and Business Media Deutschland GmbH, 2025. pp. 443-453 (Lecture Notes in Electrical Engineering).

@inproceedings{df0f1a5fbe324abe9fc28acbfa01e268,

title = "Collaborative Guidance Algorithm Based on Offline Pre-training and Online Reinforcement Learning",

abstract = "To address the small-angle assumption common in existing collaborative guidance laws and their neglect of high-order terms in the time-to-go (remaining time) expansion, this paper proposes a guidance law structure that combines a traditional guidance law with a collaborative correction term, and trains the correction term with reinforcement learning. The paper also constructs a guided pre-training algorithm based on offline reinforcement learning, combined with the twin delayed deep deterministic policy gradient (TD3) algorithm. Through delayed updates and critic comparison, training iterations are fast and efficient, effectively addressing the overestimation of action values during reinforcement learning. Simulation results show that the collaborative guidance law trained with the designed framework offers wider applicability and higher time collaboration accuracy.",

keywords = "Collaborative guidance, Reinforcement learning, Time collaboration",

author = "Zhenrui Lv and Yifan Hu and Zijing Tian and Bin Fu and Hongguang Ren and Wenxing Fu",

note = "Publisher Copyright: {\textcopyright} Beijing HIWING Scientific and Technological Information Institute 2025.; 4th International Conference on Autonomous Unmanned Systems, ICAUS 2024 ; Conference date: 19-09-2024 Through 21-09-2024",

year = "2025",

doi = "10.1007/978-981-96-3568-9_42",

language = "English",

isbn = "9789819635672",

series = "Lecture Notes in Electrical Engineering",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "443--453",

editor = "Lianqing Liu and Yifeng Niu and Wenxing Fu and Yi Qu",

booktitle = "Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems, 4th ICAUS 2024",

}

Lv, Z, Hu, Y, Tian, Z, Fu, B, Ren, H & Fu, W 2025, Collaborative Guidance Algorithm Based on Offline Pre-training and Online Reinforcement Learning. in L Liu, Y Niu, W Fu & Y Qu (eds), Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems, 4th ICAUS 2024. Lecture Notes in Electrical Engineering, vol. 1377 LNEE, Springer Science and Business Media Deutschland GmbH, pp. 443-453, 4th International Conference on Autonomous Unmanned Systems, ICAUS 2024, Shenyang, China, 19/09/24. https://doi.org/10.1007/978-981-96-3568-9_42

Collaborative Guidance Algorithm Based on Offline Pre-training and Online Reinforcement Learning. / Lv, Zhenrui; Hu, Yifan; Tian, Zijing et al.
Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems, 4th ICAUS 2024. ed. / Lianqing Liu; Yifeng Niu; Wenxing Fu; Yi Qu. Springer Science and Business Media Deutschland GmbH, 2025. p. 443-453 (Lecture Notes in Electrical Engineering; Vol. 1377 LNEE).

TY - GEN

T1 - Collaborative Guidance Algorithm Based on Offline Pre-training and Online Reinforcement Learning

AU - Lv, Zhenrui

AU - Hu, Yifan

AU - Tian, Zijing

AU - Fu, Bin

AU - Ren, Hongguang

AU - Fu, Wenxing

N1 - Publisher Copyright: © Beijing HIWING Scientific and Technological Information Institute 2025.

PY - 2025

Y1 - 2025

AB - To address the small-angle assumption common in existing collaborative guidance laws and their neglect of high-order terms in the time-to-go (remaining time) expansion, this paper proposes a guidance law structure that combines a traditional guidance law with a collaborative correction term, and trains the correction term with reinforcement learning. The paper also constructs a guided pre-training algorithm based on offline reinforcement learning, combined with the twin delayed deep deterministic policy gradient (TD3) algorithm. Through delayed updates and critic comparison, training iterations are fast and efficient, effectively addressing the overestimation of action values during reinforcement learning. Simulation results show that the collaborative guidance law trained with the designed framework offers wider applicability and higher time collaboration accuracy.

KW - Collaborative guidance

KW - Reinforcement learning

KW - Time collaboration

UR - http://www.scopus.com/inward/record.url?scp=105003303687&partnerID=8YFLogxK

U2 - 10.1007/978-981-96-3568-9_42

DO - 10.1007/978-981-96-3568-9_42

M3 - Conference contribution

AN - SCOPUS:105003303687

SN - 9789819635672

T3 - Lecture Notes in Electrical Engineering

SP - 443

EP - 453

BT - Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems, 4th ICAUS 2024

A2 - Liu, Lianqing

A2 - Niu, Yifeng

A2 - Fu, Wenxing

A2 - Qu, Yi

PB - Springer Science and Business Media Deutschland GmbH

T2 - 4th International Conference on Autonomous Unmanned Systems, ICAUS 2024

Y2 - 19 September 2024 through 21 September 2024

ER -

Lv Z, Hu Y, Tian Z, Fu B, Ren H, Fu W. Collaborative Guidance Algorithm Based on Offline Pre-training and Online Reinforcement Learning. In Liu L, Niu Y, Fu W, Qu Y, editors, Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems, 4th ICAUS 2024. Springer Science and Business Media Deutschland GmbH. 2025. p. 443-453. (Lecture Notes in Electrical Engineering). doi: 10.1007/978-981-96-3568-9_42
