Abstract
We present a deep reinforcement learning approach to cooperative guidance for multi-missile strikes under uncontrollable velocity conditions. The method employs the multi-agent proximal policy optimization (MAPPO) algorithm to construct a continuous-action framework for intelligent cooperative guidance. A heuristically reshaped reward function is designed to strengthen cooperation among agents, enabling effective target engagement while mitigating the low learning efficiency caused by sparse reward signals in the guidance environment. In addition, a multi-stage curriculum learning scheme is introduced to smooth agent actions, reducing the action oscillations that arise from independent sampling in reinforcement learning. Simulation results demonstrate that the proposed deep reinforcement learning-based guidance law achieves cooperative attacks across a range of randomized initial conditions.
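As a rough illustration of the reward-shaping idea mentioned in the abstract, the sketch below augments a sparse terminal hit/miss reward with dense shaping terms: one rewarding range closing and one penalizing disagreement in estimated time-to-go across missiles to encourage simultaneous arrival. The function names, weights, and terminal values (`shaped_reward`, `w_close`, `w_sync`, the 100/-50 bonuses) are hypothetical assumptions for illustration only and are not taken from the paper.

```python
# Illustrative sketch of a shaped reward for cooperative simultaneous arrival.
# NOT the paper's reward function; all names, weights, and terms are assumed.

import numpy as np


def estimate_time_to_go(range_to_target: float, closing_speed: float) -> float:
    """Rough time-to-go estimate r / v_c (assumed form, not from the paper)."""
    return range_to_target / max(closing_speed, 1e-6)


def shaped_reward(
    range_to_target: float,        # missile-target range for this agent [m]
    closing_speed: float,          # closing speed along the line of sight [m/s]
    team_time_to_go: list[float],  # time-to-go estimates of all missiles [s]
    hit: bool,                     # sparse terminal event: target intercepted
    miss: bool,                    # sparse terminal event: engagement failed
    w_close: float = 1e-3,         # assumed weight on range-closing shaping
    w_sync: float = 1e-2,          # assumed weight on arrival-time consensus
) -> float:
    """Dense shaping terms plus sparse terminal terms (illustrative only)."""
    # Dense term 1: reward closing on the target (positive closing speed).
    r_close = w_close * closing_speed

    # Dense term 2: penalize deviation of this agent's time-to-go from the
    # team average, encouraging simultaneous arrival.
    t_go = estimate_time_to_go(range_to_target, closing_speed)
    t_go_mean = float(np.mean(team_time_to_go))
    r_sync = -w_sync * abs(t_go - t_go_mean)

    # Sparse terminal terms: large bonus for a hit, penalty for a miss.
    r_terminal = 100.0 if hit else (-50.0 if miss else 0.0)

    return r_close + r_sync + r_terminal


# Example: three missiles, where this agent lags behind the team mean time-to-go.
if __name__ == "__main__":
    r = shaped_reward(
        range_to_target=7500.0,
        closing_speed=300.0,
        team_time_to_go=[21.0, 25.0, 22.0],
        hit=False,
        miss=False,
    )
    print(f"shaped reward: {r:.3f}")
```

Dense terms of this kind give the agents a learning signal at every step, so the policy is not left to discover the sparse terminal bonus purely by chance.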
Original language | English |
---|---|
Article number | 411 |
Journal | Aerospace |
Volume | 12 |
Issue number | 5 |
DOI | |
Publication status | Published - May 2025 |