Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys

Tianbo DENG; Hao HUANG; Yangwang FANG; Jie YAN; Haoyu CHENG

doi:10.1016/j.cja.2023.05.028

School of Astronautics, Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › Peer-review

9 Citations (Scopus)

Abstract

In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, because the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center of the target and decoy (called the virtual target), and a model of the missile and the virtual target is established. Then, an improved DDPG algorithm based on a trusted-search strategy is proposed, which significantly increases training efficiency compared with the original DDPG algorithm. Furthermore, by combining the established model, the network trained with the improved DDPG algorithm, and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results of the proposed scheme are compared with those of other methods to further demonstrate its superior performance.

Original language	English
Pages (from-to)	309-324
Number of pages	16
Journal	Chinese Journal of Aeronautics
Volume	36
Issue number	12
DOI	https://doi.org/10.1016/j.cja.2023.05.028
Publication status	Published - Dec 2023

Cite this

@article{e0cfb1f5a00f420c8da9e334d3740f50,
title = "Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys",
keywords = "Deep deterministic policy gradient, Infrared decoy, Maneuvering target, Reinforcement learning, Terminal guidance law",
author = "Tianbo DENG and Hao HUANG and Yangwang FANG and Jie YAN and Haoyu CHENG",
note = "Publisher Copyright: {\textcopyright} 2023",
year = "2023",
month = dec,
doi = "10.1016/j.cja.2023.05.028",
language = "English",
volume = "36",
pages = "309--324",
journal = "Chinese Journal of Aeronautics",
issn = "1000-9361",
publisher = "Elsevier B.V.",
number = "12",
}

TY  - JOUR
T1  - Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys
AU  - DENG, Tianbo
AU  - HUANG, Hao
AU  - FANG, Yangwang
AU  - YAN, Jie
AU  - CHENG, Haoyu
PY  - 2023/12
Y1  - 2023/12
KW  - Deep deterministic policy gradient
KW  - Infrared decoy
KW  - Maneuvering target
KW  - Reinforcement learning
KW  - Terminal guidance law
UR  - http://www.scopus.com/inward/record.url?scp=85175612884&partnerID=8YFLogxK
U2  - 10.1016/j.cja.2023.05.028
DO  - 10.1016/j.cja.2023.05.028
M3  - Article
AN  - SCOPUS:85175612884
SN  - 1000-9361
VL  - 36
SP  - 309
EP  - 324
JO  - Chinese Journal of Aeronautics
JF  - Chinese Journal of Aeronautics
IS  - 12
ER  - 
