Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys

Tianbo DENG; Hao HUANG; Yangwang FANG; Jie YAN; Haoyu CHENG

doi:10.1016/j.cja.2023.05.028

School of Astronautics

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model for the missile and the virtual target is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the training efficiency of the previous DDPG algorithm. Furthermore, combining the established model, the network obtained by the improved DDPG algorithm and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme and other methods are compared to further demonstrate its superior performance.

Original language	English
Pages (from-to)	309-324
Number of pages	16
Journal	Chinese Journal of Aeronautics
Volume	36
Issue number	12
DOIs	https://doi.org/10.1016/j.cja.2023.05.028
State	Published - Dec 2023

Keywords

Deep deterministic policy gradient
Infrared decoy
Maneuvering target
Reinforcement learning
Terminal guidance law

Cite this

@article{e0cfb1f5a00f420c8da9e334d3740f50,
title = "Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys",
abstract = "In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model for the missile and the virtual target is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the training efficiency of the previous DDPG algorithm. Furthermore, combining the established model, the network obtained by the improved DDPG algorithm and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme and other methods are compared to further demonstrate its superior performance.",
keywords = "Deep deterministic policy gradient, Infrared decoy, Maneuvering target, Reinforcement learning, Terminal guidance law",
author = "Tianbo DENG and Hao HUANG and Yangwang FANG and Jie YAN and Haoyu CHENG",
note = "Publisher Copyright: {\textcopyright} 2023",
year = "2023",
month = dec,
doi = "10.1016/j.cja.2023.05.028",
language = "English",
volume = "36",
pages = "309--324",
journal = "Chinese Journal of Aeronautics",
issn = "1000-9361",
publisher = "Elsevier B.V.",
number = "12",
}

TY  - JOUR
T1  - Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys
AU  - DENG, Tianbo
AU  - HUANG, Hao
AU  - FANG, Yangwang
AU  - YAN, Jie
AU  - CHENG, Haoyu
PY  - 2023/12
Y1  - 2023/12
N2  - In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model for the missile and the virtual target is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the training efficiency of the previous DDPG algorithm. Furthermore, combining the established model, the network obtained by the improved DDPG algorithm and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme and other methods are compared to further demonstrate its superior performance.
AB  - In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model for the missile and the virtual target is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the training efficiency of the previous DDPG algorithm. Furthermore, combining the established model, the network obtained by the improved DDPG algorithm and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme and other methods are compared to further demonstrate its superior performance.
KW  - Deep deterministic policy gradient
KW  - Infrared decoy
KW  - Maneuvering target
KW  - Reinforcement learning
KW  - Terminal guidance law
UR  - http://www.scopus.com/inward/record.url?scp=85175612884&partnerID=8YFLogxK
U2  - 10.1016/j.cja.2023.05.028
DO  - 10.1016/j.cja.2023.05.028
M3  - Article
AN  - SCOPUS:85175612884
SN  - 1000-9361
VL  - 36
SP  - 309
EP  - 324
JO  - Chinese Journal of Aeronautics
JF  - Chinese Journal of Aeronautics
IS  - 12
ER  -
