TY - JOUR
T1 - Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys
AU - DENG, Tianbo
AU - HUANG, Hao
AU - FANG, Yangwang
AU - YAN, Jie
AU - CHENG, Haoyu
N1 - Publisher Copyright: © 2023
PY - 2023/12
Y1 - 2023/12
N2 - In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model for the missile and the virtual decoy is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the training efficiency of the previous DDPG algorithm. Furthermore, by combining the established model, the network obtained by the improved DDPG algorithm, and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme and other methods are compared to further demonstrate its superior performance.
AB - In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model for the missile and the virtual decoy is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the training efficiency of the previous DDPG algorithm. Furthermore, by combining the established model, the network obtained by the improved DDPG algorithm, and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme and other methods are compared to further demonstrate its superior performance.
KW - Deep deterministic policy gradient
KW - Infrared decoy
KW - Maneuvering target
KW - Reinforcement learning
KW - Terminal guidance law
UR - http://www.scopus.com/inward/record.url?scp=85175612884&partnerID=8YFLogxK
U2 - 10.1016/j.cja.2023.05.028
DO - 10.1016/j.cja.2023.05.028
M3 - Article
AN - SCOPUS:85175612884
SN - 1000-9361
VL - 36
SP - 309
EP - 324
JO - Chinese Journal of Aeronautics
JF - Chinese Journal of Aeronautics
IS - 12
ER -