TY - GEN
T1 - Scene Adaptive Persistent Target Tracking and Attack Method Based on Deep Reinforcement Learning
AU - Hao, Zhaotie
AU - Guo, Bin
AU - Li, Mengyuan
AU - Wu, Lie
AU - Yu, Zhiwen
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2023
Y1 - 2023
N2 - As intelligent devices integrating a range of advanced technologies, mobile robots have been widely used in defense and military applications because of their high degree of autonomy and flexibility, and they can independently track and attack dynamic targets. However, traditional tracking-and-attack algorithms are sensitive to changes in the external environment and lack portability and extensibility, whereas deep reinforcement learning can adapt to different environments owing to its strong learning and exploration ability. To pursue targets accurately and robustly, this paper proposes a solution based on a deep reinforcement learning algorithm. Addressing the low accuracy and low robustness of traditional dynamic target pursuit, this paper models the dynamic target tracking and attack problem of mobile robots as a Partially Observable Markov Decision Process (POMDP) and proposes a general-purpose, end-to-end deep reinforcement learning framework based on dual agents to track and attack targets accurately in different scenarios. To address the difficulty mobile robots face in accurately tracking targets while evading obstacles, this paper uses a partial zero-sum game to improve the reward function, providing implicit guidance for the attacker's pursuit of the target, and uses the asynchronous advantage actor-critic (A3C) algorithm to train models in parallel. Experiments show that the model can be transferred to different scenarios and generalizes well. Compared with the baseline method, the attacker's time to successfully destroy the target is reduced by up to 44.7% in the maze scene and up to 40.5% in the block scene, verifying the effectiveness of the proposed method. In addition, this paper analyzes the contribution of each component of the model through ablation experiments, demonstrating the effectiveness and necessity of each module and providing a theoretical basis for subsequent research.
AB - As intelligent devices integrating a range of advanced technologies, mobile robots have been widely used in defense and military applications because of their high degree of autonomy and flexibility, and they can independently track and attack dynamic targets. However, traditional tracking-and-attack algorithms are sensitive to changes in the external environment and lack portability and extensibility, whereas deep reinforcement learning can adapt to different environments owing to its strong learning and exploration ability. To pursue targets accurately and robustly, this paper proposes a solution based on a deep reinforcement learning algorithm. Addressing the low accuracy and low robustness of traditional dynamic target pursuit, this paper models the dynamic target tracking and attack problem of mobile robots as a Partially Observable Markov Decision Process (POMDP) and proposes a general-purpose, end-to-end deep reinforcement learning framework based on dual agents to track and attack targets accurately in different scenarios. To address the difficulty mobile robots face in accurately tracking targets while evading obstacles, this paper uses a partial zero-sum game to improve the reward function, providing implicit guidance for the attacker's pursuit of the target, and uses the asynchronous advantage actor-critic (A3C) algorithm to train models in parallel. Experiments show that the model can be transferred to different scenarios and generalizes well. Compared with the baseline method, the attacker's time to successfully destroy the target is reduced by up to 44.7% in the maze scene and up to 40.5% in the block scene, verifying the effectiveness of the proposed method. In addition, this paper analyzes the contribution of each component of the model through ablation experiments, demonstrating the effectiveness and necessity of each module and providing a theoretical basis for subsequent research.
KW - Deep Reinforcement Learning
KW - Dual agent
KW - Partial zero-sum game
KW - Target pursuit
UR - http://www.scopus.com/inward/record.url?scp=85161153255&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-2385-4_10
DO - 10.1007/978-981-99-2385-4_10
M3 - Conference contribution
AN - SCOPUS:85161153255
SN - 9789819923847
T3 - Communications in Computer and Information Science
SP - 133
EP - 147
BT - Computer Supported Cooperative Work and Social Computing - 17th CCF Conference, ChineseCSCW 2022, Revised Selected Papers
A2 - Sun, Yuqing
A2 - Lu, Tun
A2 - Guo, Yinzhang
A2 - Song, Xiaoxia
A2 - Fan, Hongfei
A2 - Liu, Dongning
A2 - Gao, Liping
A2 - Du, Bowen
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th CCF Conference on Computer Supported Cooperative Work and Social Computing, ChineseCSCW 2022
Y2 - 25 November 2022 through 27 November 2022
ER -