TY - JOUR
T1 - Multi-agent hierarchical policy gradient for Air Combat Tactics emergence via self-play
AU - Sun, Zhixiao
AU - Piao, Haiyin
AU - Yang, Zhen
AU - Zhao, Yiyang
AU - Zhan, Guang
AU - Zhou, Deyun
AU - Meng, Guanglei
AU - Chen, Hechang
AU - Chen, Xing
AU - Qu, Bohao
AU - Lu, Yuanjie
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2021/2
Y1 - 2021/2
N2 - Air-to-air confrontation has attracted wide attention from artificial intelligence researchers. However, in the complex air combat process, operational strategy selection depends heavily on aviation expert knowledge, which is expensive and difficult to obtain. Moreover, existing methods struggle to select optimal action sequences efficiently and accurately, because action selection becomes highly complex when hybrid actions, i.e., mixed discrete and continuous actions, are involved. In view of this, we propose a novel Multi-Agent Hierarchical Policy Gradient algorithm (MAHPG) that learns diverse strategies and transcends expert cognition through adversarial self-play. In addition, a hierarchical decision network is adopted to handle the complicated hybrid actions; it provides a hierarchical, human-like decision-making ability and thus reduces action ambiguity efficiently. Extensive experimental results demonstrate that MAHPG outperforms state-of-the-art air combat methods in both defensive and offensive ability. Notably, MAHPG exhibits Air Combat Tactics Interplay Adaptation, and new operational strategies emerge that surpass expert-level performance.
AB - Air-to-air confrontation has attracted wide attention from artificial intelligence researchers. However, in the complex air combat process, operational strategy selection depends heavily on aviation expert knowledge, which is expensive and difficult to obtain. Moreover, existing methods struggle to select optimal action sequences efficiently and accurately, because action selection becomes highly complex when hybrid actions, i.e., mixed discrete and continuous actions, are involved. In view of this, we propose a novel Multi-Agent Hierarchical Policy Gradient algorithm (MAHPG) that learns diverse strategies and transcends expert cognition through adversarial self-play. In addition, a hierarchical decision network is adopted to handle the complicated hybrid actions; it provides a hierarchical, human-like decision-making ability and thus reduces action ambiguity efficiently. Extensive experimental results demonstrate that MAHPG outperforms state-of-the-art air combat methods in both defensive and offensive ability. Notably, MAHPG exhibits Air Combat Tactics Interplay Adaptation, and new operational strategies emerge that surpass expert-level performance.
KW - Air combat
KW - Artificial intelligence
KW - Multi-agent reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85097635836&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2020.104112
DO - 10.1016/j.engappai.2020.104112
M3 - Article
AN - SCOPUS:85097635836
SN - 0952-1976
VL - 98
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 104112
ER -
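
The abstract above describes a hierarchical policy that handles hybrid discrete/continuous actions trained with a policy gradient under self-play. The sketch below is not the authors' MAHPG implementation; it is a minimal illustration, assuming a high-level head that samples a discrete tactic and a low-level Gaussian head that outputs continuous control parameters conditioned on that tactic, with a simple REINFORCE-style update. All names (HierarchicalPolicy, num_tactics, ctrl_dim) are hypothetical.

```python
# Illustrative sketch only (not the paper's code): a hierarchical policy for
# hybrid discrete/continuous actions, trained with a basic policy gradient.
import torch
import torch.nn as nn


class HierarchicalPolicy(nn.Module):
    def __init__(self, obs_dim, num_tactics, ctrl_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # High-level head: categorical distribution over discrete tactics.
        self.tactic_head = nn.Linear(hidden, num_tactics)
        # Low-level head: Gaussian over continuous controls, conditioned on the tactic.
        self.ctrl_mean = nn.Linear(hidden + num_tactics, ctrl_dim)
        self.ctrl_logstd = nn.Parameter(torch.zeros(ctrl_dim))

    def forward(self, obs):
        h = self.encoder(obs)
        tactic_dist = torch.distributions.Categorical(logits=self.tactic_head(h))
        tactic = tactic_dist.sample()
        onehot = nn.functional.one_hot(tactic, self.tactic_head.out_features).float()
        mean = self.ctrl_mean(torch.cat([h, onehot], dim=-1))
        ctrl_dist = torch.distributions.Normal(mean, self.ctrl_logstd.exp())
        ctrl = ctrl_dist.sample()
        # Joint log-probability of (tactic, controls), used in the policy-gradient loss.
        logp = tactic_dist.log_prob(tactic) + ctrl_dist.log_prob(ctrl).sum(-1)
        return tactic, ctrl, logp


# Toy usage: one REINFORCE-style update on a dummy batch (placeholder returns).
policy = HierarchicalPolicy(obs_dim=12, num_tactics=5, ctrl_dim=3)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
obs = torch.randn(32, 12)
returns = torch.randn(32)  # stand-in for returns/advantages from self-play rollouts
_, _, logp = policy(obs)
loss = -(logp * returns).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In a self-play setting, two copies of such a policy would control opposing aircraft and the returns would come from their adversarial rollouts; the hierarchical factorization keeps the discrete tactic choice and the continuous control refinement in separate heads, which is one common way to handle hybrid action spaces.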