TY - JOUR
T1 - Multi-agent hierarchical policy gradient for Air Combat Tactics emergence via self-play
AU - Sun, Zhixiao
AU - Piao, Haiyin
AU - Yang, Zhen
AU - Zhao, Yiyang
AU - Zhan, Guang
AU - Zhou, Deyun
AU - Meng, Guanglei
AU - Chen, Hechang
AU - Chen, Xing
AU - Qu, Bohao
AU - Lu, Yuanjie
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2021/2
Y1 - 2021/2
N2 - Air-to-air confrontation has attracted wide attention from artificial intelligence researchers. However, in the complex air combat process, operational strategy selection depends heavily on aviation expert knowledge, which is expensive and difficult to obtain. Moreover, existing methods struggle to select optimal action sequences efficiently and accurately, because action selection becomes highly complex when hybrid actions, i.e., mixed discrete and continuous actions, are involved. In view of this, we propose a novel Multi-Agent Hierarchical Policy Gradient algorithm (MAHPG) that learns diverse strategies and transcends expert cognition through adversarial self-play. In addition, a hierarchical decision network is adopted to handle the complicated hybrid actions; it provides a hierarchical, human-like decision-making ability and thus reduces action ambiguity efficiently. Extensive experimental results demonstrate that MAHPG outperforms state-of-the-art air combat methods in both defensive and offensive ability. Notably, MAHPG exhibits Air Combat Tactics Interplay Adaptation, and new operational strategies emerge that surpass expert-level performance.
AB - Air-to-air confrontation has attracted wide attention from artificial intelligence researchers. However, in the complex air combat process, operational strategy selection depends heavily on aviation expert knowledge, which is expensive and difficult to obtain. Moreover, existing methods struggle to select optimal action sequences efficiently and accurately, because action selection becomes highly complex when hybrid actions, i.e., mixed discrete and continuous actions, are involved. In view of this, we propose a novel Multi-Agent Hierarchical Policy Gradient algorithm (MAHPG) that learns diverse strategies and transcends expert cognition through adversarial self-play. In addition, a hierarchical decision network is adopted to handle the complicated hybrid actions; it provides a hierarchical, human-like decision-making ability and thus reduces action ambiguity efficiently. Extensive experimental results demonstrate that MAHPG outperforms state-of-the-art air combat methods in both defensive and offensive ability. Notably, MAHPG exhibits Air Combat Tactics Interplay Adaptation, and new operational strategies emerge that surpass expert-level performance.
KW - Air combat
KW - Artificial intelligence
KW - Multi-agent reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85097635836&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2020.104112
DO - 10.1016/j.engappai.2020.104112
M3 - Article
AN - SCOPUS:85097635836
SN - 0952-1976
VL - 98
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 104112
ER -
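
The abstract above describes a hierarchical policy that handles hybrid discrete/continuous actions trained with a policy gradient under self-play. The sketch below is not the authors' MAHPG implementation; it is a minimal illustration, assuming a high-level head that samples a discrete tactic and a low-level Gaussian head that outputs continuous control parameters conditioned on that tactic, with a simple REINFORCE-style update. All names (HierarchicalPolicy, num_tactics, ctrl_dim) are hypothetical.

```python
# Illustrative sketch only (not the paper's code): a hierarchical policy for
# hybrid discrete/continuous actions, trained with a basic policy gradient.
import torch
import torch.nn as nn


class HierarchicalPolicy(nn.Module):
    def __init__(self, obs_dim, num_tactics, ctrl_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # High-level head: categorical distribution over discrete tactics.
        self.tactic_head = nn.Linear(hidden, num_tactics)
        # Low-level head: Gaussian over continuous controls, conditioned on the tactic.
        self.ctrl_mean = nn.Linear(hidden + num_tactics, ctrl_dim)
        self.ctrl_logstd = nn.Parameter(torch.zeros(ctrl_dim))

    def forward(self, obs):
        h = self.encoder(obs)
        tactic_dist = torch.distributions.Categorical(logits=self.tactic_head(h))
        tactic = tactic_dist.sample()
        onehot = nn.functional.one_hot(tactic, self.tactic_head.out_features).float()
        mean = self.ctrl_mean(torch.cat([h, onehot], dim=-1))
        ctrl_dist = torch.distributions.Normal(mean, self.ctrl_logstd.exp())
        ctrl = ctrl_dist.sample()
        # Joint log-probability of (tactic, controls), used in the policy-gradient loss.
        logp = tactic_dist.log_prob(tactic) + ctrl_dist.log_prob(ctrl).sum(-1)
        return tactic, ctrl, logp


# Toy usage: one REINFORCE-style update on a dummy batch (placeholder returns).
policy = HierarchicalPolicy(obs_dim=12, num_tactics=5, ctrl_dim=3)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
obs = torch.randn(32, 12)
returns = torch.randn(32)  # stand-in for returns/advantages from self-play rollouts
_, _, logp = policy(obs)
loss = -(logp * returns).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In a self-play setting, two copies of such a policy would control opposing aircraft and the returns would come from their adversarial rollouts; the hierarchical factorization keeps the discrete tactic choice and the continuous control refinement in separate heads, which is one common way to handle hybrid action spaces.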