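Dynamic Penetration Decision of Loitering Munition Group Based on Knowledge-assisted Reinforcement Learning (基于知识辅助深度强化学习的巡飞弹组动态突防决策)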

Hao Sun; Haiqing Li; Yan Liang; Chaoxiong Ma; Han Wu

doi:10.12382/bgxb.2023.0827

School of Automation

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Loitering munition group penetration control decision (LMGPCD) is key to improving the autonomy and intelligence of loitering munition group combat. Because penetration maneuver commands for a loitering munition group are difficult to generate online in a dynamic environment containing interceptors and air defenses, a knowledge-assisted reinforcement learning-based LMGPCD algorithm is proposed. The state space and reward function are improved using domain knowledge and rule knowledge to enhance the generalization ability and training convergence speed of the algorithm. An LMGPCD decision framework based on the soft actor-critic (SAC) algorithm is constructed to increase the exploration efficiency of the algorithm. Because the solution space narrows as the number of missiles and threats increases, the algorithm lacks efficient training experience in the initial stage; a method combining the application of expert experience with imitation learning is used to address this. The experimental results show that, compared with other algorithms, the proposed algorithm generates more effective penetration maneuver commands in real time in a dynamic environment, which verifies its effectiveness.

Translated title of the contribution	Dynamic Penetration Decision of Loitering Munition Group Based on Knowledge-assisted Reinforcement Learning
Original language	Traditional Chinese
Pages (from-to)	3161-3176
Number of pages	16
Journal	Binggong Xuebao/Acta Armamentarii
Volume	45
Issue number	9
DOI	https://doi.org/10.12382/bgxb.2023.0827
Publication status	Published - 30 Sep 2024

Keywords

control decision
dynamic environment penetration
knowledge-assisted deep reinforcement learning
loitering munition group
soft actor-critic algorithm

Cite this

@article{d9ed482d0a4843419381a3adab71c492,

title = "基于知识辅助深度强化学习的巡飞弹组动态突防决策",

abstract = "Loitering munition group penetration control decision (LMGPCD) is key to improving the autonomy and intelligence of loitering munition group combat. Because penetration maneuver commands for a loitering munition group are difficult to generate online in a dynamic environment containing interceptors and air defenses, a knowledge-assisted reinforcement learning-based LMGPCD algorithm is proposed. The state space and reward function are improved using domain knowledge and rule knowledge to enhance the generalization ability and training convergence speed of the algorithm. An LMGPCD decision framework based on the soft actor-critic (SAC) algorithm is constructed to increase the exploration efficiency of the algorithm. Because the solution space narrows as the number of missiles and threats increases, the algorithm lacks efficient training experience in the initial stage; a method combining the application of expert experience with imitation learning is used to address this. The experimental results show that, compared with other algorithms, the proposed algorithm generates more effective penetration maneuver commands in real time in a dynamic environment, which verifies its effectiveness.",

keywords = "control decision, dynamic environment penetration, knowledge-assisted deep reinforcement learning, loitering munition group, soft actor-critic algorithm",

author = "Hao Sun and Haiqing Li and Yan Liang and Chaoxiong Ma and Han Wu",

year = "2024",

month = sep,

day = "30",

doi = "10.12382/bgxb.2023.0827",

language = "Traditional Chinese",

volume = "45",

pages = "3161--3176",

journal = "Binggong Xuebao/Acta Armamentarii",

issn = "1000-1093",

publisher = "China Ordnance Society",

number = "9",

}

TY - JOUR

T1 - 基于知识辅助深度强化学习的巡飞弹组动态突防决策

AU - Sun, Hao

AU - Li, Haiqing

AU - Liang, Yan

AU - Ma, Chaoxiong

AU - Wu, Han

PY - 2024/9/30

Y1 - 2024/9/30

N2 - Loitering munition group penetration control decision (LMGPCD) is key to improving the autonomy and intelligence of loitering munition group combat. Because penetration maneuver commands for a loitering munition group are difficult to generate online in a dynamic environment containing interceptors and air defenses, a knowledge-assisted reinforcement learning-based LMGPCD algorithm is proposed. The state space and reward function are improved using domain knowledge and rule knowledge to enhance the generalization ability and training convergence speed of the algorithm. An LMGPCD decision framework based on the soft actor-critic (SAC) algorithm is constructed to increase the exploration efficiency of the algorithm. Because the solution space narrows as the number of missiles and threats increases, the algorithm lacks efficient training experience in the initial stage; a method combining the application of expert experience with imitation learning is used to address this. The experimental results show that, compared with other algorithms, the proposed algorithm generates more effective penetration maneuver commands in real time in a dynamic environment, which verifies its effectiveness.

KW - control decision

KW - dynamic environment penetration

KW - knowledge-assisted deep reinforcement learning

KW - loitering munition group

KW - soft actor-critic algorithm

UR - http://www.scopus.com/inward/record.url?scp=85203848745&partnerID=8YFLogxK

U2 - 10.12382/bgxb.2023.0827

DO - 10.12382/bgxb.2023.0827

M3 - Article

AN - SCOPUS:85203848745

SN - 1000-1093

VL - 45

SP - 3161

EP - 3176

JO - Binggong Xuebao/Acta Armamentarii

JF - Binggong Xuebao/Acta Armamentarii

IS - 9

ER -
