伴随压制干扰与组网雷达功率分配的深度博弈研究

Yuedong Wang; Yijing Gu; Yan Liang; Zengfu Wang; Huixia Zhang

doi:10.12000/JR23023

Translated title of the contribution: Deep Game of Escorting Suppressive Jamming and Networked Radar Power Allocation

School of Automation

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Traditional networked radar power allocation is typically optimized under a given jamming model, while jammer resource allocation is optimized under a given radar power allocation method; such research lacks game-theoretic interaction between the two sides. Given the increasing severity of combat scenarios in which radars and jammers compete, this study proposes a deep game problem of networked radar power allocation under escort suppressive jamming, in which intelligent target jamming is trained using Deep Reinforcement Learning (DRL). First, the jammer and the networked radar are modeled as two agents. Based on the jamming model and the radar detection model, the target detection model of the networked radar under suppressive jamming and the objective function for maximizing the target detection probability are established. For the networked radar agent, the radar power allocation vector is generated by a Proximal Policy Optimization (PPO) policy network. For the jammer agent, a hybrid policy network is designed to generate beam selection and power allocation actions simultaneously. Domain knowledge is introduced to construct more effective reward functions: three kinds of domain knowledge, namely the target detection model, the equal power allocation strategy, and the greedy interference power allocation strategy, are employed to produce guided rewards for the networked radar agent and the jammer agent. Consequently, the learning efficiency and performance of the agents are improved. Lastly, alternating training is used to learn the policy network parameters of both agents. The experimental results show that when the jammer adopts the DRL-based resource allocation strategy, the DRL-based networked radar power allocation significantly outperforms the particle swarm-based and the artificial fish swarm-based networked radar power allocation in both target detection probability and run time.
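As a rough illustration of the equal power allocation baseline and the detection-probability objective mentioned in the abstract, the sketch below may help. It is not the paper's model: the Swerling I single-pulse detection formula, the OR-rule fusion across radars, the additive jamming term in the SINR, and all names and numbers (`equal_power_allocation`, `single_radar_pd`, `fused_pd`, `gains`, `jam_powers`) are assumptions chosen only for illustration.

```python
import math


def equal_power_allocation(total_power, n_radars):
    """Split a total power budget evenly across the radar nodes."""
    return [total_power / n_radars] * n_radars


def single_radar_pd(sinr, pfa=1e-6):
    """Assumed Swerling I single-pulse model: Pd = Pfa ** (1 / (1 + SINR))."""
    return pfa ** (1.0 / (1.0 + sinr))


def fused_pd(powers, gains, jam_powers, noise=1.0, pfa=1e-6):
    """Assumed OR-rule fusion: the target counts as detected if any radar detects it."""
    miss = 1.0
    for p, g, j in zip(powers, gains, jam_powers):
        # Hypothetical SINR: received signal over noise plus suppressive jamming.
        sinr = p * g / (noise + j)
        miss *= 1.0 - single_radar_pd(sinr, pfa)
    return 1.0 - miss


# Illustrative numbers only (not taken from the paper).
powers = equal_power_allocation(total_power=30.0, n_radars=3)
gains = [2.0, 1.5, 1.0]
clean = fused_pd(powers, gains, jam_powers=[0.0, 0.0, 0.0])
jammed = fused_pd(powers, gains, jam_powers=[20.0, 0.0, 0.0])
print(f"no jamming: {clean:.3f}, jamming on radar 0: {jammed:.3f}")
```

Under these assumptions, a learned radar policy would replace `equal_power_allocation` with a state-dependent power vector that sums to the same budget, and a learned jammer policy would choose which radar to illuminate and with how much power.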

Translated title of the contribution	Deep Game of Escorting Suppressive Jamming and Networked Radar Power Allocation
Original language	Chinese (Traditional)
Pages (from-to)	642-656
Number of pages	15
Journal	Journal of Radars
Volume	12
Issue number	3
DOIs	https://doi.org/10.12000/JR23023
State	Published - Jun 2023

Cite this

@article{96965aa09df04b52ae3d0b4a91440ef6,
title = "伴随压制干扰与组网雷达功率分配的深度博弈研究",
abstract = "Traditional networked radar power allocation is typically optimized under a given jamming model, while jammer resource allocation is optimized under a given radar power allocation method; such research lacks game-theoretic interaction between the two sides. Given the increasing severity of combat scenarios in which radars and jammers compete, this study proposes a deep game problem of networked radar power allocation under escort suppressive jamming, in which intelligent target jamming is trained using Deep Reinforcement Learning (DRL). First, the jammer and the networked radar are modeled as two agents. Based on the jamming model and the radar detection model, the target detection model of the networked radar under suppressive jamming and the objective function for maximizing the target detection probability are established. For the networked radar agent, the radar power allocation vector is generated by a Proximal Policy Optimization (PPO) policy network. For the jammer agent, a hybrid policy network is designed to generate beam selection and power allocation actions simultaneously. Domain knowledge is introduced to construct more effective reward functions: three kinds of domain knowledge, namely the target detection model, the equal power allocation strategy, and the greedy interference power allocation strategy, are employed to produce guided rewards for the networked radar agent and the jammer agent. Consequently, the learning efficiency and performance of the agents are improved. Lastly, alternating training is used to learn the policy network parameters of both agents. The experimental results show that when the jammer adopts the DRL-based resource allocation strategy, the DRL-based networked radar power allocation significantly outperforms the particle swarm-based and the artificial fish swarm-based networked radar power allocation in both target detection probability and run time.",
keywords = "Deep game, Deep Reinforcement Learning (DRL), Detection probability, Domain knowledge assisted learning, Escort suppression jamming, Radar resource management",
author = "Yuedong Wang and Yijing Gu and Yan Liang and Zengfu Wang and Huixia Zhang",
year = "2023",
month = jun,
doi = "10.12000/JR23023",
language = "Chinese (Traditional)",
volume = "12",
pages = "642--656",
journal = "Journal of Radars",
issn = "2095-283X",
publisher = "Institute of Electronics Chinese Academy of Sciences",
number = "3",
}

TY - JOUR
T1 - 伴随压制干扰与组网雷达功率分配的深度博弈研究
AU - Wang, Yuedong
AU - Gu, Yijing
AU - Liang, Yan
AU - Wang, Zengfu
AU - Zhang, Huixia
PY - 2023/6
Y1 - 2023/6
AB - Traditional networked radar power allocation is typically optimized under a given jamming model, while jammer resource allocation is optimized under a given radar power allocation method; such research lacks game-theoretic interaction between the two sides. Given the increasing severity of combat scenarios in which radars and jammers compete, this study proposes a deep game problem of networked radar power allocation under escort suppressive jamming, in which intelligent target jamming is trained using Deep Reinforcement Learning (DRL). First, the jammer and the networked radar are modeled as two agents. Based on the jamming model and the radar detection model, the target detection model of the networked radar under suppressive jamming and the objective function for maximizing the target detection probability are established. For the networked radar agent, the radar power allocation vector is generated by a Proximal Policy Optimization (PPO) policy network. For the jammer agent, a hybrid policy network is designed to generate beam selection and power allocation actions simultaneously. Domain knowledge is introduced to construct more effective reward functions: three kinds of domain knowledge, namely the target detection model, the equal power allocation strategy, and the greedy interference power allocation strategy, are employed to produce guided rewards for the networked radar agent and the jammer agent. Consequently, the learning efficiency and performance of the agents are improved. Lastly, alternating training is used to learn the policy network parameters of both agents. The experimental results show that when the jammer adopts the DRL-based resource allocation strategy, the DRL-based networked radar power allocation significantly outperforms the particle swarm-based and the artificial fish swarm-based networked radar power allocation in both target detection probability and run time.
KW - Deep game
KW - Deep Reinforcement Learning (DRL)
KW - Detection probability
KW - Domain knowledge assisted learning
KW - Escort suppression jamming
KW - Radar resource management
UR - http://www.scopus.com/inward/record.url?scp=85172995724&partnerID=8YFLogxK
U2 - 10.12000/JR23023
DO - 10.12000/JR23023
M3 - Article
AN - SCOPUS:85172995724
SN - 2095-283X
VL - 12
SP - 642
EP - 656
JO - Journal of Radars
JF - Journal of Radars
IS - 3
ER -
