TY - JOUR
T1 - Domain Knowledge-Assisted Deep Reinforcement Learning Power Allocation for MIMO Radar Detection
AU - Wang, Yuedong
AU - Liang, Yan
AU - Zhang, Huixia
AU - Gu, Yijing
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - The power allocation of multiple-input multiple-output (MIMO) radars is a key issue in target tracking and detection. The optimality of a multitarget, multiconstraint optimization problem depends strictly on an a priori model, which is difficult to obtain in time-varying, complex, and noncooperative environments. Recently, deep reinforcement learning (DRL) has been applied to target tracking tasks, providing a trial-and-error interactive learning mechanism for policy improvement. Unlike tracking tasks with complete target state transition models, DRL-based MIMO radar detection remains an open issue: the control policy must efficiently adapt to an environment of randomly appearing targets and an extensive power transmission action space, which leads to sparse final task rewards and hence slow policy learning for the agent. By introducing both an analytic model (the radar equation) and empirical rules (expert preferences) as domain knowledge, this article proposes a domain-knowledge-assisted DRL (DKADRL) framework in which a domain-knowledge-based timely reward generator produces timely rewards that assist the agent's policy learning. To balance the timely rewards and the final task rewards, a reward fusion module is designed that gradually increases the weight of the final task rewards as training progresses, allowing the agent's policy to converge to the final optimization goal. The algorithm is validated under two target motion scenarios, showing higher target detection probability and faster training speed than equal power allocation and proximal policy optimization (PPO)-based power allocation.
AB - The power allocation of multiple-input multiple-output (MIMO) radars is a key issue in target tracking and detection. The optimality of a multitarget, multiconstraint optimization problem depends strictly on an a priori model, which is difficult to obtain in time-varying, complex, and noncooperative environments. Recently, deep reinforcement learning (DRL) has been applied to target tracking tasks, providing a trial-and-error interactive learning mechanism for policy improvement. Unlike tracking tasks with complete target state transition models, DRL-based MIMO radar detection remains an open issue: the control policy must efficiently adapt to an environment of randomly appearing targets and an extensive power transmission action space, which leads to sparse final task rewards and hence slow policy learning for the agent. By introducing both an analytic model (the radar equation) and empirical rules (expert preferences) as domain knowledge, this article proposes a domain-knowledge-assisted DRL (DKADRL) framework in which a domain-knowledge-based timely reward generator produces timely rewards that assist the agent's policy learning. To balance the timely rewards and the final task rewards, a reward fusion module is designed that gradually increases the weight of the final task rewards as training progresses, allowing the agent's policy to converge to the final optimization goal. The algorithm is validated under two target motion scenarios, showing higher target detection probability and faster training speed than equal power allocation and proximal policy optimization (PPO)-based power allocation.
KW - Domain knowledge assistance
KW - multiple radar system
KW - reinforcement learning (RL)
KW - resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85139844090&partnerID=8YFLogxK
U2 - 10.1109/JSEN.2022.3211606
DO - 10.1109/JSEN.2022.3211606
M3 - Article
AN - SCOPUS:85139844090
SN - 1530-437X
VL - 22
SP - 23117
EP - 23128
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 23
ER -