TY - JOUR
T1 - Domain Knowledge-Assisted Deep Reinforcement Learning Power Allocation for MIMO Radar Detection
AU - Wang, Yuedong
AU - Liang, Yan
AU - Zhang, Huixia
AU - Gu, Yijing
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - The power allocation of multiple-input multiple-output (MIMO) radars is a key issue in target tracking and detection. The optimality of a multitarget, multiconstraint optimization problem depends strictly on an a priori model, which is difficult to obtain in time-varying, complex, and noncooperative environments. Recently, deep reinforcement learning (DRL) has been applied to target tracking tasks, providing a trial-and-error interactive learning mechanism for policy improvement. Unlike tracking tasks with complete target state transition models, DRL-based MIMO radar detection remains an open issue: the control policy must efficiently adapt to an environment of randomly appearing targets and an extensive power transmission action space, which leads to sparse final task rewards and hence slow policy learning for the agent. By introducing both an analytic model (the radar equation) and empirical rules (expert preferences) as domain knowledge, this article proposes a domain-knowledge-assisted DRL (DKADRL) framework in which a domain-knowledge-based timely reward generator produces timely rewards that assist the agent's policy learning. To balance the timely rewards and the final task rewards, a reward fusion module is designed that gradually increases the weight of the final task rewards as training progresses, allowing the agent's policy to converge to the final optimization goal. The algorithm is validated under two target motion scenarios, showing higher target detection probability and faster training speed than equal power allocation and proximal policy optimization (PPO)-based power allocation.
AB - The power allocation of multiple-input multiple-output (MIMO) radars is a key issue in target tracking and detection. The optimality of a multitarget, multiconstraint optimization problem depends strictly on an a priori model, which is difficult to obtain in time-varying, complex, and noncooperative environments. Recently, deep reinforcement learning (DRL) has been applied to target tracking tasks, providing a trial-and-error interactive learning mechanism for policy improvement. Unlike tracking tasks with complete target state transition models, DRL-based MIMO radar detection remains an open issue: the control policy must efficiently adapt to an environment of randomly appearing targets and an extensive power transmission action space, which leads to sparse final task rewards and hence slow policy learning for the agent. By introducing both an analytic model (the radar equation) and empirical rules (expert preferences) as domain knowledge, this article proposes a domain-knowledge-assisted DRL (DKADRL) framework in which a domain-knowledge-based timely reward generator produces timely rewards that assist the agent's policy learning. To balance the timely rewards and the final task rewards, a reward fusion module is designed that gradually increases the weight of the final task rewards as training progresses, allowing the agent's policy to converge to the final optimization goal. The algorithm is validated under two target motion scenarios, showing higher target detection probability and faster training speed than equal power allocation and proximal policy optimization (PPO)-based power allocation.
KW - Domain knowledge assistance
KW - multiple radar system
KW - reinforcement learning (RL)
KW - resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85139844090&partnerID=8YFLogxK
U2 - 10.1109/JSEN.2022.3211606
DO - 10.1109/JSEN.2022.3211606
M3 - Article
AN - SCOPUS:85139844090
SN - 1530-437X
VL - 22
SP - 23117
EP - 23128
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 23
ER -