TY - GEN
T1 - A Deep Reinforcement Learning-Based Whittle Index Policy for Multibeam Allocation
AU - Hao, Yuhang
AU - Wang, Zengfu
AU - Fu, Jing
AU - Pan, Quan
N1 - Publisher Copyright: © 2024 ISIF.
PY - 2024
Y1 - 2024
N2 - This paper proposes a non-myopic beam scheduling policy for multi-target tracking (MTT) in a phased-array radar network, seeking to minimize the discounted sum of the targets' tracking errors and thereby improve long-term tracking performance. The Whittle index policy, based on the restless multi-armed bandit (RMAB) model, decomposes the state space of the underlying optimization problem into independent spaces of reduced size. We take the tracking error covariance (TEC) matrix as the state of each target (arm), which evolves according to the Kalman filter. For real-world MTT, however, computing the Whittle index exactly over multi-dimensional states is challenging. A neural network is therefore built to extract features from the TEC states and learn the corresponding Whittle index. The network is trained with a deep reinforcement learning (DRL) method that leverages the threshold property of the Whittle index policy and interacts with a single-target tracking environment. The resulting DRL-based Whittle index policy, DRLWI, addresses the beam allocation problem for MTT with multi-dimensional TEC states. This approach mitigates both the exponential computational complexity of classical dynamic programming and the slow convergence caused by large joint state and action spaces when DRL algorithms are applied directly. Numerical results demonstrate that the proposed DRLWI policy outperforms DRL algorithms and myopic policies.
KW - deep reinforcement learning
KW - multibeam allocation
KW - target tracking
KW - Whittle index
UR - http://www.scopus.com/inward/record.url?scp=85207693888&partnerID=8YFLogxK
U2 - 10.23919/FUSION59988.2024.10706358
DO - 10.23919/FUSION59988.2024.10706358
M3 - Conference contribution
AN - SCOPUS:85207693888
T3 - FUSION 2024 - 27th International Conference on Information Fusion
BT - FUSION 2024 - 27th International Conference on Information Fusion
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th International Conference on Information Fusion, FUSION 2024
Y2 - 7 July 2024 through 11 July 2024
ER -