A Deep Reinforcement Learning-Based Whittle Index Policy for Multibeam Allocation

Yuhang Hao, Zengfu Wang, Jing Fu, Quan Pan

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

This paper proposes a non-myopic beam scheduling policy for multi-target tracking (MTT) in a phased-array radar network. The policy seeks to minimize the discounted sum of the targets' tracking errors and thereby improve long-term tracking performance. A Whittle index policy based on the restless multi-armed bandit (RMAB) model can decompose the state space of the underlying optimization problem into independent, smaller state spaces. We take the tracking error covariance (TEC) matrix as the state of each target (arm), which evolves according to the Kalman filter. For real-world MTT, however, computing the Whittle index exactly for multi-dimensional states is challenging. We therefore build a neural network that extracts features from the TEC states and learns the corresponding Whittle index. The network is trained with deep reinforcement learning (DRL), leveraging the threshold property of the Whittle index policy and interacting with a single-target tracking environment. The resulting DRL-based Whittle index policy, DRLWI, addresses the beam allocation problem for MTT with multi-dimensional TEC states. The approach mitigates both the exponential computational complexity of classical dynamic programming and the slow convergence caused by the large joint state and action spaces that arise when DRL algorithms are applied directly. Numerical results show that the proposed DRLWI policy outperforms DRL algorithms and myopic policies.
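
The two ingredients named in the abstract, a Kalman-driven TEC state per target and an index rule that picks which targets receive a beam, can be illustrated with a minimal sketch. It is not the authors' method: the state is reduced to a scalar error variance, the model parameters `a`, `q`, `h`, `r` are assumed values, and `myopic_index` is a hypothetical placeholder used where DRLWI would supply a learned Whittle index.

```python
from typing import Callable, List


def tec_update(p: float, tracked: bool, a: float = 1.0, q: float = 1.0,
               h: float = 1.0, r: float = 1.0) -> float:
    """One step of a scalar Kalman error-variance recursion.

    a, q, h, r (dynamics, process noise, measurement gain, measurement
    noise) are illustrative assumptions, not values from the paper.
    """
    p_pred = a * a * p + q  # prediction step: uncertainty grows
    if not tracked:
        return p_pred  # no beam, no measurement: keep the predicted variance
    # Measurement update: p_pred - p_pred^2 h^2 / (h^2 p_pred + r)
    return p_pred * r / (h * h * p_pred + r)


def myopic_index(p: float) -> float:
    """Placeholder index: one-step variance reduction from tracking.

    This is NOT the Whittle index; it only stands in for an index function.
    """
    return tec_update(p, False) - tec_update(p, True)


def select_targets(states: List[float],
                   index_fn: Callable[[float], float], k: int) -> List[int]:
    """Index rule: allocate beams to the k targets with the largest index."""
    ranked = sorted(range(len(states)),
                    key=lambda i: index_fn(states[i]), reverse=True)
    return sorted(ranked[:k])


# Three targets (arms) with scalar error variances, one beam available.
states = [0.5, 4.0, 2.0]
chosen = select_targets(states, myopic_index, k=1)
next_states = [tec_update(p, i in chosen) for i, p in enumerate(states)]
```

Each target's index depends only on its own state, which is what lets an index policy avoid the joint state space; swapping `myopic_index` for a learned index function leaves `select_targets` unchanged.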

Original language: English
Title of host publication: FUSION 2024 - 27th International Conference on Information Fusion
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781737749769
State: Published - 2024
Event: 27th International Conference on Information Fusion, FUSION 2024 - Venice, Italy
Duration: 7 Jul 2024 - 11 Jul 2024

Publication series

Name: FUSION 2024 - 27th International Conference on Information Fusion

Conference

Conference: 27th International Conference on Information Fusion, FUSION 2024
Country/Territory: Italy
City: Venice
Period: 7/07/24 - 11/07/24

Keywords

  • deep reinforcement learning
  • multibeam allocation
  • target tracking
  • Whittle index
