A Deep Reinforcement Learning-Based Whittle Index Policy for Multibeam Allocation

Yuhang Hao; Zengfu Wang; Jing Fu; Quan Pan

doi:10.23919/FUSION59988.2024.10706358

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, a non-myopic beam scheduling policy is proposed for multi-target tracking (MTT) in a phased-array radar network, seeking to minimize the discounted sum of tracking error of targets and improve the long-term tracking performance. The Whittle index policy based on the restless multiarmed bandit (RMAB) model can decompose the state space of the underlying optimization problem into independent spaces with reduced sizes. We consider the tracking error covariance (TEC) matrix as the state of each target (arm), which evolves based on the Kalman filter. However, for a real-world MTT, the exact calculation of the Whittle index in multiple dimensions is challenging. The neural network is established to achieve the feature extraction of TEC states and learn the corresponding Whittle index. The deep reinforcement learning (DRL) method is exploited to train the neural network by leveraging the threshold property of the Whittle index policy and engaging in interactions with a single target tracking environment. We propose the DRL-based Whittle index policy, namely DRLWI, aiming to solve the beam allocation problem for MTT with multi-dimensional TEC states. This approach effectively mitigates the exponential computational complexity of classical dynamic programming approaches and the low convergence rate caused by large joint state and action spaces in the simple application of DRL algorithms. Numerical results demonstrate the performance of the proposed DRLWI policy surpasses that of DRL algorithms and myopic policies.
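The index-policy structure described in the abstract can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not from the paper: a scalar error variance stands in for the multi-dimensional TEC matrix, the model parameters `a`, `q`, `h`, `r` are arbitrary, and `placeholder_index` is a hypothetical one-step stand-in, not the Whittle index learned by DRLWI. All function names are illustrative.

```python
from typing import Callable, List


def kalman_variance_step(p: float, tracked: bool, a: float = 1.0,
                         q: float = 1.0, h: float = 1.0, r: float = 1.0) -> float:
    """One step of the scalar Kalman error-variance recursion (assumed model)."""
    p_pred = a * a * p + q                 # prediction step inflates the error
    if not tracked:
        return p_pred                      # no beam assigned: no measurement update
    gain = p_pred * h / (h * h * p_pred + r)
    return (1.0 - gain * h) * p_pred       # beam assigned: the update shrinks the error


def placeholder_index(p: float) -> float:
    """Hypothetical stand-in index: one-step variance reduction from tracking.

    This is NOT the learned Whittle index; it only fills the same slot.
    """
    return kalman_variance_step(p, False) - kalman_variance_step(p, True)


def select_targets(states: List[float], k: int,
                   index_fn: Callable[[float], float]) -> List[int]:
    """Index policy: assign the k beams to the targets with the largest indices."""
    order = sorted(range(len(states)),
                   key=lambda i: index_fn(states[i]), reverse=True)
    return sorted(order[:k])


def simulate(states: List[float], k: int, steps: int) -> List[float]:
    """Run the index policy for a number of steps and return the final variances."""
    for _ in range(steps):
        chosen = set(select_targets(states, k, placeholder_index))
        states = [kalman_variance_step(p, i in chosen)
                  for i, p in enumerate(states)]
    return states
```

Each target's index depends only on that target's own state, so the per-step decision reduces to a sort over the targets instead of a search over the joint state and action space. A learned index function could replace `placeholder_index` without changing `select_targets`.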

Original language	English
Title of host publication	FUSION 2024 - 27th International Conference on Information Fusion
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781737749769
DOIs	https://doi.org/10.23919/FUSION59988.2024.10706358
State	Published - 2024
Event	27th International Conference on Information Fusion, FUSION 2024 - Venice, Italy
Duration	7 Jul 2024 → 11 Jul 2024

Publication series

Name	FUSION 2024 - 27th International Conference on Information Fusion

Conference

Conference	27th International Conference on Information Fusion, FUSION 2024
Country/Territory	Italy
City	Venice
Period	7/07/24 → 11/07/24

Keywords

deep reinforcement learning
multibeam allocation
target tracking
Whittle index

Access to Document

10.23919/FUSION59988.2024.10706358

Cite this

Hao, Y., Wang, Z., Fu, J., & Pan, Q. (2024). A Deep Reinforcement Learning-Based Whittle Index Policy for Multibeam Allocation. In FUSION 2024 - 27th International Conference on Information Fusion (FUSION 2024 - 27th International Conference on Information Fusion). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/FUSION59988.2024.10706358

@inproceedings{43fdb0e3ebaa46888d7a2374621762da,

title = "A Deep Reinforcement Learning-Based Whittle Index Policy for Multibeam Allocation",

abstract = "In this paper, a non-myopic beam scheduling policy is proposed for multi-target tracking (MTT) in a phased-array radar network, seeking to minimize the discounted sum of tracking error of targets and improve the long-term tracking performance. The Whittle index policy based on the restless multiarmed bandit (RMAB) model can decompose the state space of the underlying optimization problem into independent spaces with reduced sizes. We consider the tracking error covariance (TEC) matrix as the state of each target (arm), which evolves based on the Kalman filter. However, for a real-world MTT, the exact calculation of the Whittle index in multiple dimensions is challenging. The neural network is established to achieve the feature extraction of TEC states and learn the corresponding Whittle index. The deep reinforcement learning (DRL) method is exploited to train the neural network by leveraging the threshold property of the Whittle index policy and engaging in interactions with a single target tracking environment. We propose the DRL-based Whittle index policy, namely DRLWI, aiming to solve the beam allocation problem for MTT with multi-dimensional TEC states. This approach effectively mitigates the exponential computational complexity of classical dynamic programming approaches and the low convergence rate caused by large joint state and action spaces in the simple application of DRL algorithms. Numerical results demonstrate the performance of the proposed DRLWI policy surpasses that of DRL algorithms and myopic policies.",

keywords = "deep reinforcement learning, multibeam allocation, target tracking, Whittle index",

author = "Yuhang Hao and Zengfu Wang and Jing Fu and Quan Pan",

note = "Publisher Copyright: {\textcopyright} 2024 ISIF.; 27th International Conference on Information Fusion, FUSION 2024 ; Conference date: 07-07-2024 Through 11-07-2024",

year = "2024",

doi = "10.23919/FUSION59988.2024.10706358",

language = "English",

series = "FUSION 2024 - 27th International Conference on Information Fusion",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "FUSION 2024 - 27th International Conference on Information Fusion",

}

Hao, Y, Wang, Z, Fu, J & Pan, Q 2024, A Deep Reinforcement Learning-Based Whittle Index Policy for Multibeam Allocation. in FUSION 2024 - 27th International Conference on Information Fusion. FUSION 2024 - 27th International Conference on Information Fusion, Institute of Electrical and Electronics Engineers Inc., 27th International Conference on Information Fusion, FUSION 2024, Venice, Italy, 7/07/24. https://doi.org/10.23919/FUSION59988.2024.10706358

A Deep Reinforcement Learning-Based Whittle Index Policy for Multibeam Allocation. / Hao, Yuhang; Wang, Zengfu; Fu, Jing et al.
FUSION 2024 - 27th International Conference on Information Fusion. Institute of Electrical and Electronics Engineers Inc., 2024. (FUSION 2024 - 27th International Conference on Information Fusion).


TY - GEN

T1 - A Deep Reinforcement Learning-Based Whittle Index Policy for Multibeam Allocation

AU - Hao, Yuhang

AU - Wang, Zengfu

AU - Fu, Jing

AU - Pan, Quan

PY - 2024

Y1 - 2024

AB - In this paper, a non-myopic beam scheduling policy is proposed for multi-target tracking (MTT) in a phased-array radar network, seeking to minimize the discounted sum of tracking error of targets and improve the long-term tracking performance. The Whittle index policy based on the restless multiarmed bandit (RMAB) model can decompose the state space of the underlying optimization problem into independent spaces with reduced sizes. We consider the tracking error covariance (TEC) matrix as the state of each target (arm), which evolves based on the Kalman filter. However, for a real-world MTT, the exact calculation of the Whittle index in multiple dimensions is challenging. The neural network is established to achieve the feature extraction of TEC states and learn the corresponding Whittle index. The deep reinforcement learning (DRL) method is exploited to train the neural network by leveraging the threshold property of the Whittle index policy and engaging in interactions with a single target tracking environment. We propose the DRL-based Whittle index policy, namely DRLWI, aiming to solve the beam allocation problem for MTT with multi-dimensional TEC states. This approach effectively mitigates the exponential computational complexity of classical dynamic programming approaches and the low convergence rate caused by large joint state and action spaces in the simple application of DRL algorithms. Numerical results demonstrate the performance of the proposed DRLWI policy surpasses that of DRL algorithms and myopic policies.

KW - deep reinforcement learning

KW - multibeam allocation

KW - target tracking

KW - Whittle index

UR - http://www.scopus.com/inward/record.url?scp=85207693888&partnerID=8YFLogxK

U2 - 10.23919/FUSION59988.2024.10706358

DO - 10.23919/FUSION59988.2024.10706358

M3 - Conference contribution

AN - SCOPUS:85207693888

T3 - FUSION 2024 - 27th International Conference on Information Fusion

BT - FUSION 2024 - 27th International Conference on Information Fusion

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 27th International Conference on Information Fusion, FUSION 2024

Y2 - 7 July 2024 through 11 July 2024

ER -
