TY - JOUR
T1 - A Neural Network-Based Whittle Index Policy for Beam Resource Allocation in Multitarget Tracking
AU - Hao, Yuhang
AU - Wang, Zengfu
AU - Fu, Jing
AU - Pan, Quan
N1 - Publisher Copyright: © 2001-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - In a colocated multiple-input multiple-output (MIMO) radar system for multitarget tracking (MTT), the non-myopic beam allocation schemes based on conventional programming approaches result in large-scale state space and action space. This article formulates the beam allocation problem through a restless multi-armed bandit (RMAB) model and leverages the computationally efficient Whittle index policy. The optimization objective is defined as the infinite-horizon discounted reward, which is evaluated based on the Bayesian Cramér-Rao lower bounds (BCRLBs) of the targets. In this approach, each target is treated as an arm, and the joint multi-dimensional state of each target comprises the BCRLB and the dynamic state. However, it is intractable to exactly compute the Whittle index of each target with the convoluted transition process of the joint state. This article combines the Whittle index policy and deep reinforcement learning (DRL), seeking to approximate the Whittle index by leveraging its threshold property. Since the BCRLB metric update depends on the Jacobian matrix of the nonlinear measurement equation that is related to dynamic states, the two-channel neural network is constructed to approximate the Whittle index on both BCRLB states and dynamic states for each target. In this architecture, the inputs of networks are preprocessed joint state features. Subsequently, DRL techniques are employed to train the neural network. Above all, the neural network-based Whittle index (NNWI) policy is proposed to achieve non-myopic tracking performance for multiple targets. Numerical results demonstrate that the optimization performance of the proposed NNWI policy outperforms that of myopic policies and other DRL algorithms.
KW - Beam allocation
KW - multitarget tracking (MTT)
KW - neural network
KW - Whittle index
UR - http://www.scopus.com/inward/record.url?scp=85200823972&partnerID=8YFLogxK
U2 - 10.1109/JSEN.2024.3435020
DO - 10.1109/JSEN.2024.3435020
M3 - Article
AN - SCOPUS:85200823972
SN - 1530-437X
VL - 24
SP - 29400
EP - 29413
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 18
ER -