A Neural Network-Based Whittle Index Policy for Beam Resource Allocation in Multitarget Tracking

Yuhang Hao; Zengfu Wang; Jing Fu; Quan Pan

doi:10.1109/JSEN.2024.3435020

School of Automation

Research output: Contribution to journal, Article, peer-reviewed

1 Scopus citation

Abstract

In a colocated multiple-input multiple-output (MIMO) radar system for multitarget tracking (MTT), non-myopic beam allocation schemes based on conventional programming approaches suffer from large-scale state and action spaces. This article formulates the beam allocation problem as a restless multi-armed bandit (RMAB) and adopts the computationally efficient Whittle index policy. The optimization objective is the infinite-horizon discounted reward, evaluated from the Bayesian Cramér-Rao lower bounds (BCRLBs) of the targets. Each target is treated as an arm, and the joint multi-dimensional state of each target comprises its BCRLB and its dynamic state. However, the convoluted transition process of this joint state makes computing the exact Whittle index of each target intractable. This article therefore combines the Whittle index policy with deep reinforcement learning (DRL), approximating the Whittle index by leveraging its threshold property. Since the BCRLB update depends on the Jacobian matrix of the nonlinear measurement equation, which in turn depends on the dynamic state, a two-channel neural network is constructed to approximate each target's Whittle index from both its BCRLB state and its dynamic state. The network inputs are preprocessed joint state features, and DRL techniques are used to train the network. Together, these components form the proposed neural network-based Whittle index (NNWI) policy, which achieves non-myopic tracking performance for multiple targets. Numerical results demonstrate that the NNWI policy outperforms myopic policies and other DRL algorithms.
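The two ingredients the abstract combines can be illustrated with a short sketch: a per-target Fisher information (inverse BCRLB) recursion in which the measurement term depends on the Jacobian of a nonlinear range-bearing measurement evaluated at the target's dynamic state, and an index policy that gives the available beams to the targets with the largest indices. This is not the authors' implementation. Everything below is an assumption for illustration: 2-D random-walk motion with Q = qI, a radar at the origin, the noise values, the hypothetical scenario, and especially `myopic_index`, a one-step BCRLB-trace reduction that merely stands in for the index. In the NNWI policy the index would instead come from the trained two-channel network.

```python
import math

# Minimal 2x2 matrix helpers so the sketch needs only the standard library.
def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_t(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def trace(a):
    return a[0][0] + a[1][1]

def range_bearing_jacobian(px, py):
    """Jacobian of h(x) = [sqrt(px^2 + py^2), atan2(py, px)] for a radar at the origin."""
    r2 = px * px + py * py
    r = math.sqrt(r2)
    return [[px / r, py / r], [-py / r2, px / r2]]

def next_fim(J, pos, q, sigma_r, sigma_theta, illuminated):
    """One step of the Fisher information (inverse BCRLB) recursion.

    Assumes a random-walk model x_{k+1} = x_k + w_k with Q = q*I (so F = I),
    and evaluates the measurement term at `pos` instead of taking an expectation.
    """
    Q = [[q, 0.0], [0.0, q]]
    # Prediction: (Q + F J^{-1} F^T)^{-1} with F = I.
    J_pred = mat_inv(mat_add(Q, mat_inv(J)))
    if not illuminated:
        return J_pred  # no beam, so no measurement information is added
    H = range_bearing_jacobian(*pos)
    R_inv = [[1.0 / sigma_r ** 2, 0.0], [0.0, 1.0 / sigma_theta ** 2]]
    # Measurement update: add H^T R^{-1} H.
    return mat_add(J_pred, mat_mul(mat_t(H), mat_mul(R_inv, H)))

def select_beams(indices, k):
    """Index policy: assign the k beams to the arms with the largest indices."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(order[:k])

# Hypothetical scenario: three targets, two beams.
targets = [
    {"J": [[1.0, 0.0], [0.0, 1.0]], "pos": (1000.0, 0.0)},
    {"J": [[4.0, 0.0], [0.0, 4.0]], "pos": (0.0, 2000.0)},
    {"J": [[0.5, 0.0], [0.0, 0.5]], "pos": (1500.0, 1500.0)},
]

def myopic_index(t, q=1.0, sigma_r=10.0, sigma_theta=0.01):
    """Stand-in index: one-step reduction in the BCRLB trace (NOT a Whittle index)."""
    idle = next_fim(t["J"], t["pos"], q, sigma_r, sigma_theta, False)
    lit = next_fim(t["J"], t["pos"], q, sigma_r, sigma_theta, True)
    return trace(mat_inv(idle)) - trace(mat_inv(lit))

indices = [myopic_index(t) for t in targets]
chosen = select_beams(indices, 2)
print(chosen)
```

The policy is separable: each target's index is computed from that target's own state, and only the final top-k selection couples the targets. Replacing `myopic_index` with a learned approximation of the Whittle index would leave `select_beams` unchanged.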

Original language	English
Pages (from-to)	29400-29413
Number of pages	14
Journal	IEEE Sensors Journal
Volume	24
Issue number	18
DOIs	https://doi.org/10.1109/JSEN.2024.3435020
State	Published - 2024

Keywords

Beam allocation
multitarget tracking (MTT)
neural network
Whittle index

Cite this

@article{1eb888c1db4f4258bce33be50dc4465d,
  title = "A Neural Network-Based {Whittle} Index Policy for Beam Resource Allocation in Multitarget Tracking",
  abstract = "In a colocated multiple-input multiple-output (MIMO) radar system for multitarget tracking (MTT), the non-myopic beam allocation schemes based on conventional programming approaches result in large-scale state space and action space. This article formulates the beam allocation problem through a restless multi-armed bandit (RMAB) model and leverages the computationally efficient Whittle index policy. The optimization objective is defined as the infinite-horizon discounted reward, which is evaluated based on the Bayesian Cram{\'e}r-Rao lower bounds (BCRLBs) of the targets. In this approach, each target is treated as an arm, and the joint multi-dimensional state of each target comprises the BCRLB and the dynamic state. However, it is intractable to exactly compute the Whittle index of each target with the convoluted transition process of the joint state. This article combines the Whittle index policy and deep reinforcement learning (DRL), seeking to approximate the Whittle index by leveraging its threshold property. Since the BCRLB metric update depends on the Jacobian matrix of the nonlinear measurement equation that is related to dynamic states, the two-channel neural network is constructed to approximate the Whittle index on both BCRLB states and dynamic states for each target. In this architecture, the inputs of networks are preprocessed joint state features. Subsequently, DRL techniques are employed to train the neural network. Above all, the neural network-based Whittle index (NNWI) policy is proposed to achieve non-myopic tracking performance for multiple targets. Numerical results demonstrate that the optimization performance of the proposed NNWI policy outperforms that of myopic policies and other DRL algorithms.",
  keywords = "Beam allocation, multitarget tracking (MTT), neural network, Whittle index",
  author = "Yuhang Hao and Zengfu Wang and Jing Fu and Quan Pan",
  note = "Publisher Copyright: {\textcopyright} 2001-2012 IEEE.",
  year = "2024",
  doi = "10.1109/JSEN.2024.3435020",
  language = "English",
  volume = "24",
  pages = "29400--29413",
  journal = "IEEE Sensors Journal",
  issn = "1530-437X",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "18",
}

TY  - JOUR
T1  - A Neural Network-Based Whittle Index Policy for Beam Resource Allocation in Multitarget Tracking
AU  - Hao, Yuhang
AU  - Wang, Zengfu
AU  - Fu, Jing
AU  - Pan, Quan
PY  - 2024
Y1  - 2024
N2  - In a colocated multiple-input multiple-output (MIMO) radar system for multitarget tracking (MTT), the non-myopic beam allocation schemes based on conventional programming approaches result in large-scale state space and action space. This article formulates the beam allocation problem through a restless multi-armed bandit (RMAB) model and leverages the computationally efficient Whittle index policy. The optimization objective is defined as the infinite-horizon discounted reward, which is evaluated based on the Bayesian Cramér-Rao lower bounds (BCRLBs) of the targets. In this approach, each target is treated as an arm, and the joint multi-dimensional state of each target comprises the BCRLB and the dynamic state. However, it is intractable to exactly compute the Whittle index of each target with the convoluted transition process of the joint state. This article combines the Whittle index policy and deep reinforcement learning (DRL), seeking to approximate the Whittle index by leveraging its threshold property. Since the BCRLB metric update depends on the Jacobian matrix of the nonlinear measurement equation that is related to dynamic states, the two-channel neural network is constructed to approximate the Whittle index on both BCRLB states and dynamic states for each target. In this architecture, the inputs of networks are preprocessed joint state features. Subsequently, DRL techniques are employed to train the neural network. Above all, the neural network-based Whittle index (NNWI) policy is proposed to achieve non-myopic tracking performance for multiple targets. Numerical results demonstrate that the optimization performance of the proposed NNWI policy outperforms that of myopic policies and other DRL algorithms.
AB  - In a colocated multiple-input multiple-output (MIMO) radar system for multitarget tracking (MTT), the non-myopic beam allocation schemes based on conventional programming approaches result in large-scale state space and action space. This article formulates the beam allocation problem through a restless multi-armed bandit (RMAB) model and leverages the computationally efficient Whittle index policy. The optimization objective is defined as the infinite-horizon discounted reward, which is evaluated based on the Bayesian Cramér-Rao lower bounds (BCRLBs) of the targets. In this approach, each target is treated as an arm, and the joint multi-dimensional state of each target comprises the BCRLB and the dynamic state. However, it is intractable to exactly compute the Whittle index of each target with the convoluted transition process of the joint state. This article combines the Whittle index policy and deep reinforcement learning (DRL), seeking to approximate the Whittle index by leveraging its threshold property. Since the BCRLB metric update depends on the Jacobian matrix of the nonlinear measurement equation that is related to dynamic states, the two-channel neural network is constructed to approximate the Whittle index on both BCRLB states and dynamic states for each target. In this architecture, the inputs of networks are preprocessed joint state features. Subsequently, DRL techniques are employed to train the neural network. Above all, the neural network-based Whittle index (NNWI) policy is proposed to achieve non-myopic tracking performance for multiple targets. Numerical results demonstrate that the optimization performance of the proposed NNWI policy outperforms that of myopic policies and other DRL algorithms.
KW  - Beam allocation
KW  - multitarget tracking (MTT)
KW  - neural network
KW  - Whittle index
UR  - http://www.scopus.com/inward/record.url?scp=85200823972&partnerID=8YFLogxK
U2  - 10.1109/JSEN.2024.3435020
DO  - 10.1109/JSEN.2024.3435020
M3  - Article
AN  - SCOPUS:85200823972
SN  - 1530-437X
VL  - 24
SP  - 29400
EP  - 29413
JO  - IEEE Sensors Journal
JF  - IEEE Sensors Journal
IS  - 18
ER  - 
