An Index Policy Based on Sarsa and Q-Learning for Heterogeneous Smart Target Tracking

Yuhang Hao, Zengfu Wang, Jing Fu, Quan Pan, Tao Yun

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Nonmyopic radar scheduling for tracking multiple smart targets within an active and passive radar network (APRN) must weigh two effects of active tracking: the short-term gain in tracking performance and the higher probability that the target will maneuver in the future. Optimizing long-term tracking performance suffers from the curse of dimensionality, so optimal solutions are in general intractable. Meanwhile, the unknown dynamic mode transitions of smart targets complicate the beam scheduling problem. This article models the problem as a Markov decision process (MDP) consisting of parallel restless bandit processes. Each bandit process is associated with a smart target whose mode states are defined by its dynamic modes. The mode state evolves under different transition dynamics depending on the action taken, that is, whether or not the target is being actively tracked. For unknown state transition matrices, this article proposes a new method that uses forward state-action-reward-state-action (Sarsa) and backward Q-learning (QL) to approximate the indices by adapting the state-action value functions, or equivalently the Q-functions. The scheduling policy follows these indices, which are real numbers representing the marginal rewards of taking different actions. A new policy, namely the index policy based on Sarsa and Q-learning (ISQ), is proposed to maximize the long-term tracking rewards. Numerical results demonstrate that the proposed ISQ policy outperforms conventional QL-based methods and the deep Q-network (DQN) algorithm. It also converges rapidly to the well-known Whittle index policy computed with the state transition models revealed, which serves as the benchmark.
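To make the ingredients named in the abstract concrete, the following is a minimal tabular sketch, not the authors' ISQ algorithm. It shows the standard Sarsa and Q-learning updates, a replay of stored transitions in reverse order as one possible reading of "backward Q-learning", and an index taken as the difference between the active and passive Q-values. The function names, the action encoding, the step size `alpha`, the discount `gamma`, and the `budget` of simultaneously tracked targets are all assumptions made for illustration.

```python
ACTIVE, PASSIVE = 1, 0  # assumed action encoding: actively track or not


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in the next state."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])


def backward_q_pass(q, trajectory, alpha=0.1, gamma=0.9):
    """Replay stored (s, a, r, s_next) transitions from last to first,
    so that late rewards reach early states within a single pass."""
    for s, a, r, s_next in reversed(trajectory):
        q_learning_update(q, s, a, r, s_next, alpha, gamma)


def marginal_index(q, s):
    """Assumed index form: estimated gain of active over passive tracking."""
    return q[s][ACTIVE] - q[s][PASSIVE]


def schedule(q_tables, states, budget):
    """Actively track the `budget` targets with the largest indices."""
    ranked = sorted(
        range(len(states)),
        key=lambda i: marginal_index(q_tables[i], states[i]),
        reverse=True,
    )
    return set(ranked[:budget])
```

Here each target keeps its own Q-table `q`, indexed as `q[state][action]`, which mirrors the per-target bandit decomposition described above. How the actual ISQ policy combines the forward and backward passes and defines its indices is specified in the article itself.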

Original language: English
Pages (from-to): 36127-36142
Number of pages: 16
Journal: IEEE Sensors Journal
Volume: 24
Issue number: 21
Publication status: Published - 2024
