An Index Policy Based on Sarsa and Q-Learning for Heterogeneous Smart Target Tracking

  • Yuhang Hao
  • Zengfu Wang
  • Jing Fu
  • Quan Pan
  • Tao Yun
  • Northwestern Polytechnical University Xian
  • Royal Melbourne Institute of Technology University

Research output: Contribution to journal, Article, peer-review

2 Scopus citations

Abstract

In nonmyopic radar scheduling for tracking multiple smart targets within an active and passive radar network (APRN), both the short-term improvement in tracking performance and the higher probability of future target maneuvering under active tracking must be considered. Optimizing long-term tracking performance suffers from the curse of dimensionality, so optimal solutions are in general intractable. Meanwhile, the unknown dynamic mode transitions of smart targets complicate the beam scheduling problem. This article models the problem as a Markov decision process (MDP) consisting of parallel restless bandit processes. Each bandit process is associated with a smart target, whose mode states are defined by its dynamic modes. The mode state evolves according to different dynamic model transitions under different actions, that is, depending on whether or not the target is being actively tracked. For unknown state transition matrices, this article proposes a new method that uses forward state-action-reward-state-action (Sarsa) and backward Q-learning (QL) to approximate the indices by adapting the state-action value functions, or equivalently the Q-functions. The efficient scheduling policy follows the indices, which are real numbers representing the marginal rewards of taking different actions. A new policy, namely the index policy based on Sarsa and Q-learning (ISQ), is proposed to maximize the long-term tracking rewards. Numerical results demonstrate that the proposed ISQ policy outperforms conventional QL-based methods and the deep Q-network (DQN) algorithm. It also rapidly converges to the well-known Whittle index policy computed with revealed state transition models, which is considered the benchmark.
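
As a rough illustration of the ingredients named in the abstract, the sketch below shows the standard tabular Sarsa and Q-learning updates and one plausible way to turn Q-values into an index. It is not the ISQ algorithm from the article: the function names, the step size `alpha`, the discount `gamma`, the encoding of actions as 0 (passive) and 1 (active), and the choice of index as the difference Q(s, 1) - Q(s, 0) are all assumptions made for illustration, and the forward/backward arrangement of the two updates is not reproduced.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy Sarsa step: bootstrap from the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy Q-learning step: bootstrap from the greedy next action."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def index(Q, s):
    """Assumed index: marginal value of active (1) over passive (0) tracking."""
    return Q[s][1] - Q[s][0]


def schedule(Q_per_target, states, num_beams):
    """Actively track the num_beams targets with the largest assumed indices."""
    ranked = sorted(
        range(len(states)),
        key=lambda i: index(Q_per_target[i], states[i]),
        reverse=True,
    )
    return set(ranked[:num_beams])
```

Here each target keeps its own table `Q[mode_state][action]`, which mirrors the abstract's decomposition into parallel bandit processes: the scheduler only compares one real number per target instead of searching over the joint state space.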

Original language: English
Pages (from-to): 36127-36142
Number of pages: 16
Journal: IEEE Sensors Journal
Volume: 24
Issue number: 21
State: Published - 2024

Keywords

  • Index policy
  • Q-learning (QL)
  • radar scheduling
  • state-action-reward-state-action (Sarsa)
  • target tracking
