Abstract
Nonmyopic radar scheduling for tracking multiple smart targets in an active and passive radar network (APRN) must account for both the short-term gain in tracking performance and the higher probability that a target will maneuver in the future when it is actively tracked. Optimizing long-term tracking performance suffers from the curse of dimensionality, so optimal solutions are in general intractable. Meanwhile, the unknown dynamic mode transitions of smart targets complicate the beam scheduling problem. This article models the problem as a Markov decision process (MDP) consisting of parallel restless bandit processes. Each bandit process is associated with a smart target, whose mode states are defined by its dynamic modes. The mode state evolves according to different dynamic model transitions under different actions, that is, depending on whether or not the target is being actively tracked. For unknown state transition matrices, this article proposes a new method that uses forward state-action-reward-state-action (Sarsa) and backward Q-learning (QL) to approximate the indices by adapting the state-action value functions, or equivalently the Q-functions. The efficient scheduling policy follows the indices, which are real numbers representing the marginal rewards of taking different actions. A new policy, namely the index policy based on Sarsa and Q-learning (ISQ), is proposed to maximize the long-term tracking rewards. Numerical results demonstrate that the proposed ISQ policy outperforms conventional QL-based methods and the deep Q-network (DQN) algorithm. It also rapidly converges to the well-known Whittle index policy with revealed state transition models, which is considered the benchmark.
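The learning scheme summarized in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' ISQ algorithm, whose details are not given here. It assumes the generic pattern of a forward Sarsa pass followed by a backward replay with Q-learning updates, and it takes the index of a mode state to be the difference between the active and passive Q-values, following the abstract's description of indices as marginal rewards. All function names, the toy arm, and the parameters `alpha`, `gamma`, and `eps` are hypothetical.

```python
import random

PASSIVE, ACTIVE = 0, 1  # the two actions for each target (bandit arm)


def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually taken next."""
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])


def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the greedy next action."""
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])


def epsilon_greedy(Q, s, eps, rng):
    """Explore with probability eps, otherwise act greedily on Q."""
    if rng.random() < eps:
        return rng.choice((PASSIVE, ACTIVE))
    return ACTIVE if Q[s][ACTIVE] > Q[s][PASSIVE] else PASSIVE


def make_toy_arm(P, R):
    """Toy simulator: P[a][s] is a row of transition probabilities,
    R[s][a] a reward. The learner never reads P directly."""
    def step(s, a, rng):
        s2 = rng.choices(range(len(P[a][s])), weights=P[a][s])[0]
        return R[s][a], s2
    return step


def run_episode(Q, step, s0, horizon, eps, rng):
    """Forward Sarsa pass, then backward Q-learning over stored transitions."""
    trajectory = []
    s, a = s0, epsilon_greedy(Q, s0, eps, rng)
    for _ in range(horizon):
        r, s2 = step(s, a, rng)
        a2 = epsilon_greedy(Q, s2, eps, rng)
        sarsa_update(Q, s, a, r, s2, a2)
        trajectory.append((s, a, r, s2))
        s, a = s2, a2
    for s, a, r, s2 in reversed(trajectory):
        q_learning_update(Q, s, a, r, s2)
    return s  # final mode state


def estimated_index(Q, s):
    """Assumed index: marginal value of active over passive tracking."""
    return Q[s][ACTIVE] - Q[s][PASSIVE]


def schedule(indices, num_beams):
    """Actively track the num_beams targets with the largest indices."""
    ranked = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(ranked[:num_beams])
```

In use, each target would keep its own Q-table, and at every decision epoch `schedule` would be called on the per-target values of `estimated_index` at the current mode states. Note that this Q-value difference is only a stand-in: the Whittle index used as the benchmark is defined through an activation subsidy, which this sketch does not model.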
| Original language | English |
|---|---|
| Pages (from-to) | 36127-36142 |
| Number of pages | 16 |
| Journal | IEEE Sensors Journal |
| Volume | 24 |
| Issue number | 21 |
| State | Published - 2024 |
Keywords
- Index policy
- Q-learning (QL)
- radar scheduling
- state-action-reward-state-action (Sarsa)
- target tracking
Title: An Index Policy Based on Sarsa and Q-Learning for Heterogeneous Smart Target Tracking