Abstract
Prioritized Experience Replay (PER) lets a model learn more from relatively important samples by artificially changing how frequently they are accessed. However, this non-uniform sampling shifts the state-action distribution originally used to estimate Q-value functions, introducing estimation bias. This article proposes a novel off-policy reinforcement learning training framework, Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP), which directly quantifies the degree of the distribution shift through a Parallel Self-Attention network, enabling precise error compensation. Furthermore, a Priority-Encouragement mechanism is designed to optimize the sample-screening criteria and enhance training efficiency. To verify the effectiveness of DALAP, a realistic multi-USV environment is constructed based on Unreal Engine. Comparative experiments across multiple groups demonstrate that DALAP offers significant advantages, including faster convergence and smaller training variance.
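For context, the baseline DALAP builds on is standard prioritized experience replay (Schaul et al.), where importance-sampling weights compensate for the bias the abstract describes. The sketch below is illustrative only: the class name, parameters, and buffer layout are assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of prioritized experience replay with importance-sampling
# correction; all names and hyperparameters here are illustrative.
class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priority skews sampling
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition, td_error):
        # Priority grows with the magnitude of the TD error.
        self.priorities[self.pos] = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        probs = self.priorities[:n] / self.priorities[:n].sum()
        idx = np.random.choice(n, batch_size, p=probs)
        # Importance-sampling weights correct the shifted state-action
        # distribution that non-uniform sampling induces.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()    # normalize for numerical stability
        return [self.data[i] for i in idx], idx, weights
```

DALAP's contribution, per the abstract, is to estimate the degree of this distribution shift directly with a Parallel Self-Attention network instead of relying on a fixed annealed `beta` schedule.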
| Original language | English |
|---|---|
| Article number | 267 |
| Journal | Complex and Intelligent Systems |
| Volume | 11 |
| Issue number | 6 |
| DOIs | |
| State | Published - Jun 2025 |
Keywords
- Multi-USV
- Parallel self-attention network
- Prioritized experience replay
- Priority-encouragement mechanism