Directly Attention Loss Adjusted Prioritized Experience Replay

Zhuoying Chen, Huiping Li, Zhaoxu Wang

Research output: Contribution to journal › Article › peer-review

Abstract

Prioritized Experience Replay lets a model learn more from relatively important samples by artificially changing how often they are accessed. However, this non-uniform sampling shifts the state-action distribution originally used to estimate Q-value functions, introducing estimation bias. In this article, a novel off-policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which directly quantifies the extent of the distribution shift through a Parallel Self-Attention network, enabling precise error compensation. Furthermore, a Priority-Encouragement mechanism is designed to optimize the sample screening criteria and enhance training efficiency. To verify the effectiveness of DALAP, a realistic multi-USV environment based on Unreal Engine is constructed. Comparative experiments across multiple groups demonstrate that DALAP offers significant advantages, including faster convergence and smaller training variance.
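For context, the bias the abstract refers to arises in standard proportional Prioritized Experience Replay (Schaul et al., 2016), where transitions are drawn in proportion to their TD errors and the skew is conventionally corrected with importance-sampling weights. The sketch below illustrates that baseline mechanism only; it is not DALAP, and the class name and hyperparameters (alpha, beta) are standard PER notation rather than anything taken from the article.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional PER sketch, assuming standard conventions.

    Sampling transition i with probability p_i^alpha / sum_k p_k^alpha
    skews the state-action distribution, so TD errors are reweighted by
    importance-sampling weights w_i = (N * P(i))^(-beta) before the
    gradient step. DALAP replaces this fixed correction with a learned,
    attention-based estimate of the shift (not shown here).
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha   # how strongly priorities bias sampling
        self.beta = beta     # how strongly IS weights correct the bias
        self.eps = eps       # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights compensate for the non-uniform
        # sampling; normalising by the max keeps updates bounded.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a DQN-style update one would scale the squared TD errors by these weights, e.g. loss = (weights * td_error ** 2).mean(), and then refresh priorities via update_priorities(idx, td_error). The abstract's claim is that this fixed (N * P(i))^(-beta) correction is imprecise, which motivates learning the compensation directly.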

Original language: English
Article number: 267
Journal: Complex and Intelligent Systems
Volume: 11
Issue number: 6
State: Published - Jun 2025

Keywords

  • Multi-USV
  • Parallel self-attention network
  • Prioritized experience replay
  • Priority-encouragement mechanism
