Efficient Training Framework for Multi-USV System Based on Off-policy Deep Reinforcement Learning

Research output: Contribution to journal › Article › peer-review

Abstract

Prioritized Experience Replay (PER) is a technique in off-policy deep reinforcement learning that samples important transitions more frequently to improve training efficiency. However, the non-uniform sampling applied in PER inevitably shifts the state-action distribution and introduces estimation errors in the Q-value function. In this paper, an efficient off-policy reinforcement learning training framework called Attention Loss Adjusted Prioritized (ALAP) Experience Replay is proposed. ALAP exploits the similarity of the transitions in the replay buffer to quantify the training progress, and accurately corrects the bias based on the positive correlation between the error-compensation strength and the training progress. To verify the effectiveness of the algorithm, ALAP is tested on 15 games from the Atari 2600 benchmark. Additionally, we developed a multi-USV competition scenario using Unreal Engine to further illustrate the superiority as well as the practical value of ALAP.
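The mechanism the abstract describes — non-uniform sampling by priority, corrected by importance-sampling weights whose strength grows with training progress — can be sketched as follows. This is a minimal proportional-PER illustration, not the paper's ALAP implementation: the attention-based similarity measure is replaced here by a simple linear annealing of the correction exponent, and all names (`PrioritizedReplayBuffer`, `beta_by_progress`) are illustrative.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional PER sketch (illustrative; not the paper's ALAP)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly TD-error skews sampling
        self.data = []
        self.priorities = []
        self.pos = 0                # circular write position

    def add(self, transition, td_error=1.0):
        p = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        # Importance-sampling weights compensate for the shifted
        # state-action distribution; beta -> 1 gives full correction.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize for stability
        return [self.data[i] for i in idxs], idxs, weights

def beta_by_progress(progress, beta0=0.4):
    """Anneal the correction strength with training progress.

    ALAP derives the progress signal from transition similarity in the
    buffer; a linear schedule over a [0, 1] progress value is used here
    purely as a stand-in.
    """
    progress = min(max(progress, 0.0), 1.0)
    return beta0 + (1.0 - beta0) * progress
```

In a training loop, the returned weights would multiply each transition's TD loss, so frequently sampled (high-priority) transitions contribute proportionally less gradient, offsetting the sampling bias.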

Original language: English
Pages (from-to): 1159-1177
Number of pages: 19
Journal: Information Technology and Control
Volume: 54
Issue number: 4
DOIs
State: Published - 19 Dec 2025

Keywords

  • Attention
  • Multi-USV
  • Off-Policy Deep Reinforcement Learning
  • Prioritized Experience Replay

