Directly Attention loss adjusted prioritized experience replay

Zhuoying Chen; Huiping Li; Zhaoxu Wang

doi:10.1007/s40747-025-01852-6

School of Marine Science and Technology

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

Abstract

Prioritized Experience Replay lets the model learn more from relatively important samples by artificially changing how often they are accessed. However, this non-uniform sampling shifts the state-action distribution originally used to estimate Q-value functions, which introduces an estimation deviation. This article proposes a novel off-policy reinforcement learning training framework, Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP), which directly quantifies the extent of the distribution shift through a Parallel Self-Attention network, enabling precise error compensation. In addition, a Priority-Encouragement mechanism is designed to optimize the sample screening criteria and improve training efficiency. To verify the effectiveness of DALAP, a realistic multi-USV environment is constructed on Unreal Engine. Comparative experiments across multiple groups show that DALAP converges faster and with smaller training variance.
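
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of standard proportional Prioritized Experience Replay with the usual importance-sampling correction, as commonly described in the general literature. It is not DALAP and does not reproduce the Parallel Self-Attention network or the Priority-Encouragement mechanism. The function names, the `alpha` and `beta` values, and the toy TD errors are all illustrative assumptions.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    with p_i = |delta_i| + eps. alpha = 0 recovers uniform sampling."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), divided by the
    largest weight so that every weight lies in (0, 1]. These weights scale
    each sample's loss to counteract the non-uniform sampling."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    top = max(raw)
    return [w / top for w in raw]


def sample_batch(probs, batch_size, rng):
    """Draw replay-buffer indices with replacement according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Toy buffer of four transitions, described only by their TD errors (assumed values).
td_errors = [0.1, 2.0, 0.5, 0.05]
probs = per_probabilities(td_errors)
weights = importance_weights(probs)
rng = random.Random(0)
batch = sample_batch(probs, batch_size=8, rng=rng)
```

In this sketch the transition with the largest TD error is sampled most often and receives the smallest loss weight, which is the fixed, hand-tuned compensation that the abstract contrasts with a directly quantified distribution shift.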

Original language	English
Article number	267
Journal	Complex and Intelligent Systems
Volume	11
Issue number	6
DOIs	https://doi.org/10.1007/s40747-025-01852-6
State	Published - Jun 2025

Keywords

Multi-USV
Parallel self-attention network
Prioritized experience replay
Priority-encouragement mechanism

Cite this

@article{29a1f4f66b334d6e946f31e7737818c9,

title = "Directly Attention loss adjusted prioritized experience replay",

abstract = "Prioritized Experience Replay enables the model to learn more about relatively important samples by artificially changing their accessed frequencies. However, this non-uniform sampling method shifts the state-action distribution that is originally used to estimate Q-value functions, which brings about the estimation deviation. In this article, a novel off-policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which can directly quantify the changed extent of the shifted distribution through Parallel Self-Attention network, enabling precise error compensation. Furthermore, a Priority-Encouragement mechanism is designed to optimize the sample screening criteria, and enhance training efficiency. To verify the effectiveness of DALAP, a realistic environment of multi-USV, based on Unreal Engine, is constructed. Comparative experiments across multiple groups demonstrate that DALAP offers significant advantages, including faster convergence and smaller training variance.",

keywords = "Multi-USV, Parallel self-attention network, Prioritized experience replay, Priority-encouragement mechanism",

author = "Zhuoying Chen and Huiping Li and Zhaoxu Wang",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2025.",

year = "2025",

month = jun,

doi = "10.1007/s40747-025-01852-6",

language = "English",

volume = "11",

journal = "Complex and Intelligent Systems",

issn = "2199-4536",

publisher = "Springer International Publishing AG",

number = "6",

}

TY - JOUR

T1 - Directly Attention loss adjusted prioritized experience replay

AU - Chen, Zhuoying

AU - Li, Huiping

AU - Wang, Zhaoxu

N1 - Publisher Copyright: © The Author(s) 2025.

PY - 2025/6

Y1 - 2025/6

AB - Prioritized Experience Replay enables the model to learn more about relatively important samples by artificially changing their accessed frequencies. However, this non-uniform sampling method shifts the state-action distribution that is originally used to estimate Q-value functions, which brings about the estimation deviation. In this article, a novel off-policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which can directly quantify the changed extent of the shifted distribution through Parallel Self-Attention network, enabling precise error compensation. Furthermore, a Priority-Encouragement mechanism is designed to optimize the sample screening criteria, and enhance training efficiency. To verify the effectiveness of DALAP, a realistic environment of multi-USV, based on Unreal Engine, is constructed. Comparative experiments across multiple groups demonstrate that DALAP offers significant advantages, including faster convergence and smaller training variance.

KW - Multi-USV

KW - Parallel self-attention network

KW - Prioritized experience replay

KW - Priority-encouragement mechanism

UR - http://www.scopus.com/inward/record.url?scp=105003824410&partnerID=8YFLogxK

U2 - 10.1007/s40747-025-01852-6

DO - 10.1007/s40747-025-01852-6

M3 - Article

AN - SCOPUS:105003824410

SN - 2199-4536

VL - 11

JO - Complex and Intelligent Systems

JF - Complex and Intelligent Systems

IS - 6

M1 - 267

ER -
