TY - JOUR
T1 - Directly Attention loss adjusted prioritized experience replay
AU - Chen, Zhuoying
AU - Li, Huiping
AU - Wang, Zhaoxu
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/6
Y1 - 2025/6
N2 - Prioritized Experience Replay enables the model to learn more about relatively important samples by artificially changing their accessed frequencies. However, this non-uniform sampling method shifts the state-action distribution that is originally used to estimate Q-value functions, which brings about the estimation deviation. In this article, a novel off-policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which can directly quantify the changed extent of the shifted distribution through Parallel Self-Attention network, enabling precise error compensation. Furthermore, a Priority-Encouragement mechanism is designed to optimize the sample screening criteria, and enhance training efficiency. To verify the effectiveness of DALAP, a realistic environment of multi-USV, based on Unreal Engine, is constructed. Comparative experiments across multiple groups demonstrate that DALAP offers significant advantages, including faster convergence and smaller training variance.
AB - Prioritized Experience Replay enables the model to learn more about relatively important samples by artificially changing their accessed frequencies. However, this non-uniform sampling method shifts the state-action distribution that is originally used to estimate Q-value functions, which brings about the estimation deviation. In this article, a novel off-policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which can directly quantify the changed extent of the shifted distribution through Parallel Self-Attention network, enabling precise error compensation. Furthermore, a Priority-Encouragement mechanism is designed to optimize the sample screening criteria, and enhance training efficiency. To verify the effectiveness of DALAP, a realistic environment of multi-USV, based on Unreal Engine, is constructed. Comparative experiments across multiple groups demonstrate that DALAP offers significant advantages, including faster convergence and smaller training variance.
KW - Multi-USV
KW - Parallel self-attention network
KW - Prioritized experience replay
KW - Priority-encouragement mechanism
UR - http://www.scopus.com/inward/record.url?scp=105003824410&partnerID=8YFLogxK
U2 - 10.1007/s40747-025-01852-6
DO - 10.1007/s40747-025-01852-6
M3 - Article
AN - SCOPUS:105003824410
SN - 2199-4536
VL - 11
JO - Complex and Intelligent Systems
JF - Complex and Intelligent Systems
IS - 6
M1 - 267
ER -