TY - JOUR
T1 - Linear Attention-Driven DRL with Sparse Expert Fusion
T2 - A Dynamic Optimization Algorithm of Underwater Manipulator for Long Horizon Tasks
AU - Li, Yufeng
AU - Gao, Jian
AU - Chen, Yimin
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Deep Reinforcement Learning (DRL), leveraging the nonlinear approximation capabilities of deep neural networks, maps high-dimensional perception information to robotic control commands and has been widely applied to various continuous control tasks. However, for long-horizon tasks, manipulators often face challenges such as high-dimensional exploration spaces and sparse rewards, making it difficult to learn effective strategies and even leading to potentially dangerous actions. Additionally, collecting a large volume of high-quality expert demonstrations for underwater manipulators is challenging. To overcome these limitations, this study proposes a DRL algorithm that integrates a small amount of expert experience for long-horizon control of underwater manipulators. Firstly, a DRL dynamic optimization strategy based on expert experience is designed, featuring a dual-buffer dynamic sampling mechanism that enables efficient early-stage learning. Secondly, a linear attention mechanism network is developed to aggregate global task features while maintaining low computational complexity, allowing the model to effectively process high-dimensional sensory inputs and share features across multiple tasks. Furthermore, a staged reward function is designed to steadily learn skills for each phase, ultimately completing the entire long-horizon task. To validate the effectiveness of the proposed algorithm, an underwater simulation environment is constructed in Gazebo. In this simulation environment, the manipulator is trained on various long-horizon tasks, including grasp-place, grasp-stack, and grasp-insert. Training results demonstrate that the proposed algorithm outperforms existing methods in terms of task success rate and policy stability. Additionally, real-world experiments in a water tank environment verify the algorithm's generalization and robustness in practical underwater operations. 
Note to Practitioners - Underwater robotic manipulation is increasingly vital in complex applications such as offshore maintenance, deep-sea archaeology, and marine resource extraction. However, underwater manipulators often suffer from limited data, sparse rewards, and difficulty in executing multi-stage tasks with precision. This paper proposes a practical solution for enabling underwater manipulators to autonomously complete long-horizon tasks (e.g., grasping, stacking, inserting) through a Deep Reinforcement Learning (DRL) framework enhanced by expert demonstrations and a novel linear attention mechanism. The integration of a dynamic dual-buffer sampling strategy ensures sample efficiency and learning robustness, while the attention-based subtask sharing improves generalization across tasks. Practitioners can apply this method directly in Gazebo-based simulations or real-world underwater operations using conventional 6-DoF manipulators and depth cameras. The trained model demonstrates strong transferability without the need for task-specific fine-tuning, simplifying deployment in unpredictable marine environments. This work bridges the gap between high-level decision-making and low-level control, offering an adaptable and scalable learning pipeline for real-time underwater robotic applications.
KW - Underwater manipulator
KW - attention mechanism
KW - deep reinforcement learning
KW - long-horizon manipulation skills
UR - https://www.scopus.com/pages/publications/105038787165
U2 - 10.1109/TASE.2026.3691390
DO - 10.1109/TASE.2026.3691390
M3 - Article
AN - SCOPUS:105038787165
SN - 1545-5955
VL - 23
SP - 9695
EP - 9708
JO - IEEE Transactions on Automation Science and Engineering
JF - IEEE Transactions on Automation Science and Engineering
ER -