Abstract
Deep Reinforcement Learning (DRL) exploits the nonlinear approximation capability of deep neural networks to map high-dimensional perceptual inputs to robot control commands, and it has been widely applied to continuous control tasks. For long-horizon tasks, however, manipulators face high-dimensional exploration spaces and sparse rewards, which make effective policies difficult to learn and can even lead to potentially dangerous actions. Moreover, collecting a large volume of high-quality expert demonstrations for underwater manipulators is difficult. To overcome these limitations, this study proposes a DRL algorithm that incorporates a small amount of expert experience for long-horizon control of underwater manipulators. First, an expert-experience-based DRL dynamic optimization strategy is designed, featuring a dual-buffer dynamic sampling mechanism that enables efficient early-stage learning. Second, a network based on a linear attention mechanism is developed that aggregates global task features at low computational cost, allowing the model to process high-dimensional sensory inputs effectively and to share features across multiple tasks. Third, a staged reward function is designed so that the skills for each phase are learned steadily and the full long-horizon task is ultimately completed. To validate the proposed algorithm, an underwater simulation environment is built in Gazebo, in which the manipulator is trained on several long-horizon tasks: grasp-place, grasp-stack, and grasp-insert. Training results show that the proposed algorithm outperforms existing methods in task success rate and policy stability. Finally, real-world experiments in a water tank verify the algorithm's generalization and robustness in practical underwater operations.
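The abstract credits the linear attention network with aggregating global task features at low computational cost, but it does not give the network's exact form. As a reference point only, the sketch below uses the kernelized linear attention of Katharopoulos et al. (2020) with the positive feature map elu(x) + 1; this choice, and every name in the code (`elu_plus_one`, `linear_attention`, `quadratic_reference`), is an assumption for illustration, not the authors' network. It shows where the linear cost comes from: keys and values are summarized once in a small (d, d_v) matrix that every query reuses, and the result matches the explicit (N, N) computation.

```python
import numpy as np


def elu_plus_one(x: np.ndarray) -> np.ndarray:
    """Positive feature map phi(x) = elu(x) + 1 (an assumed choice)."""
    # exp is applied only to the clipped negative part, so large inputs cannot overflow
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     eps: float = 1e-6) -> np.ndarray:
    """Kernelized linear attention over N tokens.

    q, k: (N, d) queries and keys; v: (N, d_v) values.
    Cost is O(N * d * d_v) rather than O(N^2 * d), because the summary
    phi(K)^T V is computed once and shared by every query.
    """
    phi_q, phi_k = elu_plus_one(q), elu_plus_one(k)
    kv = phi_k.T @ v            # (d, d_v) global summary of all tokens
    k_sum = phi_k.sum(axis=0)   # (d,) summary used for normalization
    numer = phi_q @ kv          # (N, d_v)
    denom = phi_q @ k_sum       # (N,), strictly positive
    return numer / (denom[:, None] + eps)


def quadratic_reference(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """The same attention computed with an explicit (N, N) weight matrix, for checking."""
    scores = elu_plus_one(q) @ elu_plus_one(k).T       # (N, N), all positive
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
n_tokens, d, d_v = 64, 16, 8
q = rng.standard_normal((n_tokens, d))
k = rng.standard_normal((n_tokens, d))
v = rng.standard_normal((n_tokens, d_v))

out = linear_attention(q, k, v)
print(out.shape)  # (64, 8)
print(np.allclose(out, quadratic_reference(q, k, v), atol=1e-6))
```

The positive feature map keeps every attention weight non-negative, so the row normalization behaves like softmax attention without ever forming the N by N matrix.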
Note to Practitioners: Underwater robotic manipulation is increasingly important in complex applications such as offshore maintenance, deep-sea archaeology, and marine resource extraction. However, underwater manipulators often suffer from limited data and sparse rewards, and they struggle to execute multi-stage tasks precisely. This paper proposes a practical way for underwater manipulators to complete long-horizon tasks (e.g., grasping, stacking, and inserting) autonomously, using a Deep Reinforcement Learning (DRL) framework enhanced with expert demonstrations and a novel linear attention mechanism. The dynamic dual-buffer sampling strategy improves sample efficiency and learning robustness, while attention-based feature sharing across subtasks improves generalization across tasks. Practitioners can apply the method directly in Gazebo-based simulation or in real-world underwater operations with conventional 6-DoF manipulators and depth cameras. The trained model transfers well without task-specific fine-tuning, which simplifies deployment in unpredictable marine environments. This work bridges high-level decision-making and low-level control, offering an adaptable and scalable learning pipeline for real-time underwater robotic applications.
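The dual-buffer dynamic sampling mechanism is described only at a high level. The hypothetical `DualReplayBuffer` below sketches one plausible reading, assuming a small fixed expert buffer, a FIFO buffer of agent transitions, and an expert share per batch that decays linearly from `ratio_start` to `ratio_end` over `decay_steps`. The schedule, parameter names, and default values are illustrative assumptions, not the paper's settings.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Hypothetical dual-buffer sampler: fixed expert buffer + FIFO agent buffer.

    The expert fraction of each batch decays linearly from ratio_start to
    ratio_end over decay_steps (an assumed schedule, not taken from the paper).
    """

    def __init__(self, expert_transitions, agent_capacity=100_000,
                 ratio_start=0.5, ratio_end=0.05, decay_steps=50_000, seed=0):
        self.expert = list(expert_transitions)     # small, never overwritten
        self.agent = deque(maxlen=agent_capacity)  # oldest agent data evicted first
        self.ratio_start = ratio_start
        self.ratio_end = ratio_end
        self.decay_steps = decay_steps
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one transition collected by the learning agent."""
        self.agent.append(transition)

    def expert_ratio(self, step):
        """Linearly interpolated expert share, clamped once step >= decay_steps."""
        frac = min(step / self.decay_steps, 1.0)
        return self.ratio_start + frac * (self.ratio_end - self.ratio_start)

    def sample(self, batch_size, step):
        """Draw a mixed batch, without replacement inside each buffer."""
        n_expert = round(batch_size * self.expert_ratio(step))
        # Early on the agent buffer may be nearly empty: top up with expert data.
        n_expert = min(max(n_expert, batch_size - len(self.agent)), len(self.expert))
        n_agent = min(batch_size - n_expert, len(self.agent))
        batch = self.rng.sample(self.expert, n_expert)
        agent_idx = self.rng.sample(range(len(self.agent)), n_agent)
        batch += [self.agent[i] for i in agent_idx]
        self.rng.shuffle(batch)
        return batch


expert = [("expert", i) for i in range(200)]
buf = DualReplayBuffer(expert, ratio_start=0.5, ratio_end=0.1, decay_steps=1_000)
early = buf.sample(64, step=0)      # agent buffer still empty
for i in range(500):
    buf.add(("agent", i))
mid = buf.sample(64, step=0)
late = buf.sample(64, step=5_000)
print(sum(src == "expert" for src, _ in early))  # 64
print(sum(src == "expert" for src, _ in mid))    # 32
print(sum(src == "expert" for src, _ in late))   # 6
```

Under these assumptions the scarce demonstrations dominate the first batches, which is where they help most, and the decaying ratio hands the batch over to the agent's own experience as it accumulates. A schedule triggered by task success rate instead of step count would be an equally reasonable assumption.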
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 9695-9708 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Automation Science and Engineering |
| Volume | 23 |
| DOIs | |
| State | Published - 2026 |
Keywords
- Underwater manipulator
- attention mechanism
- deep reinforcement learning
- long-horizon manipulation skills
Title: Linear Attention-Driven DRL with Sparse Expert Fusion: A Dynamic Optimization Algorithm of Underwater Manipulator for Long Horizon Tasks