Abstract
Reinforcement learning (RL) offers the possibility of learning an end-to-end visual servoing (VS) strategy directly from captured images or features. However, the resulting actions can be unsmooth when the RL agent depends solely on the current state. In this article, a hierarchical proximal policy optimization method is proposed for learning the VS strategy. A subgoal generation function based on a sequence of historical data is designed as the high-level strategy, providing a smooth subgoal for low-level policy training. The low-level policy takes the current state and the smoothed subgoal as inputs, thereby incorporating historical data. Furthermore, a novelty measurement based on cluster means is introduced to encourage agent exploration during learning. Autonomous visual landing experiments are conducted on a quadrotor to validate the effectiveness of the proposed algorithm. Comparative experiments present the novelty analysis and the VS performance in different scenarios.
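To make the two-level design described above concrete, below is a minimal sketch, assuming a GRU-based subgoal generator over a history window and a small feedforward PPO actor. All module names, dimensions, architectures, and the cluster-mean novelty bonus are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the hierarchical structure (assumes PyTorch + NumPy).
# All names, dimensions, and architectures are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

STATE_DIM, SUBGOAL_DIM, ACTION_DIM, HISTORY_LEN = 8, 4, 4, 10

class SubgoalGenerator(nn.Module):
    """High-level strategy: maps a sequence of historical states to a subgoal."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(STATE_DIM, 32, batch_first=True)
        self.head = nn.Linear(32, SUBGOAL_DIM)

    def forward(self, history):             # history: (B, HISTORY_LEN, STATE_DIM)
        _, h = self.gru(history)             # summarize the state sequence
        return torch.tanh(self.head(h[-1]))  # bounded, smoothly varying subgoal

class LowLevelPolicy(nn.Module):
    """Low-level actor (trained with PPO): conditions on state and subgoal."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + SUBGOAL_DIM, 64), nn.Tanh(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, state, subgoal):       # state: (B, STATE_DIM)
        return torch.tanh(self.net(torch.cat([state, subgoal], dim=-1)))

def novelty_bonus(state, cluster_means):
    """One plausible reading of the mean-cluster novelty measure: distance
    from the current state to the nearest mean of clustered previously
    visited states, used as an exploration bonus."""
    dists = np.linalg.norm(cluster_means - state, axis=1)
    return float(dists.min())

# Usage: generate a subgoal from the history window, then act on it.
history = torch.zeros(1, HISTORY_LEN, STATE_DIM)
state = torch.zeros(1, STATE_DIM)
subgoal = SubgoalGenerator()(history)
action = LowLevelPolicy()(state, subgoal)
bonus = novelty_bonus(np.zeros(STATE_DIM), np.random.randn(5, STATE_DIM))
```

Conditioning the actor on a slowly varying subgoal derived from the history window, rather than on the raw per-step observation alone, is what gives the commanded actions their smoothing attribute in this reading.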
| Original language | English |
|---|---|
| Pages (from-to) | 11009-11018 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Industrial Electronics |
| Volume | 71 |
| Issue number | 9 |
| DOIs | |
| State | Published - 1 Sep 2024 |
Keywords
- Hierarchical reinforcement learning (HRL)
- proximal policy optimization (PPO)
- sequential data
- transition function
- visual servoing (VS)