TY - JOUR
T1 - Attention-Guided Reinforcement Learning for Visual Servoing Control of Multirotor UAVs
AU - Ma, Bodi
AU - Yang, Sili
AU - Tang, Yong
AU - Zhi, Guozhu
AU - Zhong, Kelin
AU - Liu, Zhenbao
N1 - Publisher Copyright:
© 1965-2011 IEEE.
PY - 2026
Y1 - 2026
N2 - To address the challenges of dynamic perception, real-time decision making, and control stability in uncrewed aerial vehicle (UAV) visual tracking tasks, this study proposes an attention-guided visual servoing reinforcement learning (AVSRL) framework. Unlike conventional image-based visual servoing (IBVS) methods that rely on analytical Jacobian control, AVSRL employs a virtual camera mechanism to normalize image observations and extract geometry-aware visual features as state inputs to a deep reinforcement learning (DRL) agent. The proposed framework integrates model identification, attention-enhanced actor-critic learning, and multisource visual-environment encoding to enable robust and adaptive UAV control in dynamic and complex environments. A multihead attention module selectively processes spatial-temporal observations from both static and dynamic objects, while a centralized critic facilitates cooperative optimization across agents. Extensive simulations and real-world experiments were conducted, including average reward analysis, hyperparameter sensitivity evaluation, trajectory tracking of complex geometric paths, and pedestrian following in outdoor scenarios. Results show that AVSRL outperforms baseline DRL and IBVS controllers in terms of tracking accuracy, control smoothness, and adaptability to visual and environmental variability. The findings validate the AVSRL framework as a promising solution for real-time UAV navigation and tracking tasks under uncertain and complex visual conditions.
AB - To address the challenges of dynamic perception, real-time decision making, and control stability in uncrewed aerial vehicle (UAV) visual tracking tasks, this study proposes an attention-guided visual servoing reinforcement learning (AVSRL) framework. Unlike conventional image-based visual servoing (IBVS) methods that rely on analytical Jacobian control, AVSRL employs a virtual camera mechanism to normalize image observations and extract geometry-aware visual features as state inputs to a deep reinforcement learning (DRL) agent. The proposed framework integrates model identification, attention-enhanced actor-critic learning, and multisource visual-environment encoding to enable robust and adaptive UAV control in dynamic and complex environments. A multihead attention module selectively processes spatial-temporal observations from both static and dynamic objects, while a centralized critic facilitates cooperative optimization across agents. Extensive simulations and real-world experiments were conducted, including average reward analysis, hyperparameter sensitivity evaluation, trajectory tracking of complex geometric paths, and pedestrian following in outdoor scenarios. Results show that AVSRL outperforms baseline DRL and IBVS controllers in terms of tracking accuracy, control smoothness, and adaptability to visual and environmental variability. The findings validate the AVSRL framework as a promising solution for real-time UAV navigation and tracking tasks under uncertain and complex visual conditions.
KW - Dynamic environment
KW - reinforcement learning (RL)
KW - tracking control
KW - uncrewed aerial vehicles (UAVs)
KW - wind disturbances
UR - https://www.scopus.com/pages/publications/105028220618
U2 - 10.1109/TAES.2025.3650551
DO - 10.1109/TAES.2025.3650551
M3 - Article
AN - SCOPUS:105028220618
SN - 0018-9251
VL - 62
SP - 4156
EP - 4167
JO - IEEE Transactions on Aerospace and Electronic Systems
JF - IEEE Transactions on Aerospace and Electronic Systems
ER -