TY - JOUR
T1 - Visual State Space Model Enhanced Features for UAV Geo-localization
AU - Liu, Qi
AU - Pei, Zhixiang
AU - Zhou, Yu
AU - Hui, Le
AU - Dai, Yuchao
AU - He, Mingyi
N1 - Publisher Copyright: ©2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The reliable operation of UAVs in complex environments depends on a robust positioning system, because conventional GNSS is susceptible to failure when signals are blocked or disrupted. As a result, Visual Place Recognition (VPR) has become a key technology for UAV localization. By matching the visual information captured by a UAV against a prebuilt satellite map database, the UAV can determine its geographic position. Traditional methods that rely on pre-trained networks to extract global features for matching and retrieval are typically sensitive to visual appearance variations and prone to losing fine-grained information. To address this issue, we propose a UAV visual geo-localization method based on a dual-branch network that combines a pre-trained vision transformer model and a visual state space model to extract more robust features. Specifically, we design a dual-branch feature extraction network that integrates the DINOv2 and Mamba models to overcome the challenges posed by appearance changes. It leverages the complementary strengths of both models to improve visual localization performance by combining global and local features. Additionally, we introduce an efficient, robust feature fusion framework inspired by the MLP-Mixer architecture to enhance the performance of multi-source feature representations. Experimental results on the ALTO and NewYorkFly datasets demonstrate that the proposed method outperforms existing methods in metrics such as R@1 and R@5. Notably, on the NewYorkFly dataset, R@1 improves by 6.3%. These results highlight the significant advantages of our method in UAV visual geo-localization tasks.
KW - dual-branch feature extraction
KW - state space model
KW - UAV geo-localization
KW - visual localization
UR - https://www.scopus.com/pages/publications/105034023680
U2 - 10.1109/IGARSS55030.2025.11242940
DO - 10.1109/IGARSS55030.2025.11242940
M3 - Conference article
AN - SCOPUS:105034023680
SN - 2153-6996
SP - 5628
EP - 5632
JO - International Geoscience and Remote Sensing Symposium (IGARSS)
JF - International Geoscience and Remote Sensing Symposium (IGARSS)
T2 - 2025 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2025
Y2 - 3 August 2025 through 8 August 2025
ER -