UAVSeg: Dual-Encoder Cross-Scale Attention Network for UAV Images Semantic Segmentation

Zhen Wang, Zhuhong You, Nan Xu, Chuanlei Zhang, De Shuang Huang

Research output: Contribution to journal > Article > peer-review

Abstract

Convolutional neural networks (CNNs) and Transformer models offer powerful feature extraction and feature correlation modeling capabilities, and have therefore been widely used for semantic segmentation of unmanned aerial vehicle (UAV) aerial images. However, ground objects in aerial images carry feature information at different scales, and existing methods directly cascade low-level visual features and high-level semantic features without further processing, resulting in low segmentation precision. To address these challenges, we propose a Dual-Encoder Cross-Scale Attention Network, which efficiently extracts local and global context information from aerial images and performs fine-grained fusion of multi-scale features to improve semantic segmentation performance. First, we introduce the Dual-CNNs-Transformer Encoder, which embeds the Scan-Focus Window Transformer (SFWT) into CNNs as an auxiliary encoder to supplement the local feature information lost during global context extraction. Second, we design the Cross-Scale Lightweight Integration (CSLI) module, which uses a Light Dot-Product Attention Mechanism (DPAM) to fuse multi-scale features and reduce the number of model parameters. Lastly, the Linear Multi-Layer Perceptron (LMLP) is used to restore the feature map resolution while expanding the deconvolution receptive field. To validate the effectiveness of the proposed method, we conducted extensive experiments on real aerial scene datasets, including UAVid, Urban Drone, and Aeroscapes. The experimental results show that our method achieves state-of-the-art performance while maintaining superior real-time efficiency. Implementation code will be available at https://github.com/darkseid-arch/UAVSeg.
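The abstract names dot-product attention as the mechanism for fusing multi-scale features but gives no formulation. As a rough illustration only, the sketch below shows standard scaled dot-product attention used for cross-scale fusion, where each fine-scale feature vector attends over a set of coarse-scale vectors and the result is added back as a residual. This is not the paper's DPAM or CSLI module; the function names `softmax` and `cross_scale_attention`, the residual addition, and the toy inputs are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scale_attention(fine, coarse):
    """Hypothetical helper: fuse two feature scales with scaled dot-product attention.

    fine   -- list of N feature vectors (queries), each of dimension d
    coarse -- list of M feature vectors (keys and values), each of dimension d
    Returns N vectors: each fine vector plus its attention-weighted
    sum of coarse vectors (a residual fusion, assumed here).
    """
    d = len(fine[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in fine:
        # Similarity of this fine-scale vector to every coarse-scale vector.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in coarse]
        weights = softmax(scores)
        # Attention-weighted sum of coarse-scale vectors.
        context = [sum(w * k[j] for w, k in zip(weights, coarse)) for j in range(d)]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


# Toy example: two fine-scale vectors attend over a single coarse-scale vector.
fused = cross_scale_attention([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
```

In a real segmentation network the vectors would be learned projections of feature-map positions, and a lightweight variant would typically reduce the cost of the N-by-M score computation; neither detail is specified in the abstract.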

Original language: English
Article number: 3502401
Journal: IEEE Transactions on Geoscience and Remote Sensing
State: Accepted/In press - 2024

Keywords

  • attention mechanism
  • feature extraction
  • multi-layer perceptron
  • semantic segmentation
  • UAV aerial images
