TY - JOUR
T1 - UAVSeg
T2 - Dual-Encoder Cross-Scale Attention Network for UAV Images Semantic Segmentation
AU - Wang, Zhen
AU - You, Zhuhong
AU - Xu, Nan
AU - Zhang, Chuanlei
AU - Huang, De Shuang
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Owing to their powerful feature extraction and feature correlation modeling capabilities, convolutional neural networks (CNNs) and Transformer models have been widely used for semantic segmentation of unmanned aerial vehicle (UAV) aerial images. However, ground objects in aerial images carry feature information at different scales, and existing methods directly cascade low-level visual features and high-level semantic features without further processing, resulting in low segmentation precision. To address these challenges, we propose a Dual-Encoder Cross-Scale Attention Network, which efficiently extracts local and global context information from aerial images and performs fine-grained fusion of multi-scale features to improve semantic segmentation performance. First, we introduce the Dual-CNNs-Transformer Encoder, which embeds the Scan-Focus Window Transformer (SFWT) into CNNs as an auxiliary encoder to supplement the local feature information lost during global context extraction. Second, we design the Cross-Scale Lightweight Integration (CSLI) module, which uses the Light Dot-Product Attention Mechanism (DPAM) to fuse multi-scale features and reduce the model's computational parameters. Lastly, a Linear Multi-Layer Perceptron (LMLP) restores the feature map resolution while expanding the deconvolution receptive field. To validate the proposed method, we conducted extensive experiments on real aerial scene datasets, including UAVid, Urban Drone, and Aeroscapes. The experimental results show that our method achieves state-of-the-art performance while maintaining superior real-time efficiency. The implementation code will be available at https://github.com/darkseid-arch/UAVSeg.
KW - attention mechanism
KW - feature extraction
KW - multi-layer perceptron
KW - semantic segmentation
KW - UAV aerial images
UR - http://www.scopus.com/inward/record.url?scp=85210081701&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3502401
DO - 10.1109/TGRS.2024.3502401
M3 - Article
AN - SCOPUS:85210081701
SN - 0196-2892
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 3502401
ER -