TY - JOUR
T1 - Segment Anything Model Driven Cross-Hierarchical Fusion Network for Remote Sensing Images Semantic Segmentation
AU - He, Fulin
AU - Wang, Zhen
AU - Xu, Nan
AU - You, Zhuhong
AU - Huang, Deshuang
N1 - Publisher Copyright:
© 2008-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Semantic segmentation of remote sensing images (RSIs) involves dense, pixel-wise classification of high-resolution satellite images, serving as a foundational technique for applications including land cover mapping, urban analysis, and environmental monitoring. However, accurately delineating complex and diverse objects in RSIs is challenging due to the large domain gap from natural images and the heterogeneous characteristics of ground objects. In this article, we present a novel segment anything model driven cross-hierarchical fusion network (CHFNet) for remote sensing semantic segmentation. Specifically, a segment anything model-based encoder is utilized to extract comprehensive semantic features, which are further aligned with convolutional features through a cross-modal feature alignment module. To enhance semantic consistency and multiscale representation, a feature interaction fusion module is introduced for deep interaction and fusion of hierarchical features. Furthermore, a hypergraph-based decoder is designed to capture complex topological structures and inherent global relationships in RSIs. Extensive experiments on benchmark datasets (ISPRS Vaihingen and ISPRS Potsdam) demonstrate that CHFNet consistently outperforms state-of-the-art methods, and ablation studies further verify the effectiveness of each core component.
AB - Semantic segmentation of remote sensing images (RSIs) involves dense, pixel-wise classification of high-resolution satellite images, serving as a foundational technique for applications including land cover mapping, urban analysis, and environmental monitoring. However, accurately delineating complex and diverse objects in RSIs is challenging due to the large domain gap from natural images and the heterogeneous characteristics of ground objects. In this article, we present a novel segment anything model driven cross-hierarchical fusion network (CHFNet) for remote sensing semantic segmentation. Specifically, a segment anything model-based encoder is utilized to extract comprehensive semantic features, which are further aligned with convolutional features through a cross-modal feature alignment module. To enhance semantic consistency and multiscale representation, a feature interaction fusion module is introduced for deep interaction and fusion of hierarchical features. Furthermore, a hypergraph-based decoder is designed to capture complex topological structures and inherent global relationships in RSIs. Extensive experiments on benchmark datasets (ISPRS Vaihingen and ISPRS Potsdam) demonstrate that CHFNet consistently outperforms state-of-the-art methods, and ablation studies further verify the effectiveness of each core component.
KW - Cross-modal feature alignment
KW - multiscale feature fusion
KW - remote sensing images (RSIs)
KW - segment anything model (SAM)
KW - semantic segmentation
UR - https://www.scopus.com/pages/publications/105021029097
U2 - 10.1109/JSTARS.2025.3629124
DO - 10.1109/JSTARS.2025.3629124
M3 - Article
AN - SCOPUS:105021029097
SN - 1939-1404
VL - 18
SP - 29511
EP - 29530
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -