Abstract
Highlights:
What are the main findings?
- Integrating the wavelet transform into multimodal feature fusion significantly enhances the model's ability to preserve edge information.
- The fusion strategy enables early interaction of complementary information and improves feature discriminability through feature enhancement.
What is the implication of the main finding?
- The study provides a solution for improving edge clarity in remote sensing image segmentation.
- The study provides a key solution for enhancing segmentation capability under multimodal fusion.

Remote sensing image segmentation is essential for resource planning and disaster monitoring. Although RGB-based methods are widely adopted, they often perform poorly when distinguishing objects with similar color and texture characteristics. Fusing height information from Digital Surface Models (DSM) helps discriminate these challenging objects. However, existing CNN- and pooling-based fusion methods tend to lose edge details as network depth increases, resulting in blurred segmentation boundaries. To address this issue, a Multimodal Spatial–Frequency Fusion Network (MSFFNet) is proposed to enhance edge details by fusing high-level frequency and spatial features. Specifically, a Hybrid Branch Fusion Module (HBFM) is proposed, in which the wavelet transform branch decomposes features into sub-components, isolating edge and structural information from other textures. Processing features in the frequency domain in this way prevents edge details from being lost or diluted during fusion, thereby preserving boundary clarity in segmentation. Additionally, a Multi-Scale Contextual Attention Module (MSCAM) is proposed to capture multi-scale contextual information for enhancing spatial feature representation, while adjusting both spatial and channel-wise attention mechanisms to improve detail and accuracy.
Experiments on the benchmark Vaihingen and Potsdam datasets demonstrate that the proposed approach clearly enhances edge delineation while improving segmentation accuracy.
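The abstract's description of a wavelet branch that separates edge information from smoother content can be illustrated with a small sketch. The code below is not the HBFM from the paper. It is a single-level 2D Haar decomposition in NumPy, applied to toy arrays, with a classic "keep the stronger detail coefficient" fusion rule swapped in as an assumption. The function names `haar_dwt2`, `haar_idwt2`, and `fuse_subbands` are hypothetical, and sub-band naming conventions (LH versus HL) vary between libraries.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four half-resolution sub-bands: one smooth approximation (ll)
    and three detail bands that respond to intensity changes.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average: smooth content
    lh = (a + b - c - d) / 2.0  # change between rows: horizontal edges
    hl = (a - b + c - d) / 2.0  # change between columns: vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal change
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def fuse_subbands(bands_a, bands_b):
    """Toy fusion rule (an assumption, not the paper's method).

    Averages the smooth bands and, for each detail coefficient, keeps the
    one with the larger magnitude so that an edge present in either input
    is not averaged away.
    """
    fused = [(bands_a[0] + bands_b[0]) / 2.0]
    for da, db in zip(bands_a[1:], bands_b[1:]):
        fused.append(np.where(np.abs(da) >= np.abs(db), da, db))
    return tuple(fused)


# Toy inputs standing in for one feature channel per modality.
rgb_like = np.full((8, 8), 5.0)  # flat: no edge visible in this modality
dsm_like = np.zeros((8, 8))
dsm_like[:, 3:] = 10.0           # height step (vertical edge) inside a 2x2 block

fused_bands = fuse_subbands(haar_dwt2(rgb_like), haar_dwt2(dsm_like))
fused = haar_idwt2(*fused_bands)
```

In this toy case the height step appears only in the `hl` band of the DSM-like input, and the max-magnitude rule carries it into the fused result unchanged, whereas plain averaging of the two inputs would halve the step. A single-level Haar transform only registers an edge that falls inside a 2x2 block, which is why the step is placed at column 3.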
| Original language | English |
|---|---|
| Article number | 3745 |
| Journal | Remote Sensing |
| Volume | 17 |
| Issue number | 22 |
| DOIs | |
| State | Published - Nov 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- attention mechanism
- multimodal fusion
- remote sensing image segmentation
- wavelet transform
Fingerprint
Dive into the research topics of 'MSFFNet: Multimodal Spatial–Frequency Fusion Network for RGB-DSM Remote Sensing Image Segmentation'. Together they form a unique fingerprint.