Abstract
The effective utilization of multimodal heterogeneous data can significantly improve the accuracy of remote sensing land-cover classification. However, simply introducing extra network structures for fusion causes modality imbalance, in which the dominant modality interferes with the learning rate and update direction of the other modalities, limiting both the effective use of multimodal data and classification performance. To better leverage multimodal features, we propose a prototypical rebalancing network with semantic alignment (PRSANet) for multimodal remote sensing image classification. Specifically, to fuse complementary multimodal information effectively and constrain the optimization direction of each modality, we propose a semantic alignment-based graph fusion module that strengthens the correlation between the fused features and the land-cover categories. This module promotes the convergence of the multimodal branches toward consistent semantic representations. In addition, we propose a prototypical rebalancing module that constructs a nonparametric classifier from category prototypes and uses it to calculate an imbalance factor, which quantifies how well each modality has been optimized. Based on this imbalance factor, an intermodal independent prototype loss is designed to improve the performance of slow-learning modalities and guide their update direction. Experimental results on three heterogeneous datasets demonstrate that the proposed method achieves strong performance in multimodal land-cover classification tasks.
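The abstract does not give formulas for the prototype classifier or the imbalance factor, so the following is only a minimal sketch of the general idea under stated assumptions, not the PRSANet implementation. It assumes that prototypes are per-class feature means, that the nonparametric classifier is a softmax over negative squared Euclidean distances to those prototypes, and that the imbalance factor is the ratio of two modalities' mean true-class probabilities. All function names and the toy features are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Assumed prototype definition: the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def prototype_probs(x, prototypes):
    """Nonparametric classifier (assumed form): softmax over negative
    squared Euclidean distances from x to each class prototype."""
    classes = sorted(prototypes)
    logits = [-sum((a - b) ** 2 for a, b in zip(x, prototypes[c]))
              for c in classes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}


def modality_score(features, labels, prototypes):
    """Mean probability that the prototype classifier gives the true class."""
    probs = [prototype_probs(x, prototypes)[y]
             for x, y in zip(features, labels)]
    return sum(probs) / len(probs)


def imbalance_factor(score_a, score_b):
    """Assumed form: a ratio above 1 means modality A is better optimized."""
    return score_a / score_b


# Toy features for two hypothetical modalities sharing the same labels.
labels = [0, 0, 1, 1]
feats_a = [[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]]  # well separated
feats_b = [[0.0, 0.0], [1.0, 0.0], [0.6, 0.0], [1.4, 0.0]]  # overlapping

protos_a = class_prototypes(feats_a, labels)
protos_b = class_prototypes(feats_b, labels)
score_a = modality_score(feats_a, labels, protos_a)
score_b = modality_score(feats_b, labels, protos_b)
rho = imbalance_factor(score_a, score_b)
print(f"score_a={score_a:.3f} score_b={score_b:.3f} imbalance={rho:.3f}")
```

In this toy setting the well-separated modality scores higher, so the ratio exceeds 1. A rebalancing scheme could use such a ratio to give more weight to the prototype loss of the slower modality, but the exact weighting used in the paper is not stated in the abstract.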
| Original language | English |
|---|---|
| Pages (from-to) | 27582-27596 |
| Number of pages | 15 |
| Journal | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Volume | 18 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Imbalance factor
- modality rebalancing
- multimodal fusion classification
- semantic alignment
Title: Prototypical Rebalancing Network With Semantic Alignment for Multimodal Remote Sensing Image Classification