TY - JOUR
T1 - Integrating Pseudo-Supervision and Spatial Constraints for Efficient Clustering of Multimodal Remote Sensing Data
AU - Cao, Zhe
AU - Zhang, Tao
AU - Zhao, Zihua
AU - Xin, Haonan
AU - Wang, Rong
AU - Nie, Feiping
N1 - Publisher Copyright: © 2026 IEEE.
PY - 2026
Y1 - 2026
N2 - Multimodal remote sensing (RS) clustering synergistically integrates multi-dimensional information, effectively addressing the representational limitations of single-modality data. This integration offers essential technical support for fine-grained recognition and accurate interpretation of ground objects in complex scenarios. However, existing methods still face several challenges, including insufficient utilization of spatial information, limited ability to extract consistent information due to inter-modal heterogeneity, and low efficiency when handling large-scale and complex datasets. To address these issues, we propose PSSC, a model that integrates pseudo-supervision and spatial constraints for efficient clustering of multimodal remote sensing data. The proposed method begins by constructing spatial bipartite graphs from multimodal data to fully exploit spatial information while reducing computational complexity. These graphs are then stacked into a third-order tensor, upon which a robust denoised representation is learned to suppress noise and preserve the core structural characteristics of the multimodal inputs. Based on this clean tensor, PSSC captures cross-modal consistency by minimizing the tensor nuclear norm within the low-rank space. To further enhance clustering efficiency and accuracy, a region homogeneity-constrained rapid label generation strategy is proposed, which leverages high-confidence pseudo-supervision information from homogeneous regions to iteratively refine clustering labels, thereby significantly reducing computational overhead. Extensive experiments on real-world multimodal datasets validate the effectiveness and superior performance of the proposed method.
KW - Bipartite graph
KW - Multimodal remote sensing
KW - Pseudo-supervised clustering
KW - Tensorized graph learning
KW - Unsupervised clustering
UR - https://www.scopus.com/pages/publications/105038772828
U2 - 10.1109/TCSVT.2026.3690673
DO - 10.1109/TCSVT.2026.3690673
M3 - Article
AN - SCOPUS:105038772828
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -