TY - JOUR
T1 - Cross-Modal Spherical Aggregation for Weakly Supervised Remote Sensing Shadow Removal
AU - Chi, Kaichen
AU - Jing, Wei
AU - Li, Junjie
AU - Li, Qiang
AU - Wang, Qi
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Shadows are dark regions that typically exhibit low illumination intensity. Infrared images can provide robust illumination cues that visible images lack, yet existing methods ignore the collaboration between these heterogeneous modalities. To fill this gap, we propose a weakly supervised shadow removal network with a spherical feature space, dubbed S2-ShadowNet, to explore the best of both worlds for the visible and infrared modalities. Specifically, we employ a modal translation (visible-to-infrared) model to learn the cross-domain mapping and thus generate realistic infrared samples. A Swin Transformer is then used to extract strongly representative visible/infrared features. Simultaneously, the extracted features are mapped onto a smooth spherical manifold, which alleviates domain shift through regularization. Well-designed similarity and orthogonality losses are embedded in the spherical space, promoting the separation of private visible/infrared features and the alignment of shared visible/infrared features through constraints on both representation content and orientation. This encourages implicit reciprocity between modalities and thus provides novel insight into shadow removal. Notably, since ground truth is not available in practice, S2-ShadowNet is trained by cropping shadow and shadow-free patches from the shadow image itself, avoiding strict paired data acquisition. More importantly, we contribute a large-scale weakly supervised shadow removal benchmark that makes it possible to perform shadow removal independent of specific scenario constraints. Extensive experiments demonstrate that S2-ShadowNet outperforms state-of-the-art methods in both qualitative and quantitative comparisons.
KW - multi-modal vision
KW - shadow removal
KW - spherical space
KW - weakly supervised learning
UR - https://www.scopus.com/pages/publications/105018380020
U2 - 10.1109/TMM.2025.3618537
DO - 10.1109/TMM.2025.3618537
M3 - Article
AN - SCOPUS:105018380020
SN - 1520-9210
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -