TY - JOUR
T1 - PLVAM-DETR
T2 - Patch-Level Visibility Aware Multi-spectral Detection Transformer with Frequency Specific Fusion
AU - Zhang, Xiuwei
AU - Zeng, Haorui
AU - Zhang, Xiaoqiang
AU - Wu, Wencong
AU - Yin, Hanlin
AU - Dai, Shun
AU - Zhang, Yanning
N1 - Publisher Copyright: © 1999-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Multi-spectral object detection is more robust than single-spectral object detection because it leverages complementary information from the visible and infrared spectra. Existing methods often fuse multi-spectral information using attention mechanisms, such as spatial/channel attention in CNN-based approaches and cross-attention/self-attention in Transformer-based approaches. However, object visibility varies across regions and across spectral images. Therefore, explicit patch-level visibility awareness is required for finer spatial and spectral exploration. Moreover, the distinct frequency characteristics of infrared and visible images are rarely emphasized, which hinders effective use of their complementary benefits. To overcome these limitations, we propose a patch-level visibility-aware multi-spectral detection Transformer with frequency-specific fusion, named PLVAM-DETR. A patch-level visibility-aware module dynamically determines the significance of different patches in different spectral images. A frequency-specific feature fusion module then highlights high-frequency information in the infrared features and low-frequency information in the visible features, providing more comprehensive features for detection. Extensive experiments on publicly available datasets demonstrate competitive results compared with state-of-the-art methods.
AB - Multi-spectral object detection is more robust than single-spectral object detection because it leverages complementary information from the visible and infrared spectra. Existing methods often fuse multi-spectral information using attention mechanisms, such as spatial/channel attention in CNN-based approaches and cross-attention/self-attention in Transformer-based approaches. However, object visibility varies across regions and across spectral images. Therefore, explicit patch-level visibility awareness is required for finer spatial and spectral exploration. Moreover, the distinct frequency characteristics of infrared and visible images are rarely emphasized, which hinders effective use of their complementary benefits. To overcome these limitations, we propose a patch-level visibility-aware multi-spectral detection Transformer with frequency-specific fusion, named PLVAM-DETR. A patch-level visibility-aware module dynamically determines the significance of different patches in different spectral images. A frequency-specific feature fusion module then highlights high-frequency information in the infrared features and low-frequency information in the visible features, providing more comprehensive features for detection. Extensive experiments on publicly available datasets demonstrate competitive results compared with state-of-the-art methods.
KW - DETR
KW - Multi-modal Fusion
KW - Multi-spectral Object Detection
KW - Patch-level Visibility Awareness
UR - https://www.scopus.com/pages/publications/105035150183
U2 - 10.1109/TMM.2026.3678441
DO - 10.1109/TMM.2026.3678441
M3 - Article
AN - SCOPUS:105035150183
SN - 1520-9210
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -