RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision

Jinzhong Wang; Xuetao Tian; Shun Dai; Tao Zhuo; Haorui Zeng; Hongjuan Liu; Jiaqi Liu; Xiuwei Zhang; Yanning Zhang

doi:10.1007/978-3-031-78447-7_19

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multispectral object detection, utilizing both visible (RGB) and thermal infrared (T) modals, has garnered significant attention for its robust performance across diverse weather and lighting conditions. However, effectively exploiting the complementarity between RGB-T modals while maintaining efficiency remains a critical challenge. In this paper, a very simple Group Shuffled Multi-receptive Attention (GSMA) module is proposed to extract and combine multi-scale RGB and thermal features. Then, the extracted multi-modal features are directly integrated with a multi-level path aggregation neck, which significantly improves the fusion effect and efficiency. Meanwhile, multi-modal object detection often adopts union annotations for both modals. This kind of supervision is not sufficient and unfair, since objects observed in one modal may not be seen in the other modal. To solve this issue, Multi-modal Supervision (MS) is proposed to sufficiently supervise RGB-T object detection. Comprehensive experiments on two challenging benchmarks, KAIST and DroneVehicle, demonstrate the proposed model achieves the state-of-the-art accuracy while maintaining competitive efficiency.

Original language	English
Title of host publication	Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
Editors	Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	284-298
Number of pages	15
ISBN (Print)	9783031784460
DOI	https://doi.org/10.1007/978-3-031-78447-7_19
Publication status	Published - 2025
Event	27th International Conference on Pattern Recognition, ICPR 2024 - Kolkata, India. Duration: 1 Dec 2024 → 5 Dec 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	15317 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	27th International Conference on Pattern Recognition, ICPR 2024
Country/Territory	India
City	Kolkata
Period	1/12/24 → 5/12/24

Cite this

Wang, J., Tian, X., Dai, S., Zhuo, T., Zeng, H., Liu, H., Liu, J., Zhang, X., & Zhang, Y. (2025). RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision. In A. Antonacopoulos, S. Chaudhuri, R. Chellappa, C.-L. Liu, S. Bhattacharya, & U. Pal (Eds.), Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings (pp. 284-298). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15317 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-78447-7_19

Wang, Jinzhong ; Tian, Xuetao ; Dai, Shun et al. / RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision. Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings. editor / Apostolos Antonacopoulos ; Subhasis Chaudhuri ; Rama Chellappa ; Cheng-Lin Liu ; Saumik Bhattacharya ; Umapada Pal. Springer Science and Business Media Deutschland GmbH, 2025. pp. 284-298 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{1cf51d194fb44ccebdd9a13a5e9c3632,

title = "RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision",

abstract = "Multispectral object detection, utilizing both visible (RGB) and thermal infrared (T) modals, has garnered significant attention for its robust performance across diverse weather and lighting conditions. However, effectively exploiting the complementarity between RGB-T modals while maintaining efficiency remains a critical challenge. In this paper, a very simple Group Shuffled Multi-receptive Attention (GSMA) module is proposed to extract and combine multi-scale RGB and thermal features. Then, the extracted multi-modal features are directly integrated with a multi-level path aggregation neck, which significantly improves the fusion effect and efficiency. Meanwhile, multi-modal object detection often adopts union annotations for both modals. This kind of supervision is not sufficient and unfair, since objects observed in one modal may not be seen in the other modal. To solve this issue, Multi-modal Supervision (MS) is proposed to sufficiently supervise RGB-T object detection. Comprehensive experiments on two challenging benchmarks, KAIST and DroneVehicle, demonstrate the proposed model achieves the state-of-the-art accuracy while maintaining competitive efficiency.",

keywords = "Attention mechanism, Group shuffle, Multi-modal supervision, Multispectral object detection",

author = "Jinzhong Wang and Xuetao Tian and Shun Dai and Tao Zhuo and Haorui Zeng and Hongjuan Liu and Jiaqi Liu and Xiuwei Zhang and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.; 27th International Conference on Pattern Recognition, ICPR 2024 ; Conference date: 01-12-2024 Through 05-12-2024",

year = "2025",

doi = "10.1007/978-3-031-78447-7_19",

language = "English",

isbn = "9783031784460",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "284--298",

editor = "Apostolos Antonacopoulos and Subhasis Chaudhuri and Rama Chellappa and Cheng-Lin Liu and Saumik Bhattacharya and Umapada Pal",

booktitle = "Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings",

}

Wang, J, Tian, X, Dai, S, Zhuo, T, Zeng, H, Liu, H, Liu, J, Zhang, X & Zhang, Y 2025, RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision. in A Antonacopoulos, S Chaudhuri, R Chellappa, C-L Liu, S Bhattacharya & U Pal (eds), Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 15317 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 284-298, 27th International Conference on Pattern Recognition, ICPR 2024, Kolkata, India, 1/12/24. https://doi.org/10.1007/978-3-031-78447-7_19

RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision. / Wang, Jinzhong; Tian, Xuetao; Dai, Shun et al.
Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings. editor / Apostolos Antonacopoulos; Subhasis Chaudhuri; Rama Chellappa; Cheng-Lin Liu; Saumik Bhattacharya; Umapada Pal. Springer Science and Business Media Deutschland GmbH, 2025. pp. 284-298 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15317 LNCS).

TY - GEN

T1 - RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision

AU - Wang, Jinzhong

AU - Tian, Xuetao

AU - Dai, Shun

AU - Zhuo, Tao

AU - Zeng, Haorui

AU - Liu, Hongjuan

AU - Liu, Jiaqi

AU - Zhang, Xiuwei

AU - Zhang, Yanning

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

PY - 2025

Y1 - 2025

AB - Multispectral object detection, utilizing both visible (RGB) and thermal infrared (T) modals, has garnered significant attention for its robust performance across diverse weather and lighting conditions. However, effectively exploiting the complementarity between RGB-T modals while maintaining efficiency remains a critical challenge. In this paper, a very simple Group Shuffled Multi-receptive Attention (GSMA) module is proposed to extract and combine multi-scale RGB and thermal features. Then, the extracted multi-modal features are directly integrated with a multi-level path aggregation neck, which significantly improves the fusion effect and efficiency. Meanwhile, multi-modal object detection often adopts union annotations for both modals. This kind of supervision is not sufficient and unfair, since objects observed in one modal may not be seen in the other modal. To solve this issue, Multi-modal Supervision (MS) is proposed to sufficiently supervise RGB-T object detection. Comprehensive experiments on two challenging benchmarks, KAIST and DroneVehicle, demonstrate the proposed model achieves the state-of-the-art accuracy while maintaining competitive efficiency.

KW - Attention mechanism

KW - Group shuffle

KW - Multi-modal supervision

KW - Multispectral object detection

UR - http://www.scopus.com/inward/record.url?scp=85211915729&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-78447-7_19

DO - 10.1007/978-3-031-78447-7_19

M3 - Conference contribution

AN - SCOPUS:85211915729

SN - 9783031784460

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 284

EP - 298

BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings

A2 - Antonacopoulos, Apostolos

A2 - Chaudhuri, Subhasis

A2 - Chellappa, Rama

A2 - Liu, Cheng-Lin

A2 - Bhattacharya, Saumik

A2 - Pal, Umapada

PB - Springer Science and Business Media Deutschland GmbH

T2 - 27th International Conference on Pattern Recognition, ICPR 2024

Y2 - 1 December 2024 through 5 December 2024

ER -

Wang J, Tian X, Dai S, Zhuo T, Zeng H, Liu H et al. RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision. In Antonacopoulos A, Chaudhuri S, Chellappa R, Liu CL, Bhattacharya S, Pal U, editors, Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings. Springer Science and Business Media Deutschland GmbH. 2025. p. 284-298. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-78447-7_19
