MS-DETR: Multispectral Pedestrian Detection Transformer With Loosely Coupled Fusion and Modality-Balanced Optimization

Yinghui Xing; Shuo Yang; Song Wang; Shizhou Zhang; Guoqiang Liang; Xiuwei Zhang; Yanning Zhang

doi:10.1109/TITS.2024.3450584

MS-DETR: Multispectral Pedestrian Detection Transformer With Loosely Coupled Fusion and Modality-Balanced Optimization

Yinghui Xing, Shuo Yang, Song Wang, Shizhou Zhang, Guoqiang Liang, Xiuwei Zhang, Yanning Zhang

计算机学院

Northwestern Polytechnical University Xian

科研成果: 期刊稿件 › 文章 › 同行评审

4 引用（Scopus）

摘要

Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions. Due to the presence of two modalities, misalignment and modality imbalance are the most significant issues in multispectral pedestrian detection. In this paper, we propose MultiSpectral pedestrian DEtection TRansformer (MS-DETR) to fix above issues. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder, and the visible and thermal features are fused in the multi-modal Transformer decoder. To well resist the misalignment between multi-modal images, we design a loosely coupled fusion strategy by sparsely sampling some keypoints from multi-modal features independently and fusing them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities, but also different pedestrian instances tend to have different confidence scores to final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR.

源语言	英语
页（从-至）	20628-20642
页数	15
期刊	IEEE Transactions on Intelligent Transportation Systems
卷	25
期	12
DOI	https://doi.org/10.1109/TITS.2024.3450584
出版状态	已出版 - 2024

访问文件

10.1109/TITS.2024.3450584

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{96d353e14de5431082665b2b72ebdf4a,

title = "MS-DETR: Multispectral Pedestrian Detection Transformer With Loosely Coupled Fusion and Modality-Balanced Optimization",

abstract = "Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions. Due to the presence of two modalities, misalignment and modality imbalance are the most significant issues in multispectral pedestrian detection. In this paper, we propose MultiSpectral pedestrian DEtection TRansformer (MS-DETR) to fix above issues. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder, and the visible and thermal features are fused in the multi-modal Transformer decoder. To well resist the misalignment between multi-modal images, we design a loosely coupled fusion strategy by sparsely sampling some keypoints from multi-modal features independently and fusing them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities, but also different pedestrian instances tend to have different confidence scores to final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR.",

keywords = "end-to-end detector, loosely coupled fusion, modality-balanced optimization, Multispectral pedestrian detection",

author = "Yinghui Xing and Shuo Yang and Song Wang and Shizhou Zhang and Guoqiang Liang and Xiuwei Zhang and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 2000-2011 IEEE.",

year = "2024",

doi = "10.1109/TITS.2024.3450584",

language = "英语",

volume = "25",

pages = "20628--20642",

journal = "IEEE Transactions on Intelligent Transportation Systems",

issn = "1524-9050",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "12",

}

TY - JOUR

T1 - MS-DETR

T2 - Multispectral Pedestrian Detection Transformer With Loosely Coupled Fusion and Modality-Balanced Optimization

AU - Xing, Yinghui

AU - Yang, Shuo

AU - Wang, Song

AU - Zhang, Shizhou

AU - Liang, Guoqiang

AU - Zhang, Xiuwei

AU - Zhang, Yanning

PY - 2024

Y1 - 2024

N2 - Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions. Due to the presence of two modalities, misalignment and modality imbalance are the most significant issues in multispectral pedestrian detection. In this paper, we propose MultiSpectral pedestrian DEtection TRansformer (MS-DETR) to fix above issues. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder, and the visible and thermal features are fused in the multi-modal Transformer decoder. To well resist the misalignment between multi-modal images, we design a loosely coupled fusion strategy by sparsely sampling some keypoints from multi-modal features independently and fusing them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities, but also different pedestrian instances tend to have different confidence scores to final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR.

AB - Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities can provide complementary information especially under low light conditions. Due to the presence of two modalities, misalignment and modality imbalance are the most significant issues in multispectral pedestrian detection. In this paper, we propose MultiSpectral pedestrian DEtection TRansformer (MS-DETR) to fix above issues. MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder, and the visible and thermal features are fused in the multi-modal Transformer decoder. To well resist the misalignment between multi-modal images, we design a loosely coupled fusion strategy by sparsely sampling some keypoints from multi-modal features independently and fusing them with adaptively learned attention weights. Moreover, based on the insight that not only different modalities, but also different pedestrian instances tend to have different confidence scores to final detection, we further propose an instance-aware modality-balanced optimization strategy, which preserves visible and thermal decoder branches and aligns their predicted slots through an instance-wise dynamic loss. Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets. The source code is available at https://github.com/YinghuiXing/MS-DETR.

KW - end-to-end detector

KW - loosely coupled fusion

KW - modality-balanced optimization

KW - Multispectral pedestrian detection

UR - http://www.scopus.com/inward/record.url?scp=85210918883&partnerID=8YFLogxK

U2 - 10.1109/TITS.2024.3450584

DO - 10.1109/TITS.2024.3450584

M3 - 文章

AN - SCOPUS:85210918883

SN - 1524-9050

VL - 25

SP - 20628

EP - 20642

JO - IEEE Transactions on Intelligent Transportation Systems

JF - IEEE Transactions on Intelligent Transportation Systems

IS - 12

ER -

MS-DETR: Multispectral Pedestrian Detection Transformer With Loosely Coupled Fusion and Modality-Balanced Optimization

摘要

访问文件

其它文件与链接

指纹

引用此