TY - GEN
T1 - Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment
AU - Yao, Yiyang
AU - Du, Zexing
AU - Wang, Xue
AU - Wang, Qing
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - The utilization of multi-spectral imaging, such as infrared, visible light, and ultraviolet, for recognizing defects in electrical equipment mostly focuses on static measurements and lacks exploration into the dynamic process of defect development. To better exploit dynamic measurements, this paper proposes a novel defect recognition method using tri-spectral videos. Specifically, a multi-modal spatio-temporal Transformer is presented to effectively decompose spatio-temporal features present in various modalities. Besides, a spatio-temporal multi-modal contrastive loss is introduced for self-supervised learning. By aligning extracted features both spatially and temporally across modalities, this loss helps mitigate confusion between modalities and improve the discriminative capacity of learned representations. To evaluate the proposed method, we self-collect a tri-spectral dataset, TROPED, which covers a wide range of dynamic defects in operational substation equipment, and benchmark results on the dataset. Experimental results demonstrate the effectiveness and robustness of the proposed method against other state-of-the-art methods.
KW - Defect Recognition
KW - Dynamic Measurements
KW - Electrical Equipment
KW - Multi-modal Spatio-temporal Learning
KW - Self-supervised Learning
UR - http://www.scopus.com/inward/record.url?scp=105001346795&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-2914-5_20
DO - 10.1007/978-981-96-2914-5_20
M3 - Conference contribution
AN - SCOPUS:105001346795
SN - 9789819629138
T3 - Communications in Computer and Information Science
SP - 211
EP - 222
BT - Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers
A2 - Lu, Huimin
PB - Springer Science and Business Media Deutschland GmbH
T2 - 9th International Symposium on Artificial Intelligence and Robotics, ISAIR 2024
Y2 - 27 September 2024 through 30 September 2024
ER -