Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment

Yiyang Yao; Zexing Du; Xue Wang; Qing Wang

doi:10.1007/978-981-96-2914-5_20

School of Computer Science

Northwestern Polytechnical University, Xi'an

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The use of multi-spectral imaging (infrared, visible light, and ultraviolet) to recognize defects in electrical equipment has mostly focused on static measurements and has left the dynamic process of defect development largely unexplored. To better exploit dynamic measurements, this paper proposes a novel defect recognition method using tri-spectral videos. Specifically, a multi-modal spatio-temporal Transformer is presented to effectively decompose the spatio-temporal features present in the various modalities. In addition, a spatio-temporal multi-modal contrastive loss is introduced for self-supervised learning. By aligning extracted features both spatially and temporally across modalities, this loss helps mitigate confusion between modalities and improves the discriminative capacity of the learned representations. To evaluate the proposed method, we collect a tri-spectral dataset, TROPED, which covers a wide range of dynamic defects in operational substation equipment, and report benchmark results on it. Experimental results demonstrate the effectiveness and robustness of the proposed method compared with other state-of-the-art methods.
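
The abstract does not give the form of the contrastive loss, so the following is only an illustrative sketch of the general idea of aligning features across modalities. It implements a generic symmetric InfoNCE objective, not the authors' loss. It assumes that embeddings at the same index in two modalities (for example, the same clip or the same space-time position seen in infrared and in ultraviolet) form a positive pair and that all other pairings are negatives. The function name `cross_modal_info_nce`, the `temperature` value, and the toy feature vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def cross_modal_info_nce(feats_a, feats_b, temperature=0.1):
    """Generic symmetric InfoNCE between two modalities (illustrative only).

    feats_a[i] and feats_b[i] are assumed to describe the same sample in two
    spectra; every other pairing in the batch is treated as a negative.
    """
    a = [normalize(v) for v in feats_a]
    b = [normalize(v) for v in feats_b]
    n = len(a)
    # Cosine-similarity logits, scaled by the temperature.
    logits = [[dot(a[i], b[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy(row, target):
        # Numerically stable log-sum-exp minus the positive logit.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    # Direction A -> B uses rows, direction B -> A uses columns.
    loss_ab = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_ba = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)


# Toy embeddings (made up for illustration): two samples, two modalities.
infrared = [[1.0, 0.0], [0.0, 1.0]]
ultraviolet = [[0.9, 0.1], [0.1, 0.9]]

aligned = cross_modal_info_nce(infrared, ultraviolet)
misaligned = cross_modal_info_nce(infrared, list(reversed(ultraviolet)))
```

With these toy inputs the correctly paired batch yields a much smaller loss than the batch whose second modality is shuffled, which is the behaviour a cross-modal alignment objective is meant to reward.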

Original language	English
Title of host publication	Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers
Editors	Huimin Lu
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	211-222
Number of pages	12
ISBN (Print)	9789819629138
DOI	https://doi.org/10.1007/978-981-96-2914-5_20
Publication status	Published - 2025
Event	9th International Symposium on Artificial Intelligence and Robotics, ISAIR 2024 - Guilin, China. Duration: 27 Sep 2024 → 30 Sep 2024

Publication series

Name	Communications in Computer and Information Science
Volume	2403 CCIS
ISSN (Print)	1865-0929
ISSN (Electronic)	1865-0937

Conference

Conference	9th International Symposium on Artificial Intelligence and Robotics, ISAIR 2024
Country/Territory	China
City	Guilin
Period	27/09/24 → 30/09/24

Cite this

Yao, Y., Du, Z., Wang, X., & Wang, Q. (2025). Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment. In H. Lu (Ed.), Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers (pp. 211-222). (Communications in Computer and Information Science; Vol. 2403 CCIS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-96-2914-5_20

Yao, Yiyang ; Du, Zexing ; Wang, Xue et al. / Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment. Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers. editor / Huimin Lu. Springer Science and Business Media Deutschland GmbH, 2025. pp. 211-222 (Communications in Computer and Information Science).

@inproceedings{302882071ee445f4961b84e331581054,

title = "Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment",

abstract = "The utilization of multi-spectral imaging, such as infrared, visible light, and ultraviolet, for recognizing defects in electrical equipment mostly focuses on static measurements and lacks exploration into the dynamic process of defect development. To better exploit dynamic measurements, this paper proposes a novel defect recognition method using tri-spectral videos. Specifically, a multi-modal spatio-temporal Transformer is presented to effectively decompose spatio-temporal features present in various modalities. Besides, a spatio-temporal multi-modal contrastive loss is introduced for self-supervised learning. By aligning extracted features both spatially and temporally across modalities, this loss helps mitigate confusion between modalities and improve the discriminative capacity of learned representations. To evaluate the proposed method, we self-collect a tri-spectral dataset, TROPED, which covers a wide range of dynamic defects in operational substation equipment, and benchmark results on the dataset. Experimental results demonstrate the effectiveness and robustness of the proposed method against other state-of-the-art methods.",

keywords = "Defect Recognition, Dynamic Measurements, Electrical Equipment, Multi-modal Spatio-temporal Learning, Self-supervised Learning",

author = "Yiyang Yao and Zexing Du and Xue Wang and Qing Wang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.; 9th International Symposium on Artificial Intelligence and Robotics, ISAIR 2024 ; Conference date: 27-09-2024 Through 30-09-2024",

year = "2025",

doi = "10.1007/978-981-96-2914-5_20",

language = "English",

isbn = "9789819629138",

series = "Communications in Computer and Information Science",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "211--222",

editor = "Huimin Lu",

booktitle = "Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers",

}

Yao, Y, Du, Z, Wang, X & Wang, Q 2025, Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment. in H Lu (ed.), Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers. Communications in Computer and Information Science, vol. 2403 CCIS, Springer Science and Business Media Deutschland GmbH, pp. 211-222, 9th International Symposium on Artificial Intelligence and Robotics, ISAIR 2024, Guilin, China, 27/09/24. https://doi.org/10.1007/978-981-96-2914-5_20

Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment. / Yao, Yiyang; Du, Zexing; Wang, Xue et al.
Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers. editor / Huimin Lu. Springer Science and Business Media Deutschland GmbH, 2025. pp. 211-222 (Communications in Computer and Information Science; Vol. 2403 CCIS).

TY - GEN

T1 - Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment

AU - Yao, Yiyang

AU - Du, Zexing

AU - Wang, Xue

AU - Wang, Qing

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

PY - 2025

Y1 - 2025

AB - The utilization of multi-spectral imaging, such as infrared, visible light, and ultraviolet, for recognizing defects in electrical equipment mostly focuses on static measurements and lacks exploration into the dynamic process of defect development. To better exploit dynamic measurements, this paper proposes a novel defect recognition method using tri-spectral videos. Specifically, a multi-modal spatio-temporal Transformer is presented to effectively decompose spatio-temporal features present in various modalities. Besides, a spatio-temporal multi-modal contrastive loss is introduced for self-supervised learning. By aligning extracted features both spatially and temporally across modalities, this loss helps mitigate confusion between modalities and improve the discriminative capacity of learned representations. To evaluate the proposed method, we self-collect a tri-spectral dataset, TROPED, which covers a wide range of dynamic defects in operational substation equipment, and benchmark results on the dataset. Experimental results demonstrate the effectiveness and robustness of the proposed method against other state-of-the-art methods.

KW - Defect Recognition

KW - Dynamic Measurements

KW - Electrical Equipment

KW - Multi-modal Spatio-temporal Learning

KW - Self-supervised Learning

UR - http://www.scopus.com/inward/record.url?scp=105001346795&partnerID=8YFLogxK

U2 - 10.1007/978-981-96-2914-5_20

DO - 10.1007/978-981-96-2914-5_20

M3 - Conference contribution

AN - SCOPUS:105001346795

SN - 9789819629138

T3 - Communications in Computer and Information Science

SP - 211

EP - 222

BT - Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers

A2 - Lu, Huimin

PB - Springer Science and Business Media Deutschland GmbH

T2 - 9th International Symposium on Artificial Intelligence and Robotics, ISAIR 2024

Y2 - 27 September 2024 through 30 September 2024

ER -

Yao Y, Du Z, Wang X, Wang Q. Multi-modal Spatio-temporal Transformer for Defect Recognition of Substation Equipment. In Lu H, editor, Artificial Intelligence and Robotics - 9th International Symposium, ISAIR 2024, Revised Selected Papers. Springer Science and Business Media Deutschland GmbH. 2025. p. 211-222. (Communications in Computer and Information Science). doi: 10.1007/978-981-96-2914-5_20
