TY - GEN
T1 - eSwin-UNet
T2 - 28th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2022
AU - Cui, Helei
AU - Xing, Tao
AU - Ren, Jiaju
AU - Chen, Yaxing
AU - Yu, Zhiwen
AU - Guo, Bin
AU - Guo, Xiaobing
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Surface inspection for defects in industrial equipment plays a vital role in real-world production. Traditional inspection routines require a large number of inspection workers, which not only reduces production efficiency but also leads to unreliable results. Computer vision-based detection approaches, e.g., those using deep learning, have shown great potential for this task. Specifically, semantic segmentation algorithms based on Convolutional Neural Networks (CNNs) can extract relatively complete feature information, while the Transformer, which emerged from the field of Natural Language Processing (NLP), performs well at maintaining and transmitting semantic information. In light of this, we design a segmentation model called eSwin-UNet (enhanced Swin-UNet) that leverages the advantages of both CNNs and Transformers. It uses multi-scale information fusion to better integrate the feature information from the CNN and Transformer branches. Moreover, it uses deep supervision and trains the two branches collaboratively to further improve accuracy. On the MVTec ITODD dataset, eSwin-UNet achieves an F1-score of 0.7891 and a Jaccard index of 0.6516, outperforming most current models.
AB - Surface inspection for defects in industrial equipment plays a vital role in real-world production. Traditional inspection routines require a large number of inspection workers, which not only reduces production efficiency but also leads to unreliable results. Computer vision-based detection approaches, e.g., those using deep learning, have shown great potential for this task. Specifically, semantic segmentation algorithms based on Convolutional Neural Networks (CNNs) can extract relatively complete feature information, while the Transformer, which emerged from the field of Natural Language Processing (NLP), performs well at maintaining and transmitting semantic information. In light of this, we design a segmentation model called eSwin-UNet (enhanced Swin-UNet) that leverages the advantages of both CNNs and Transformers. It uses multi-scale information fusion to better integrate the feature information from the CNN and Transformer branches. Moreover, it uses deep supervision and trains the two branches collaboratively to further improve accuracy. On the MVTec ITODD dataset, eSwin-UNet achieves an F1-score of 0.7891 and a Jaccard index of 0.6516, outperforming most current models.
KW - Deep Learning
KW - Defect Detection
KW - Industrial Equipment
KW - Transformer
KW - U-Net
UR - http://www.scopus.com/inward/record.url?scp=85152914431&partnerID=8YFLogxK
U2 - 10.1109/ICPADS56603.2022.00056
DO - 10.1109/ICPADS56603.2022.00056
M3 - Conference contribution
AN - SCOPUS:85152914431
T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
SP - 379
EP - 386
BT - Proceedings - 2022 IEEE 28th International Conference on Parallel and Distributed Systems, ICPADS 2022
PB - IEEE Computer Society
Y2 - 10 January 2023 through 12 January 2023
ER -