TY - GEN
T1 - Building Intrinsically Interpretable Deep Neural Networks
T2 - 2024 China Automation Congress, CAC 2024
AU - Zhang, Wenyan
AU - Jiao, Lianmeng
AU - Pan, Quan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deep neural networks have achieved remarkable success in fields such as image classification and object detection, sometimes even outperforming humans, but their black-box nature limits their application in areas where the reasons for decisions must be known. The increasing demand for more transparent and reliable models has led to the emergence of explainable machine learning, and a growing number of researchers have turned their attention to the interpretability of deep neural networks, attempting to explore a model's inference process by investigating the network's black-box properties. Based on the stage at which explanations are generated, interpretable neural networks can be broadly classified into two categories: post-hoc interpretable models and intrinsically interpretable models. Although there has been extensive research on interpretable neural networks in recent years, a unified classification and summary of the construction of intrinsically interpretable networks is still lacking. In this paper, we review several typical approaches to building intrinsically interpretable neural networks in the field of image classification proposed in recent years, classify them according to the way they achieve interpretability, and summarize the strengths and weaknesses of each type of approach. Furthermore, we provide an outlook on future developments in this field.
AB - Deep neural networks have achieved remarkable success in fields such as image classification and object detection, sometimes even outperforming humans, but their black-box nature limits their application in areas where the reasons for decisions must be known. The increasing demand for more transparent and reliable models has led to the emergence of explainable machine learning, and a growing number of researchers have turned their attention to the interpretability of deep neural networks, attempting to explore a model's inference process by investigating the network's black-box properties. Based on the stage at which explanations are generated, interpretable neural networks can be broadly classified into two categories: post-hoc interpretable models and intrinsically interpretable models. Although there has been extensive research on interpretable neural networks in recent years, a unified classification and summary of the construction of intrinsically interpretable networks is still lacking. In this paper, we review several typical approaches to building intrinsically interpretable neural networks in the field of image classification proposed in recent years, classify them according to the way they achieve interpretability, and summarize the strengths and weaknesses of each type of approach. Furthermore, we provide an outlook on future developments in this field.
KW - explainable AI
KW - image classification
KW - Interpretable neural networks
UR - http://www.scopus.com/inward/record.url?scp=86000735559&partnerID=8YFLogxK
U2 - 10.1109/CAC63892.2024.10865081
DO - 10.1109/CAC63892.2024.10865081
M3 - Conference contribution
AN - SCOPUS:86000735559
T3 - Proceedings - 2024 China Automation Congress, CAC 2024
SP - 6835
EP - 6842
BT - Proceedings - 2024 China Automation Congress, CAC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 November 2024 through 3 November 2024
ER -