TY - GEN
T1 - Building Intrinsically Interpretable Deep Neural Networks
T2 - 2024 China Automation Congress, CAC 2024
AU - Zhang, Wenyan
AU - Jiao, Lianmeng
AU - Pan, Quan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deep neural networks have achieved remarkable success in fields such as image classification and object detection, sometimes even outperforming humans, but their black-box nature limits their application in areas where the reasons for decisions must be known. The increasing demand for more transparent and reliable models has led to the emergence of explainable machine learning, and a growing number of researchers have turned their attention to the interpretability of deep neural networks, attempting to explore a model's inference process by investigating the network's black-box properties. Based on the stage at which explanations are generated, interpretable neural networks can be broadly classified into two categories: post-hoc interpretable models and intrinsically interpretable models. Although there has been extensive research on interpretable neural networks in recent years, a unified classification and summary of the construction of intrinsically interpretable networks is still lacking. In this paper, we review several typical approaches to building intrinsically interpretable neural networks in the field of image classification proposed in recent years, classify them according to the way they achieve interpretability, and summarize the strengths and weaknesses of each type of approach. Furthermore, we provide an outlook on future developments in this field.
AB - Deep neural networks have achieved remarkable success in fields such as image classification and object detection, sometimes even outperforming humans, but their black-box nature limits their application in areas where the reasons for decisions must be known. The increasing demand for more transparent and reliable models has led to the emergence of explainable machine learning, and a growing number of researchers have turned their attention to the interpretability of deep neural networks, attempting to explore a model's inference process by investigating the network's black-box properties. Based on the stage at which explanations are generated, interpretable neural networks can be broadly classified into two categories: post-hoc interpretable models and intrinsically interpretable models. Although there has been extensive research on interpretable neural networks in recent years, a unified classification and summary of the construction of intrinsically interpretable networks is still lacking. In this paper, we review several typical approaches to building intrinsically interpretable neural networks in the field of image classification proposed in recent years, classify them according to the way they achieve interpretability, and summarize the strengths and weaknesses of each type of approach. Furthermore, we provide an outlook on future developments in this field.
KW - explainable AI
KW - image classification
KW - Interpretable neural networks
UR - http://www.scopus.com/inward/record.url?scp=86000735559&partnerID=8YFLogxK
U2 - 10.1109/CAC63892.2024.10865081
DO - 10.1109/CAC63892.2024.10865081
M3 - Conference contribution
AN - SCOPUS:86000735559
T3 - Proceedings - 2024 China Automation Congress, CAC 2024
SP - 6835
EP - 6842
BT - Proceedings - 2024 China Automation Congress, CAC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 November 2024 through 3 November 2024
ER -