TY - JOUR
T1 - Survey of Neural Network Lightweighting
AU - Duan, Yuchen
AU - Fang, Zhenyu
AU - Zheng, Jiangbin
N1 - Publisher Copyright:
© 2025 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
PY - 2025/4/1
Y1 - 2025/4/1
AB - With the continuous progress of deep learning, artificial neural network models have achieved unprecedented performance in many fields such as image recognition, natural language processing, and autonomous driving. These models often contain millions or even billions of parameters and learn complex feature representations from large amounts of training data. However, in resource-constrained environments such as mobile devices, embedded systems, and other edge computing scenarios, power consumption, memory usage, and computational efficiency limit the deployment of large-scale neural network models. To address this problem, researchers have proposed a variety of model compression techniques, including pruning, knowledge distillation, neural architecture search (NAS), quantization, and low-rank decomposition, which aim to reduce the number of parameters, the computational complexity, and the storage requirements of a model while preserving its accuracy as much as possible. This survey systematically reviews the development of these model compression methods, focusing on the main principles and key techniques of each. It covers the different strategies of pruning, such as structured and unstructured pruning; how knowledge is defined in knowledge distillation; the search space, search algorithms, and performance evaluation in NAS; post-training quantization and quantization-aware training in quantization; and singular value decomposition and tensor decomposition in low-rank decomposition. Finally, future directions for model compression are discussed.
KW - knowledge distillation
KW - low-rank decomposition
KW - neural architecture search (NAS)
KW - pruning
KW - quantization
UR - http://www.scopus.com/inward/record.url?scp=105001315033&partnerID=8YFLogxK
U2 - 10.3778/j.issn.1673-9418.2403071
DO - 10.3778/j.issn.1673-9418.2403071
M3 - Literature review
AN - SCOPUS:105001315033
SN - 1673-9418
VL - 19
SP - 835
EP - 853
JO - Journal of Frontiers of Computer Science and Technology
JF - Journal of Frontiers of Computer Science and Technology
IS - 4
ER -