TY - JOUR
T1 - Survey of Neural Network Lightweighting
AU - Duan, Yuchen
AU - Fang, Zhenyu
AU - Zheng, Jiangbin
N1 - Publisher Copyright:
© 2025 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
PY - 2025/4/1
Y1 - 2025/4/1
AB - With the continuous progress of deep learning, artificial neural network models have achieved unprecedented performance in many fields such as image recognition, natural language processing, and autonomous driving. These models often contain millions or even billions of parameters and learn complex feature representations from large amounts of training data. However, in resource-constrained environments such as mobile devices, embedded systems, and other edge computing scenarios, power consumption, memory usage, and computational efficiency limit the deployment of large-scale neural network models. To address this problem, researchers have proposed a variety of model compression techniques, including pruning, knowledge distillation, neural architecture search (NAS), quantization, and low-rank decomposition, which aim to reduce the number of parameters, the computational complexity, and the storage requirements of a model while preserving its accuracy as much as possible. This survey systematically reviews the development of these model compression methods, focusing on the main principles and key techniques of each. It covers the different strategies of pruning, such as structured and unstructured pruning; how knowledge is defined in knowledge distillation; the search space, search algorithms, and performance evaluation in NAS; post-training quantization and quantization-aware training in quantization; and singular value decomposition and tensor decomposition in low-rank decomposition. Finally, future directions for model compression are discussed.
KW - knowledge distillation
KW - low-rank decomposition
KW - neural architecture search (NAS)
KW - pruning
KW - quantization
UR - http://www.scopus.com/inward/record.url?scp=105001315033&partnerID=8YFLogxK
U2 - 10.3778/j.issn.1673-9418.2403071
DO - 10.3778/j.issn.1673-9418.2403071
M3 - Literature review
AN - SCOPUS:105001315033
SN - 1673-9418
VL - 19
SP - 835
EP - 853
JO - Journal of Frontiers of Computer Science and Technology
JF - Journal of Frontiers of Computer Science and Technology
IS - 4
ER -