TY - JOUR
T1 - HFOD
T2 - A hardware-friendly quantization method for object detection on embedded FPGAs
AU - Zhang, Fei
AU - Gao, Ziyang
AU - Huang, Jiaming
AU - Zhen, Peining
AU - Chen, Hai Bao
AU - Yan, Jie
N1 - Publisher Copyright: © 2022 The Institute of Electronics, Information and Communication Engineers
PY - 2022/4/25
Y1 - 2022/4/25
N2 - There are two research hotspots for improving the performance and energy efficiency of the inference phase of convolutional neural networks (CNNs): model compression techniques and hardware accelerator implementation. To overcome the incompatibility between algorithm optimization and hardware design, this paper proposes HFOD, a hardware-friendly quantization method for object detection on embedded FPGAs. We adopt a channel-wise, uniform quantization method to compress the YOLOv3-Tiny model. Weights are quantized to 2 bits and activations to 8 bits for all convolutional layers. To achieve highly efficient implementations on FPGAs, we add batch normalization (BN) layer fusion to the quantization process. A flexible, efficient convolutional unit structure is designed to exploit the hardware-friendly quantization, and the accelerator is developed based on an automatic synthesis template. Experimental results show that the proposed accelerator design obtains more computing performance from the FPGA resources than regular 8-bit/16-bit fixed-point quantization. With 2-bit weights and 8-bit activations, the model size and the activation size of the proposed network are reduced by 16× and 4×, respectively, with a small accuracy loss. HFOD achieves 90.6 GOPS on PYNQZ2 at 150 MHz, which is 1.4× faster and 2× better in power efficiency than a peer FPGA implementation on the same platform.
KW - convolutional neural networks
KW - highly-efficient implementation
KW - quantization
UR - http://www.scopus.com/inward/record.url?scp=85130715965&partnerID=8YFLogxK
U2 - 10.1587/elex.19.20220067
DO - 10.1587/elex.19.20220067
M3 - Article
AN - SCOPUS:85130715965
SN - 1349-2543
VL - 19
JO - IEICE Electronics Express
JF - IEICE Electronics Express
IS - 8
ER -