TY - JOUR
T1 - Gradient Matters
T2 - Designing Binarized Neural Networks via Enhanced Information-Flow
AU - Wang, Qi
AU - Guo, Nianhui
AU - Xiong, Zhitong
AU - Yin, Zeping
AU - Li, Xuelong
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2022/11/1
Y1 - 2022/11/1
N2 - Binarized neural networks (BNNs) have drawn significant attention in recent years owing to their great potential for reducing computation and storage consumption. While attractive, traditional BNNs usually suffer from slow convergence and dramatic accuracy degradation on large-scale classification datasets. To minimize the gap between BNNs and deep neural networks (DNNs), we propose a new framework for designing BNNs, dubbed Hyper-BinaryNet, from the perspective of enhanced information flow. Our contributions are threefold: 1) Considering the capacity limitation in the backward pass, we propose a 1-bit convolution module named HyperConv. By exploiting the capacity of auxiliary neural networks, BNNs gain better performance on large-scale image classification tasks. 2) Considering the slow convergence of BNNs, we rethink the gradient accumulation mechanism and propose a hyper accumulation technique. By accumulating gradients in multiple variables rather than one, the number of gradient paths for each weight increases, which frees BNNs from the gradient bottleneck problem during training. 3) Considering the ill-posed optimization problem, a novel gradient estimation warmup strategy, dubbed STE-Warmup, is developed. This strategy prevents unstable optimization by progressively transferring neural networks from 32-bit to 1-bit. We conduct evaluations with various architectures on three public datasets: CIFAR-10/100 and ImageNet. Compared with state-of-the-art BNNs, Hyper-BinaryNet converges faster and outperforms existing BNNs by a large margin.
KW - 1-bit convolution
KW - gradient approximation
KW - Neural network acceleration
UR - https://www.scopus.com/pages/publications/85119578909
U2 - 10.1109/TPAMI.2021.3117908
DO - 10.1109/TPAMI.2021.3117908
M3 - Article
C2 - 34613908
AN - SCOPUS:85119578909
SN - 0162-8828
VL - 44
SP - 7551
EP - 7562
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 11
ER -