TY - JOUR
T1 - HCDCMQ
T2 - Hessian-aware Channel Determinism-decomposition With Counterfactual Multi-agent Optimization For Channel-wise Mixed-precision Post-training Quantization
AU - Xu, Wentao
AU - Wang, Jiaxiang
AU - Wang, Ruize
AU - Li, Xiang
AU - Zhang, Yuxuan
AU - Wu, Xiang
AU - Li, Weilin
AU - Qu, Fengdong
N1 - Publisher Copyright:
© 2026 Elsevier B.V.
PY - 2026/5/28
Y1 - 2026/5/28
N2 - Post-training quantization (PTQ) enables efficient deployment of deep neural networks on resource-limited hardware. Channel-wise mixed-precision quantization often yields a better accuracy-efficiency balance than uniform-bit PTQ. However, it faces major obstacles, including an exponentially large configuration space, unstable sensitivity evaluation caused by inter-channel coupling, and ambiguous credit attribution during fine-grained exploration. A channel-wise mixed-precision post-training quantization framework named HCDCMQ is introduced, combining Hessian-aware Channel Determinism Decomposition (HCDD) with a counterfactual multi-agent (COMA) policy-gradient method. HCDD constructs a unified measure of channel sensitivity and fixes deterministic channels, thereby shrinking the practical search domain. Channels that remain ambiguous are grouped within a responsibility space to generate a compact discrete action set. With joint optimization targets covering accuracy, model size, and latency, the counterfactual baseline estimates the marginal impact of each agent, which lowers reward variance and improves the stability of policy optimization. Evaluations on ImageNet demonstrate that HCDCMQ achieves superior accuracy and efficiency relative to uniform-bit post-training quantization on ResNet-18, ResNet-50, MobileNetV2, InceptionV3, RegNetX-600M, and RegNetX-3.2G. On ResNet-50, HCDCMQ attains 75.775% top-1 accuracy while preserving only 13.4% of the FP32 model size and 4.4% of the total bit-operations.
AB - Post-training quantization (PTQ) enables efficient deployment of deep neural networks on resource-limited hardware. Channel-wise mixed-precision quantization often yields a better accuracy-efficiency balance than uniform-bit PTQ. However, it faces major obstacles, including an exponentially large configuration space, unstable sensitivity evaluation caused by inter-channel coupling, and ambiguous credit attribution during fine-grained exploration. A channel-wise mixed-precision post-training quantization framework named HCDCMQ is introduced, combining Hessian-aware Channel Determinism Decomposition (HCDD) with a counterfactual multi-agent (COMA) policy-gradient method. HCDD constructs a unified measure of channel sensitivity and fixes deterministic channels, thereby shrinking the practical search domain. Channels that remain ambiguous are grouped within a responsibility space to generate a compact discrete action set. With joint optimization targets covering accuracy, model size, and latency, the counterfactual baseline estimates the marginal impact of each agent, which lowers reward variance and improves the stability of policy optimization. Evaluations on ImageNet demonstrate that HCDCMQ achieves superior accuracy and efficiency relative to uniform-bit post-training quantization on ResNet-18, ResNet-50, MobileNetV2, InceptionV3, RegNetX-600M, and RegNetX-3.2G. On ResNet-50, HCDCMQ attains 75.775% top-1 accuracy while preserving only 13.4% of the FP32 model size and 4.4% of the total bit-operations.
KW - Channel-wise quantization
KW - Hessian-aware sensitivity
KW - Mixed-precision quantization
KW - Model compression
KW - Multi-agent reinforcement learning
KW - Post-training quantization
UR - https://www.scopus.com/pages/publications/105032088643
U2 - 10.1016/j.neucom.2026.133191
DO - 10.1016/j.neucom.2026.133191
M3 - Article
AN - SCOPUS:105032088643
SN - 0925-2312
VL - 679
JO - Neurocomputing
JF - Neurocomputing
M1 - 133191
ER -