TY - JOUR
T1 - AdaKnife
T2 - Flexible DNN Offloading for Inference Acceleration on Heterogeneous Mobile Devices
AU - Liu, Sicong
AU - Luo, Hao
AU - Li, Xiaochen
AU - Li, Yao
AU - Guo, Bin
AU - Yu, Zhiwen
AU - Wang, Yuzhan
AU - Ma, Ke
AU - Ding, Yasan
AU - Yao, Yuan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2025
Y1 - 2025
N2 - The integration of deep neural network (DNN) intelligence into embedded mobile devices is expanding rapidly, supporting a wide range of applications. DNN compression techniques, which adapt models to resource-constrained mobile environments, often force a trade-off between efficiency and accuracy. Distributed DNN inference, which leverages multiple mobile devices, emerges as a promising alternative that enhances inference efficiency without compromising accuracy. However, effectively decoupling DNN models into fine-grained components for optimal parallel acceleration presents significant challenges. Current partitioning methods, including layer-level and operator- or channel-level partitioning, provide only partial solutions and struggle with the heterogeneous nature of DNN compilation frameworks, complicating direct model offloading. In response, we introduce AdaKnife, an adaptive framework for accelerated inference across heterogeneous mobile devices. AdaKnife enables on-demand mixed-granularity DNN partitioning via computational graph analysis, facilitates efficient cross-framework model transitions with operator optimization for offloading, and improves the feasibility of parallel partitioning using a greedy operator parallelism algorithm. Our empirical studies show that AdaKnife achieves a 66.5% reduction in latency compared to baselines.
AB - The integration of deep neural network (DNN) intelligence into embedded mobile devices is expanding rapidly, supporting a wide range of applications. DNN compression techniques, which adapt models to resource-constrained mobile environments, often force a trade-off between efficiency and accuracy. Distributed DNN inference, which leverages multiple mobile devices, emerges as a promising alternative that enhances inference efficiency without compromising accuracy. However, effectively decoupling DNN models into fine-grained components for optimal parallel acceleration presents significant challenges. Current partitioning methods, including layer-level and operator- or channel-level partitioning, provide only partial solutions and struggle with the heterogeneous nature of DNN compilation frameworks, complicating direct model offloading. In response, we introduce AdaKnife, an adaptive framework for accelerated inference across heterogeneous mobile devices. AdaKnife enables on-demand mixed-granularity DNN partitioning via computational graph analysis, facilitates efficient cross-framework model transitions with operator optimization for offloading, and improves the feasibility of parallel partitioning using a greedy operator parallelism algorithm. Our empirical studies show that AdaKnife achieves a 66.5% reduction in latency compared to baselines.
KW - DNN offloading
KW - DNN partition
KW - heterogeneous mobile devices
UR - http://www.scopus.com/inward/record.url?scp=85205795687&partnerID=8YFLogxK
U2 - 10.1109/TMC.2024.3466931
DO - 10.1109/TMC.2024.3466931
M3 - Article
AN - SCOPUS:85205795687
SN - 1536-1233
VL - 24
SP - 736
EP - 748
JO - IEEE Transactions on Mobile Computing
JF - IEEE Transactions on Mobile Computing
IS - 2
ER -