AdaKnife: Flexible DNN Offloading for Inference Acceleration on Heterogeneous Mobile Devices

Sicong Liu, Hao Luo, Xiaochen Li, Yao Li, Bin Guo, Zhiwen Yu, Yuzhan Wang, Ke Ma, Yasan Ding, Yuan Yao

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

The integration of deep neural network (DNN) intelligence into embedded mobile devices is expanding rapidly, supporting a wide range of applications. DNN compression techniques, which adapt models to resource-constrained mobile environments, often force a trade-off between efficiency and accuracy. Distributed DNN inference, leveraging multiple mobile devices, emerges as a promising alternative to enhance inference efficiency without compromising accuracy. However, effectively decoupling DNN models into fine-grained components for optimal parallel acceleration presents significant challenges. Current partitioning methods, including layer-level and operator- or channel-level partitioning, provide only partial solutions and struggle with the heterogeneous nature of DNN compilation frameworks, complicating direct model offloading. In response, we introduce AdaKnife, an adaptive framework for accelerated inference across heterogeneous mobile devices. AdaKnife enables on-demand mixed-granularity DNN partitioning via computational graph analysis, facilitates efficient cross-framework model transitions with operator optimization for offloading, and improves the feasibility of parallel partitioning using a greedy operator parallelism algorithm. Our empirical studies show that AdaKnife achieves a 66.5% reduction in latency compared to baselines.
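The abstract mentions a greedy operator parallelism algorithm over a computational graph. As a rough illustration of what greedy operator-level placement can look like, the sketch below schedules operators in topological order and greedily assigns each to the device with the earliest estimated finish time (compute plus transfer). This is a minimal, hypothetical Python sketch; the function name, the FLOPs-based cost model, the uniform-bandwidth assumption, and the device parameters are all illustrative assumptions, not AdaKnife's published algorithm.

```python
from collections import deque

def greedy_operator_parallelism(graph, flops, out_bytes, device_speed, bandwidth):
    """Greedy operator-level placement over a DNN computational graph (sketch).

    graph:        {op: [successor ops]}; every op must appear as a key (sinks map to []).
    flops:        estimated compute cost per op.
    out_bytes:    output tensor size per op.
    device_speed: {device: FLOP/s} (hypothetical).
    bandwidth:    bytes/s between any two devices (assumed uniform).
    """
    # Build predecessor lists and in-degrees for Kahn-style topological traversal.
    indegree = {op: 0 for op in graph}
    preds = {op: [] for op in graph}
    for op, succs in graph.items():
        for s in succs:
            indegree[s] += 1
            preds[s].append(op)

    ready = deque(op for op, d in indegree.items() if d == 0)
    device_free = {dev: 0.0 for dev in device_speed}  # when each device is idle again
    placement, finish = {}, {}

    while ready:
        op = ready.popleft()
        best_dev, best_finish = None, float("inf")
        for dev, speed in device_speed.items():
            # An input tensor is shipped only if its producer ran on another device.
            comm = sum(out_bytes[p] / bandwidth
                       for p in preds[op] if placement[p] != dev)
            # Start once the device is free and all inputs are produced.
            start = max([device_free[dev]] + [finish[p] for p in preds[op]])
            t = start + comm + flops[op] / speed
            if t < best_finish:
                best_dev, best_finish = dev, t
        placement[op], finish[op] = best_dev, best_finish
        device_free[best_dev] = best_finish
        for s in graph[op]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return placement, finish

# Toy usage: a diamond graph a -> {b, c} -> d with a parallel branch.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
flops = {"a": 1e9, "b": 4e9, "c": 4e9, "d": 1e9}
out_bytes = {op: 1e6 for op in graph}
devices = {"phone": 3e9, "tablet": 4e9}   # FLOP/s, hypothetical
placement, _ = greedy_operator_parallelism(graph, flops, out_bytes, devices, 1e8)
print(placement)  # the parallel branch splits: b stays on the tablet, c moves to the phone
```

This is essentially HEFT-style list scheduling with an earliest-finish-time heuristic; the paper's mixed-granularity partitioning and cross-framework operator optimization go beyond this single-granularity cost model.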

Original language: English
Pages (from-to): 736-748
Number of pages: 13
Journal: IEEE Transactions on Mobile Computing
Volume: 24
Issue number: 2
DOI
Publication status: Published - 2025
