Context-aware Adaptive Surgery: A Fast and Effective Framework for Adaptative Model Partition

Hongli Wang, Bin Guo, Jiaqi Liu, Sicong Liu, Yungang Wu, Zhiwen Yu

Research output: Contribution to journal › Article › peer-review

21 Citations (Scopus)

Abstract

Deep Neural Networks (DNNs) have made massive progress in many fields, and deploying DNNs on end devices has become an emerging trend to bring intelligence closer to users. However, it is challenging to deploy large-scale and computation-intensive DNNs on resource-constrained end devices because of their small size and limited capability. To this end, model partition, which aims to split a DNN into multiple parts so that multiple devices can compute it collaboratively, has received extensive research attention. To find the optimal partition, most existing approaches have to run from scratch under the given resource constraints. However, they ignore that device resources (e.g., storage, battery power) and performance requirements (e.g., inference latency) are continuously changing, so the optimal partition solution also changes constantly during processing. It is therefore important to reduce the tuning latency of model partition to enable real-time adaptation to the changing processing context. To address these problems, we propose the Context-aware Adaptive Surgery (CAS) framework to actively perceive the changing processing context and adaptively find an appropriate partition solution in real time. Specifically, we construct a partition state graph that comprehensively models different partition solutions of a DNN by incorporating context resources. We then propose "the neighbor effect", which provides a heuristic rule for the search process. When the processing context changes, CAS runs the Graph-based Adaptive DNN Surgery (GADS) search algorithm to quickly find an appropriate partition that satisfies the resource constraints under the guidance of the neighbor effect. Experimental results show that CAS adaptively and rapidly retunes the model partition solution on a 10 ms scale even for large DNNs (a 2.25x to 221.7x search-time improvement over state-of-the-art approaches), while the total inference latency remains at the same level as the baselines.
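The abstract describes GADS only at a high level. As a rough illustration of the neighbor-effect idea (when the processing context changes only slightly, restart the search for a partition point from the previous solution rather than from scratch), the following is a minimal sketch for a chain-structured DNN with a single cut point. The latency model, the numbers, and the function names are assumptions made for demonstration; they are not the paper's partition state graph or its actual algorithm.

```python
# Illustrative sketch only: a toy layer-wise partition search for a chain-structured DNN.
# The real CAS/GADS algorithm searches a partition state graph; the latency model,
# numbers, and function names below are assumptions made for demonstration.

def total_latency(cut, device_ms, edge_ms, tx_ms):
    """Latency if layers [0, cut) run on the end device and layers [cut, N) on the edge.
    tx_ms[cut] is the time to transmit the intermediate tensor at the cut point."""
    return sum(device_ms[:cut]) + tx_ms[cut] + sum(edge_ms[cut:])

def neighbor_guided_search(prev_cut, device_ms, edge_ms, tx_ms, device_budget_ms):
    """Start from the previous partition and expand outward ("neighbor effect"):
    when the processing context changes only slightly, the new optimum tends to
    lie near the old one, so nearby cut points are examined first."""
    n = len(device_ms)
    best_cut, best_lat = None, float("inf")
    for radius in range(n + 1):
        for cut in {prev_cut - radius, prev_cut + radius}:
            if not 0 <= cut <= n:
                continue
            if sum(device_ms[:cut]) > device_budget_ms:  # device-side resource constraint
                continue
            lat = total_latency(cut, device_ms, edge_ms, tx_ms)
            if lat < best_lat:
                best_cut, best_lat = cut, lat
        if best_cut is not None and radius >= 2:  # stop once a feasible nearby cut is found
            break
    return best_cut, best_lat

if __name__ == "__main__":
    device_ms = [4.0, 6.0, 9.0, 12.0, 15.0]     # per-layer latency on the end device (ms)
    edge_ms   = [0.5, 0.8, 1.2, 1.5, 2.0]       # per-layer latency on the edge server (ms)
    tx_ms     = [8.0, 5.0, 3.0, 2.0, 1.5, 0.2]  # transfer cost of the tensor at each cut (ms)
    cut, lat = neighbor_guided_search(prev_cut=2, device_ms=device_ms, edge_ms=edge_ms,
                                      tx_ms=tx_ms, device_budget_ms=20.0)
    print(f"cut before layer {cut}, estimated end-to-end latency {lat:.1f} ms")
```

The point of the sketch is the search order: candidate cuts are visited in increasing distance from the previous solution and the search stops early, which is why a context change can be handled in far less time than an exhaustive re-partition from scratch.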

Original language: English
Article number: 3478073
Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume: 5
Issue: 3
DOI
Publication status: Published - Sep 2021
