Workload-Aware Performance Model Based Soft Preemptive Real-Time Scheduling for Neural Processing Units

Yuan Yao; Yujiao Hu; Yi Dang; Wei Tao; Kai Hu; Qiming Huang; Zhe Peng; Gang Yang; Xingshe Zhou

doi:10.1109/TPDS.2025.3553922

School of Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

A neural processing unit (NPU) is a microprocessor designed specifically for neural network applications. Because of their high acceleration efficiency and low power consumption, NPUs have been widely deployed in airborne embedded systems, replacing GPUs as the accelerator. Unfortunately, the built-in NPU scheduler does not consider real-time scheduling and therefore cannot meet the real-time requirements of airborne embedded systems. To date, there has been little research on multi-task real-time scheduling for NPU devices. In this article, we first design an NPU resource management framework based on Kubernetes. Then, we propose WAMSPRES, a soft preemptive real-time scheduling method based on a workload-aware NPU performance model. The performance model can accurately predict the remaining execution time of a task when it runs concurrently with other tasks. The soft preemptive real-time scheduling algorithm provides approximate preemption capability by dynamically adjusting the NPU computing resources allocated to tasks. Finally, we implement a prototype NPU scheduler for the airborne embedded system of a fixed-wing UAV. The proposed models and algorithms are validated on both simulated and realistic task sets. Experimental results show that WAMSPRES achieves low prediction error and a high scheduling success rate.
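
Illustrative sketch (not taken from the article): the abstract describes soft preemption as adjusting each task's share of NPU computing resources instead of stopping it. The Python below is one minimal way to picture that idea. It is not the WAMSPRES algorithm. The Task fields, the assign_quotas function, and the assumption that execution speed scales linearly with the assigned share are all hypothetical; the linear rule merely stands in for a real performance model.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining: float  # work left, in seconds on a fully dedicated NPU (hypothetical unit)
    deadline: float   # seconds from now until the deadline; must be > 0


def assign_quotas(tasks, capacity=1.0):
    """Return {task name: share of NPU capacity}.

    Assumption: a task holding share s finishes in remaining / s seconds,
    so remaining / deadline is the smallest share that meets the deadline.
    """
    needs = {t.name: t.remaining / t.deadline for t in tasks}
    total = sum(needs.values())
    if total == 0:
        return {name: 0.0 for name in needs}
    if total <= capacity:
        # Feasible: scale every share up proportionally so no capacity idles.
        return {name: capacity * need / total for name, need in needs.items()}
    # Overloaded: serve the earliest deadlines first; later ones get the rest.
    quotas, left = {}, capacity
    for t in sorted(tasks, key=lambda t: t.deadline):
        quotas[t.name] = min(needs[t.name], left)
        left -= quotas[t.name]
    return quotas


detect = Task("detect", remaining=2.0, deadline=4.0)
track = Task("track", remaining=1.0, deadline=10.0)
avoid = Task("avoid", remaining=0.3, deadline=1.0)  # urgent arrival

before = assign_quotas([detect, track])
after = assign_quotas([detect, track, avoid])
print(before)
print(after)
```

In this toy run the arrival of the urgent task shrinks the shares of the two running tasks but leaves both running, which is the sense in which the preemption is "soft".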

Original language	English
Pages (from-to)	1058-1070
Number of pages	13
Journal	IEEE Transactions on Parallel and Distributed Systems
Volume	36
Issue number	6
DOIs	https://doi.org/10.1109/TPDS.2025.3553922
State	Published - 2025

Keywords

computing power
dynamic-quota
Embedded system
NPU performance model
real-time scheduling
soft preemptive scheduling

Access to Document

10.1109/TPDS.2025.3553922

Cite this

@article{9ab7e32e3fb84c41b0a014ac928f56a3,

title = "Workload-Aware Performance Model Based Soft Preemptive Real-Time Scheduling for Neural Processing Units",

abstract = "A neural processing unit (NPU) is a microprocessor which is specially designed for various types of neural network applications. Because of its high acceleration efficiency and lower power consumption, the airborne embedded system has widely deployed NPU to replace GPU as the new accelerator. Unfortunately, the inherent scheduler of NPU does not consider real-time scheduling. Therefore, it cannot meet real-time requirements of airborne embedded systems. At present, there is less research on the multi-task real-time scheduling of the NPU device. In this article, we first design an NPU resource management framework based on Kubernetes. Then, we propose WAMSPRES, a workload-aware NPU performance model based soft preemptive real-time scheduling method. The proposed workload-aware NPU performance model can accurately predict the remaining execution time of the task when it runs with other tasks concurrently. The soft preemptive real-time scheduling algorithm can provide approximate preemption capability by dynamically adjusting the NPU computing resources of tasks. Finally, we implement a prototype NPU scheduler of the airborne embedded system for the fixed-wing UAV. The proposed models and algorithms are validated on both the simulated and realistic task sets. Experimental results illustrate that WAMSPRES can achieve low prediction error and high scheduling success rate.",

keywords = "computing power, dynamic-quota, Embedded system, NPU performance model, real-time scheduling, soft preemptive scheduling",

author = "Yuan Yao and Yujiao Hu and Yi Dang and Wei Tao and Kai Hu and Qiming Huang and Zhe Peng and Gang Yang and Xingshe Zhou",

note = "Publisher Copyright: {\textcopyright} 1990-2012 IEEE.",

year = "2025",

doi = "10.1109/TPDS.2025.3553922",

language = "English",

volume = "36",

pages = "1058--1070",

journal = "IEEE Transactions on Parallel and Distributed Systems",

issn = "1045-9219",

publisher = "IEEE Computer Society",

number = "6",

}

TY - JOUR

T1 - Workload-Aware Performance Model Based Soft Preemptive Real-Time Scheduling for Neural Processing Units

AU - Yao, Yuan

AU - Hu, Yujiao

AU - Dang, Yi

AU - Tao, Wei

AU - Hu, Kai

AU - Huang, Qiming

AU - Peng, Zhe

AU - Yang, Gang

AU - Zhou, Xingshe

PY - 2025

Y1 - 2025

AB - A neural processing unit (NPU) is a microprocessor which is specially designed for various types of neural network applications. Because of its high acceleration efficiency and lower power consumption, the airborne embedded system has widely deployed NPU to replace GPU as the new accelerator. Unfortunately, the inherent scheduler of NPU does not consider real-time scheduling. Therefore, it cannot meet real-time requirements of airborne embedded systems. At present, there is less research on the multi-task real-time scheduling of the NPU device. In this article, we first design an NPU resource management framework based on Kubernetes. Then, we propose WAMSPRES, a workload-aware NPU performance model based soft preemptive real-time scheduling method. The proposed workload-aware NPU performance model can accurately predict the remaining execution time of the task when it runs with other tasks concurrently. The soft preemptive real-time scheduling algorithm can provide approximate preemption capability by dynamically adjusting the NPU computing resources of tasks. Finally, we implement a prototype NPU scheduler of the airborne embedded system for the fixed-wing UAV. The proposed models and algorithms are validated on both the simulated and realistic task sets. Experimental results illustrate that WAMSPRES can achieve low prediction error and high scheduling success rate.

KW - computing power

KW - dynamic-quota

KW - Embedded system

KW - NPU performance model

KW - real-time scheduling

KW - soft preemptive scheduling

UR - http://www.scopus.com/inward/record.url?scp=105003169924&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2025.3553922

DO - 10.1109/TPDS.2025.3553922

M3 - Article

AN - SCOPUS:105003169924

SN - 1045-9219

VL - 36

SP - 1058

EP - 1070

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

IS - 6

ER -
