TY - JOUR
T1 - Workload-Aware Performance Model Based Soft Preemptive Real-Time Scheduling for Neural Processing Units
AU - Yao, Yuan
AU - Hu, Yujiao
AU - Dang, Yi
AU - Tao, Wei
AU - Hu, Kai
AU - Huang, Qiming
AU - Peng, Zhe
AU - Yang, Gang
AU - Zhou, Xingshe
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - A neural processing unit (NPU) is a microprocessor which is specially designed for various types of neural network applications. Because of its high acceleration efficiency and lower power consumption, the airborne embedded system has widely deployed NPU to replace GPU as the new accelerator. Unfortunately, the inherent scheduler of NPU does not consider real-time scheduling. Therefore, it cannot meet real-time requirements of airborne embedded systems. At present, there is less research on the multi-task real-time scheduling of the NPU device. In this paper, we first design an NPU resource management framework based on Kubernetes. Then, we propose WAMSPRES, a workload-aware NPU performance model based soft preemptive real-time scheduling method. The proposed workload-aware NPU performance model can accurately predict the remaining execution time of the task when it runs with other tasks concurrently. The soft preemptive real-time scheduling algorithm can provide approximate preemption capability by dynamically adjusting the NPU computing resources of tasks. Finally, we implement a prototype NPU scheduler of the airborne embedded system for the fixed-wing UAV. The proposed models and algorithms are validated on both the simulated and realistic task sets. Experimental results illustrate that WAMSPRES can achieve low prediction error and high scheduling success rate.
AB - A neural processing unit (NPU) is a microprocessor which is specially designed for various types of neural network applications. Because of its high acceleration efficiency and lower power consumption, the airborne embedded system has widely deployed NPU to replace GPU as the new accelerator. Unfortunately, the inherent scheduler of NPU does not consider real-time scheduling. Therefore, it cannot meet real-time requirements of airborne embedded systems. At present, there is less research on the multi-task real-time scheduling of the NPU device. In this paper, we first design an NPU resource management framework based on Kubernetes. Then, we propose WAMSPRES, a workload-aware NPU performance model based soft preemptive real-time scheduling method. The proposed workload-aware NPU performance model can accurately predict the remaining execution time of the task when it runs with other tasks concurrently. The soft preemptive real-time scheduling algorithm can provide approximate preemption capability by dynamically adjusting the NPU computing resources of tasks. Finally, we implement a prototype NPU scheduler of the airborne embedded system for the fixed-wing UAV. The proposed models and algorithms are validated on both the simulated and realistic task sets. Experimental results illustrate that WAMSPRES can achieve low prediction error and high scheduling success rate.
KW - computing power
KW - dynamic-quota
KW - embedded system
KW - NPU performance model
KW - real-time scheduling
KW - soft preemptive scheduling
UR - http://www.scopus.com/inward/record.url?scp=105001527715&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2025.3553922
DO - 10.1109/TPDS.2025.3553922
M3 - 文章
AN - SCOPUS:105001527715
SN - 1045-9219
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
ER -