Workload-Aware Performance Model Based Soft Preemptive Real-Time Scheduling for Neural Processing Units

Yuan Yao, Yujiao Hu, Yi Dang, Wei Tao, Kai Hu, Qiming Huang, Zhe Peng, Gang Yang, Xingshe Zhou

Research output: Contribution to journal › Article › peer-review

Abstract

A neural processing unit (NPU) is a microprocessor designed specifically for neural network applications. Because of their high acceleration efficiency and low power consumption, NPUs have been widely deployed in airborne embedded systems, replacing GPUs as the accelerator. Unfortunately, the built-in NPU scheduler does not consider real-time scheduling and therefore cannot meet the real-time requirements of airborne embedded systems. To date, there has been little research on multi-task real-time scheduling for NPU devices. In this article, we first design an NPU resource management framework based on Kubernetes. We then propose WAMSPRES, a soft preemptive real-time scheduling method based on a workload-aware NPU performance model. The performance model accurately predicts the remaining execution time of a task when it runs concurrently with other tasks. The soft preemptive real-time scheduling algorithm provides approximate preemption capability by dynamically adjusting the NPU computing resources allocated to tasks. Finally, we implement a prototype NPU scheduler for the airborne embedded system of a fixed-wing UAV. The proposed models and algorithms are validated on both simulated and realistic task sets. Experimental results show that WAMSPRES achieves low prediction error and a high scheduling success rate.
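To make the idea of soft preemption concrete, the following is a minimal sketch and not the WAMSPRES algorithm itself. It assumes that a task's speed scales linearly with its share of NPU compute, which stands in for the paper's workload-aware performance model, and it grants shares in earliest-deadline-first order. The names `Task`, `required_quota`, and `assign_quotas`, along with all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_work: float  # hypothetical unit: seconds of NPU time at 100% quota
    deadline: float        # seconds from now, assumed to be > 0


def required_quota(task: Task) -> float:
    # Assumption: execution speed scales linearly with quota. A real
    # performance model would also account for interference between tasks.
    return min(1.0, task.remaining_work / task.deadline)


def assign_quotas(tasks: list[Task], capacity: float = 1.0) -> dict[str, float]:
    # Hand out compute shares to the most urgent tasks first. Less urgent
    # tasks keep running on whatever share is left instead of being stopped.
    quotas = {}
    left = capacity
    for task in sorted(tasks, key=lambda t: t.deadline):
        share = min(required_quota(task), left)
        quotas[task.name] = share
        left -= share
    return quotas


# Two tasks are running when an urgent third task arrives.
running = [Task("A", 0.4, 1.0), Task("B", 0.3, 2.0)]
before = assign_quotas(running)
after = assign_quotas(running + [Task("C", 0.25, 0.5)])
```

In this sketch the arrival of the urgent task `C` shrinks the share of the least urgent task `B` rather than suspending it, which is the sense in which the preemption is approximate.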

Original language: English
Pages (from-to): 1058-1070
Number of pages: 13
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 36
Issue number: 6
State: Published - 2025

Keywords

  • computing power
  • dynamic-quota
  • embedded system
  • NPU performance model
  • real-time scheduling
  • soft preemptive scheduling
