Machine learning based performance analysis and prediction of jobs on a HPC cluster

  • Zhengxiong Hou
  • Shuxin Zhao
  • Chao Yin
  • Yunlan Wang
  • Jianhua Gu
  • Xingshe Zhou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Citations (Scopus)

Abstract

Many small and medium-sized high-performance computing (HPC) clusters operate at universities and research institutes, and they accumulate large volumes of job logs over years of operation. In this paper, we examine and analyze the job logs accumulated on one such cluster. We then study machine learning based methods for analyzing and predicting the performance of parallel jobs. Several machine learning methods, such as multivariate linear fitting and artificial neural networks, are used to build performance prediction models. We compare the errors of each model and select the best-performing prediction model for different users. The experimental results show that the selected machine learning algorithms achieve reasonable prediction accuracy.
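The workflow summarized in the abstract (fit several candidate models on a user's job history, compare their errors, keep the best one for that user) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the feature names, the holdout split, the mean-absolute-error metric, and all function names are assumptions, and a trivial mean baseline stands in for the neural network model, which is omitted here to keep the sketch dependency-free.

```python
def fit_linear(X, y):
    """Multivariate linear fit (with intercept) via the normal equations.

    Assumes the training features are not collinear, so the system is solvable.
    """
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    # Build A = R^T R and b = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for k in range(c + 1, n):
            f = A[k][c] / A[c][c]
            for j in range(c, n):
                A[k][j] -= f * A[c][j]
            b[k] -= f * b[c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_mean(X, y):
    """Baseline that always predicts the mean training runtime."""
    m = sum(y) / len(y)
    return lambda x: m


def mean_abs_error(model, X, y):
    return sum(abs(model(x) - t) for x, t in zip(X, y)) / len(y)


def select_model_per_user(jobs_by_user, candidates, holdout=0.25):
    """Return, for each user, the name of the candidate with the lowest holdout error."""
    best = {}
    for user, (X, y) in jobs_by_user.items():
        split = int(len(y) * (1 - holdout))
        scores = {}
        for name, fit in candidates.items():
            model = fit(X[:split], y[:split])
            scores[name] = mean_abs_error(model, X[split:], y[split:])
        best[user] = min(scores, key=scores.get)
    return best


# Hypothetical job log for one user: features are (cores, input_size_gb),
# target is runtime in seconds. The values are synthetic, not from the paper.
features = [(1, 10), (2, 20), (4, 15), (8, 40), (16, 30), (2, 50), (4, 60), (8, 5)]
runtimes = [10 + 2 * c + 0.5 * s for c, s in features]
candidates = {"linear": fit_linear, "mean_baseline": fit_mean}
print(select_model_per_user({"user_a": (features, runtimes)}, candidates))
```

Selecting per user rather than globally reflects the abstract's statement that the best model is chosen "for different users"; with real logs, the split would more plausibly be chronological, and further candidates could be added to the `candidates` dictionary.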

Original language: English
Title of host publication: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Editors: Hui Tian, Hong Shen, Wee Lum Tan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 247-252
Number of pages: 6
ISBN (Electronic): 9781728126166
Publication status: Published - Dec 2019
Event: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 - Gold Coast, Australia
Duration: 5 Dec 2019 to 7 Dec 2019

Publication series

Name: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019

Conference

Conference: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Country/Territory: Australia
City: Gold Coast
Period: 5/12/19 to 7/12/19
