
Automated performance tuning of distributed storage system based on deep reinforcement learning

  • Lu Wang
  • Wentao Zhang
  • Yaodong Cheng

Research output: Contribution to journal, conference article, peer-reviewed

Abstract

Automated performance tuning is a difficult task for a large-scale storage system. Traditional methods rely heavily on the experience of system administrators and cannot adapt to changes in workload and system configuration. Reinforcement learning is a promising machine learning paradigm that learns an optimized strategy through trial-and-error interaction between agents and environments. Combined with the strong feature-learning capability of deep learning, deep reinforcement learning has shown success in many fields. We implemented a performance parameter tuning engine based on deep reinforcement learning for the Lustre file system, a distributed file system widely used in HEP data centres. Three reinforcement learning algorithms are enabled in the tuning engine: Deep Q-learning, A2C, and PPO. Experiments show that, on a small test bed with an IOzone workload, this method can increase random read throughput by about 30% compared to the default Lustre settings. In the future, this method could be applied to other parameter tuning use cases in data centre operations.
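
The trial-and-error idea in the abstract can be illustrated with a small sketch. This is not the authors' tuning engine: it substitutes tabular Q-learning for the Deep Q-learning, A2C, and PPO algorithms named above, tunes a single hypothetical parameter with five settings, and replaces real benchmark measurements with a made-up throughput table. The names THROUGHPUT, step, train, and greedy_setting are assumptions introduced only for this illustration.

```python
import random

# Made-up throughput values (MB/s), one per setting of a single hypothetical
# tunable parameter. A real engine would obtain these by running a benchmark.
THROUGHPUT = [100.0, 105.0, 118.0, 126.0, 112.0]
ACTIONS = (-1, 0, 1)  # decrease, keep, or increase the setting index


def step(state, action):
    """Apply an action, clamp to the valid range, return (next_state, reward)."""
    next_state = min(max(state + action, 0), len(THROUGHPUT) - 1)
    return next_state, THROUGHPUT[next_state]


def train(episodes=500, steps=20, alpha=0.5, gamma=0.5, seed=0):
    """Tabular Q-learning driven by uniformly random exploration.

    Q-learning is off-policy, so it can estimate the value of the greedy
    policy even though the actions tried here are chosen at random.
    """
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in THROUGHPUT]
    for _ in range(episodes):
        state = 0  # each episode starts from the "default" setting
        for _ in range(steps):
            a = rng.randrange(len(ACTIONS))
            next_state, reward = step(state, ACTIONS[a])
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_setting(q, start=0, steps=10):
    """Follow the learned greedy policy from `start`; return the final index."""
    state = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _ = step(state, ACTIONS[a])
    return state


if __name__ == "__main__":
    q_table = train()
    best = greedy_setting(q_table)
    print("chosen setting index:", best, "throughput:", THROUGHPUT[best])
```

In this toy setting the reward is simply the throughput observed after each change, so the learned policy moves the parameter toward the setting with the highest table entry and then holds it there. A deep reinforcement learning variant would replace the table with a neural network and the lookup with measurements from the running system.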

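Original language: English
Article number: 012090
Journal: Journal of Physics: Conference Series
Volume: 1525
Issue number: 1
DOI
Publication status: Published - 7 Jul 2020
Externally published: Yes
Event: 19th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, ACAT 2019 - Saas-Fee, Switzerland
Duration: 11 Mar 2019 to 15 Mar 2019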
