Abstract
Automated performance tuning is a difficult task for a large-scale storage system. Traditional methods rely heavily on the experience of system administrators and cannot adapt to changes in workload and system configuration. Reinforcement learning is a promising machine learning paradigm that learns an optimized strategy through trial-and-error interaction between an agent and its environment. Combined with the strong feature learning capability of deep learning, deep reinforcement learning has shown success in many fields. We implemented a performance parameter tuning engine based on deep reinforcement learning for the Lustre file system, a distributed file system widely used in HEP data centres. The tuning engine supports three reinforcement learning algorithms: Deep Q-learning, A2C, and PPO. Experiments show that, on a small test bed with an IOzone workload, this method can increase random read throughput by about 30% compared to the default Lustre settings. In the future, it may be possible to apply this method to other parameter tuning use cases in data centre operations.
| Original language | English |
|---|---|
| Article number | 012090 |
| Journal | Journal of Physics: Conference Series |
| Volume | 1525 |
| Issue number | 1 |
| DOI | |
| Publication status | Published - 7 Jul 2020 |
| Externally published | Yes |
| Event | 19th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, ACAT 2019 - Saas-Fee, Switzerland. Duration: 11 Mar 2019 → 15 Mar 2019 |
Fingerprint
Explore the research topics of 'Automated performance tuning of distributed storage system based on deep reinforcement learning'. Together they form a unique fingerprint.