TY - JOUR
T1 - 基于强化学习的Lustre文件系统的性能调优
AU - Zhang, Wentao
AU - Wang, Lu
AU - Cheng, Yaodong
N1 - Publisher Copyright:
© 2019, Science Press. All right reserved.
PY - 2019/7/1
Y1 - 2019/7/1
N2 - Computing of high energy physics is a typical data-intensive application. The throughput and response time of distributed storage system are key performance indicators, and they are often the targets of performance optimization. There are a large number of parameters that can be adjusted in a distributed storage system. The setting of these parameters has great influence on the performance of the system. At present, these parameters are either set with static values or automatically tuned by some heuristic rules defined by experienced administrators. Neither of the method is optimistic taking into account the diversity of data access patterns and hardware capabilities, and the difficulty of finding heuristic rules for hundreds of interacted parameters based on human experience. In fact, if the tuning engine is regarded as an agent and the storage system is regarded as the environment, the parameter adjustment problem of the storage system can be treated as a typical sequential decision problem. Therefore, based on data access characteristics of high energy physics calculation, we propose an automated parameter tuning method using the reinforcement learning. Experiments show that in the same test environment, using the default parameters of the Lustre file system as a baseline, this method can increase the throughput by about 30%.
AB - Computing of high energy physics is a typical data-intensive application. The throughput and response time of distributed storage system are key performance indicators, and they are often the targets of performance optimization. There are a large number of parameters that can be adjusted in a distributed storage system. The setting of these parameters has great influence on the performance of the system. At present, these parameters are either set with static values or automatically tuned by some heuristic rules defined by experienced administrators. Neither of the method is optimistic taking into account the diversity of data access patterns and hardware capabilities, and the difficulty of finding heuristic rules for hundreds of interacted parameters based on human experience. In fact, if the tuning engine is regarded as an agent and the storage system is regarded as the environment, the parameter adjustment problem of the storage system can be treated as a typical sequential decision problem. Therefore, based on data access characteristics of high energy physics calculation, we propose an automated parameter tuning method using the reinforcement learning. Experiments show that in the same test environment, using the default parameters of the Lustre file system as a baseline, this method can increase the throughput by about 30%.
KW - Deep learning
KW - Distributed storage
KW - Parameter adjustment
KW - Performance tuning
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85071175259&partnerID=8YFLogxK
U2 - 10.7544/issn1000-1239.2019.20180797
DO - 10.7544/issn1000-1239.2019.20180797
M3 - 文章
AN - SCOPUS:85071175259
SN - 1000-1239
VL - 56
SP - 1578
EP - 1586
JO - Jisuanji Yanjiu yu Fazhan/Computer Research and Development
JF - Jisuanji Yanjiu yu Fazhan/Computer Research and Development
IS - 7
ER -