Performance Optimization of Lustre File System Based on Reinforcement Learning (基于强化学习的Lustre文件系统的性能调优)

Wentao Zhang; Lu Wang; Yaodong Cheng

doi:10.7544/issn1000-1239.2019.20180797

Research output: Contribution to journal › Article › Peer-reviewed

5 citations (Scopus)

Abstract

High energy physics computing is a typical data-intensive application. The throughput and response time of the distributed storage system are key performance indicators and are often the targets of performance optimization. A distributed storage system exposes a large number of adjustable parameters, and their settings have a great influence on system performance. At present, these parameters are either set to static values or tuned automatically by heuristic rules defined by experienced administrators. Neither method is optimal, given the diversity of data access patterns and hardware capabilities and the difficulty of deriving heuristic rules for hundreds of interacting parameters from human experience. In fact, if the tuning engine is regarded as an agent and the storage system as the environment, parameter adjustment of the storage system can be treated as a typical sequential decision problem. Therefore, based on the data access characteristics of high energy physics computing, we propose an automated parameter tuning method using reinforcement learning. Experiments show that, in the same test environment and with the default parameters of the Lustre file system as the baseline, this method can increase throughput by about 30%.
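The agent/environment framing in the abstract can be illustrated with a minimal sketch. This is not the authors' method or their test setup: it assumes a single hypothetical integer tunable, a synthetic throughput function standing in for real measurements, and plain tabular Q-learning. All names (`VALUES`, `simulated_throughput`, `step`, `train`, `greedy_rollout`) are invented for illustration.

```python
import random

VALUES = [1, 2, 4, 8, 16, 32, 64]   # hypothetical candidate settings of one tunable
ACTIONS = (-1, 0, 1)                # decrease, keep, or increase the setting


def simulated_throughput(idx):
    """Synthetic throughput (arbitrary units); peaks at VALUES[4] == 16."""
    return 100.0 / (1 + (idx - 4) ** 2)


def step(idx, action):
    """Environment: apply the action, then report the new state and reward."""
    new_idx = min(max(idx + action, 0), len(VALUES) - 1)
    return new_idx, simulated_throughput(new_idx)


def train(episodes=2000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Agent: tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in VALUES]
    for _ in range(episodes):
        idx = rng.randrange(len(VALUES))          # random starting setting
        for _ in range(steps):
            if rng.random() < epsilon:            # explore
                a = rng.randrange(len(ACTIONS))
            else:                                 # exploit current estimates
                a = max(range(len(ACTIONS)), key=lambda i: q[idx][i])
            new_idx, reward = step(idx, ACTIONS[a])
            target = reward + gamma * max(q[new_idx])
            q[idx][a] += alpha * (target - q[idx][a])
            idx = new_idx
    return q


def greedy_rollout(q, start, steps=10):
    """Follow the learned policy without exploration; return visited settings."""
    idx, visited = start, [VALUES[start]]
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[idx][i])
        idx, _reward = step(idx, ACTIONS[a])
        visited.append(VALUES[idx])
    return visited


if __name__ == "__main__":
    q_table = train()
    print(greedy_rollout(q_table, start=0))
```

Under these assumptions the greedy rollout should climb from the lowest setting toward the synthetic optimum and stay there. A real tuner would replace `simulated_throughput` with measurements from the storage system and would have to handle many interacting parameters, which a small lookup table cannot represent.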

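Translated title of the contribution	Performance Optimization of Lustre File System Based on Reinforcement Learning
Original language	Chinese (Traditional)
Pages (from-to)	1578-1586
Number of pages	9
Journal	Jisuanji Yanjiu yu Fazhan/Computer Research and Development
Volume	56
Issue number	7
DOI	https://doi.org/10.7544/issn1000-1239.2019.20180797
Publication status	Published - 1 Jul 2019
Externally published	Yes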

Keywords

Deep learning
Distributed storage
Parameter adjustment
Performance tuning
Reinforcement learning

Cite this

@article{490b9c138c47428590ea06d3f40859b3,
title = "基于强化学习的Lustre文件系统的性能调优",
keywords = "Deep learning, Distributed storage, Parameter adjustment, Performance tuning, Reinforcement learning",
author = "Wentao Zhang and Lu Wang and Yaodong Cheng",
year = "2019",
month = jul,
day = "1",
doi = "10.7544/issn1000-1239.2019.20180797",
language = "Chinese (Traditional)",
volume = "56",
pages = "1578--1586",
journal = "Jisuanji Yanjiu yu Fazhan/Computer Research and Development",
issn = "1000-1239",
publisher = "Science Press",
number = "7",
}

TY - JOUR
T1 - 基于强化学习的Lustre文件系统的性能调优
AU - Zhang, Wentao
AU - Wang, Lu
AU - Cheng, Yaodong
PY - 2019/7/1
Y1 - 2019/7/1
KW - Deep learning
KW - Distributed storage
KW - Parameter adjustment
KW - Performance tuning
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85071175259&partnerID=8YFLogxK
U2 - 10.7544/issn1000-1239.2019.20180797
DO - 10.7544/issn1000-1239.2019.20180797
M3 - Article
AN - SCOPUS:85071175259
SN - 1000-1239
VL - 56
SP - 1578
EP - 1586
JO - Jisuanji Yanjiu yu Fazhan/Computer Research and Development
JF - Jisuanji Yanjiu yu Fazhan/Computer Research and Development
IS - 7
ER -
