What is system hang and how to handle it

Yian Zhu, Yue Li, Jingling Xue, Tian Tan, Jialong Shi, Yang Shen, Chunyan Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Citations (Scopus)

Abstract

Almost every computer user has encountered an unresponsive system failure or system hang, which leaves the user no choice but to power off the computer. In this paper, the causes of such failures are analyzed in detail and one empirical hypothesis for detecting system hang is proposed. This hypothesis exploits a small set of system performance metrics provided by the OS itself, thereby avoiding both modifications to the OS kernel and additional cost (e.g., hardware modules). Under this hypothesis, we propose SHFH, a self-healing framework to handle system hang, which can be deployed on an OS dynamically. One unique feature of SHFH is that its "light-heavy" detection strategy is designed to make intelligent tradeoffs between the performance overhead and the false positive rate induced by system hang detection. Another feature is that its diagnosis-based recovery strategy offers a better granularity to recover from system hang. Our experimental results show that SHFH can cover 95.34% of system hang scenarios, with a false positive rate of 0.58% and 0.6% performance overhead, validating the effectiveness of our empirical hypothesis.
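
The abstract does not spell out how the "light-heavy" strategy works, so the following is only an illustrative sketch of a generic two-stage detector, not SHFH's actual algorithm. It assumes a cheap threshold test runs on every sample and a costlier test runs only after several consecutive suspicious samples. The metric names (`cpu_busy`, `run_queue`, `ctx_switches`), the thresholds, and the `patience` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Metrics = Dict[str, float]


@dataclass
class LightHeavyDetector:
    """Two-stage detector: a cheap check gates an expensive one."""
    light_thresholds: Dict[str, float]      # hypothetical metric -> upper bound
    heavy_check: Callable[[Metrics], bool]  # costlier confirmation step
    patience: int = 3                       # suspicious samples needed in a row
    _suspicious_run: int = 0

    def light_check(self, sample: Metrics) -> bool:
        # Cheap stage: flag the sample if any metric exceeds its bound.
        return any(sample.get(name, 0.0) > limit
                   for name, limit in self.light_thresholds.items())

    def observe(self, sample: Metrics) -> bool:
        # Track how many consecutive samples looked suspicious.
        if self.light_check(sample):
            self._suspicious_run += 1
        else:
            self._suspicious_run = 0
        if self._suspicious_run < self.patience:
            return False
        # Only now pay for the heavy stage, which can veto a false alarm.
        return self.heavy_check(sample)


def heavy(sample: Metrics) -> bool:
    # Hypothetical confirmation rule: long run queue and no context switches.
    return (sample.get("run_queue", 0.0) > 50
            and sample.get("ctx_switches", 1.0) == 0.0)


detector = LightHeavyDetector({"cpu_busy": 0.95}, heavy, patience=2)
stuck = {"cpu_busy": 0.99, "run_queue": 60.0, "ctx_switches": 0.0}
idle = {"cpu_busy": 0.20, "run_queue": 1.0, "ctx_switches": 300.0}
busy = {"cpu_busy": 0.99, "run_queue": 2.0, "ctx_switches": 500.0}
verdicts = [detector.observe(s) for s in [stuck, stuck, idle, busy, busy]]
```

In this sketch the heavy stage is what keeps a merely busy system from being reported as hung, which mirrors the overhead versus false-positive tradeoff the abstract describes.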

Original language: English
Title of host publication: Proceedings - 2012 IEEE 23rd International Symposium on Software Reliability Engineering, ISSRE 2012
Pages: 141-150
Number of pages: 10
Publication status: Published - 2012
Event: 2012 IEEE 23rd International Symposium on Software Reliability Engineering, ISSRE 2012 - Dallas, TX, United States
Duration: 27 Nov 2012 → 30 Nov 2012

Publication series

Name: Proceedings - International Symposium on Software Reliability Engineering, ISSRE
ISSN (Print): 1071-9458

Conference

Conference: 2012 IEEE 23rd International Symposium on Software Reliability Engineering, ISSRE 2012
Country/Territory: United States
City: Dallas, TX
Period: 27/11/12 → 30/11/12
