What is system hang and how to handle it

Yian Zhu, Yue Li, Jingling Xue, Tian Tan, Jialong Shi, Yang Shen, Chunyan Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Scopus citations

Abstract

Almost every computer user has encountered an unresponsive system failure, or system hang, which leaves the user no choice but to power off the computer. In this paper, the causes of such failures are analyzed in detail and an empirical hypothesis for detecting system hang is proposed. This hypothesis exploits a small set of system performance metrics provided by the OS itself, thereby avoiding both modifications to the OS kernel and additional cost (e.g., hardware modules). Under this hypothesis, we propose SHFH, a self-healing framework for handling system hang that can be deployed on an OS dynamically. One unique feature of SHFH is its "light-heavy" detection strategy, which is designed to make intelligent tradeoffs between the performance overhead and the false positive rate induced by system hang detection. Another feature is its diagnosis-based recovery strategy, which offers a better granularity for recovering from system hang. Our experimental results show that SHFH can cover 95.34% of system hang scenarios, with a false positive rate of 0.58% and a performance overhead of 0.6%, validating the effectiveness of our empirical hypothesis.

Original language: English
Title of host publication: Proceedings - 2012 IEEE 23rd International Symposium on Software Reliability Engineering, ISSRE 2012
Pages: 141-150
Number of pages: 10
State: Published - 2012
Event: 2012 IEEE 23rd International Symposium on Software Reliability Engineering, ISSRE 2012 - Dallas, TX, United States
Duration: 27 Nov 2012 – 30 Nov 2012

Publication series

Name: Proceedings - International Symposium on Software Reliability Engineering, ISSRE
ISSN (Print): 1071-9458

Conference

Conference: 2012 IEEE 23rd International Symposium on Software Reliability Engineering, ISSRE 2012
Country/Territory: United States
City: Dallas, TX
Period: 27/11/12 – 30/11/12
