TY - GEN
T1 - Identifying Recurrent and Unknown Performance Issues
AU - Lim, Meng Hui
AU - Lou, Jian Guang
AU - Zhang, Hongyu
AU - Fu, Qiang
AU - Teoh, Andrew Beng Jin
AU - Lin, Qingwei
AU - Ding, Rui
AU - Zhang, Dongmei
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/1/1
Y1 - 2014/1/1
N2 - For a large-scale software system, especially an online service system, when a performance issue occurs, it is desirable to check whether this issue has occurred before. If there are past similar issues, a known remedy could be applied. Otherwise, a new troubleshooting process may have to be initiated. The symptom of a performance issue can be characterized by a set of metrics. Due to the sophisticated nature of software systems, manual diagnosis of performance issues based on metric data is typically expensive and laborious. In this paper, we propose a Hidden Markov Random Field (HMRF) based approach to automatic identification of recurrent and unknown performance issues. We formulate the problem of issue identification as an HMRF-based clustering problem. Our approach incorporates the learning of metric discretization thresholds and the optimization of issue clustering. Based on the learned thresholds and cluster centroids, we can achieve accurate identification of recurrent issues and unknown issues. Experimental evaluations on an open benchmark and a large-scale industrial production system show that our approach is effective and outperforms the related state-of-the-art approaches.
KW - automated diagnosis
KW - duplication detection
KW - issue identification
KW - metrics
KW - performance
UR - http://www.scopus.com/inward/record.url?scp=84936948357&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2014.96
DO - 10.1109/ICDM.2014.96
M3 - Conference proceeding
AN - SCOPUS:84936948357
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 320
EP - 329
BT - Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014
A2 - Kumar, Ravi
A2 - Toivonen, Hannu
A2 - Pei, Jian
A2 - Huang, Joshua Zhexue
A2 - Wu, Xindong
PB - IEEE
T2 - 14th IEEE International Conference on Data Mining, ICDM 2014
Y2 - 14 December 2014 through 17 December 2014
ER -