Identifying Recurrent and Unknown Performance Issues

Meng Hui LIM, Jian Guang Lou, Hongyu Zhang, Qiang Fu, Andrew Beng Jin Teoh, Qingwei Lin, Rui Ding, Dongmei Zhang

Research output: Chapter in book/report/conference proceeding > Conference proceeding > peer-review

23 Citations (Scopus)

Abstract

For a large-scale software system, especially an online service system, when a performance issue occurs, it is desirable to check whether this issue has occurred before. If there are past similar issues, a known remedy could be applied. Otherwise, a new troubleshooting process may have to be initiated. The symptom of a performance issue can be characterized by a set of metrics. Due to the sophisticated nature of software systems, manual diagnosis of performance issues based on metric data is typically expensive and laborious. In this paper, we propose a Hidden Markov Random Field (HMRF) based approach to the automatic identification of recurrent and unknown performance issues. We formulate the problem of issue identification as an HMRF-based clustering problem. Our approach incorporates the learning of metric discretization thresholds and the optimization of issue clustering. Based on the learned thresholds and cluster centroids, we can achieve accurate identification of recurrent issues and unknown issues. Experimental evaluations on an open benchmark and a large-scale industrial production system show that our approach is effective and outperforms the related state-of-the-art approaches.
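
The final identification step mentioned in the abstract (matching a new issue against learned thresholds and cluster centroids) can be illustrated with a minimal sketch. This is not the paper's HMRF method: it assumes the thresholds and centroids are already given rather than learned, uses a simple binary discretization, and compares symptoms by Hamming distance. The function names, the metric values, the issue labels, and the `max_distance` cutoff are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Symptom = Tuple[int, ...]


def discretize(metrics: List[float], thresholds: List[float]) -> Symptom:
    """Map each metric to 1 if it exceeds its threshold, else 0 (assumed scheme)."""
    if len(metrics) != len(thresholds):
        raise ValueError("metrics and thresholds must have the same length")
    return tuple(int(m > t) for m, t in zip(metrics, thresholds))


def hamming(a: Symptom, b: Symptom) -> int:
    """Count the positions at which two discretized symptoms differ."""
    return sum(x != y for x, y in zip(a, b))


def identify(
    metrics: List[float],
    thresholds: List[float],
    centroids: Dict[str, Symptom],
    max_distance: int,
) -> Optional[str]:
    """Return the label of the nearest known issue, or None if the issue looks unknown.

    An issue is treated as recurrent only when its nearest centroid lies
    within max_distance; this cutoff is an assumption of the sketch.
    """
    symptom = discretize(metrics, thresholds)
    best_label: Optional[str] = None
    best_dist: Optional[int] = None
    for label, centroid in centroids.items():
        dist = hamming(symptom, centroid)
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist is not None and best_dist <= max_distance:
        return best_label
    return None


# Hypothetical thresholds for three metrics: CPU utilization, latency (ms), error rate.
thresholds = [0.8, 200.0, 0.05]
# Hypothetical centroids of two previously seen issue clusters.
centroids = {"cpu_saturation": (1, 1, 0), "error_burst": (0, 0, 1)}

recurrent = identify([0.95, 350.0, 0.01], thresholds, centroids, max_distance=0)
unknown = identify([0.10, 50.0, 0.01], thresholds, centroids, max_distance=0)
```

Under these assumed values, the first call matches the `cpu_saturation` centroid exactly, while the second matches no centroid within the cutoff and would be reported as unknown (`None`).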

Original language: English
Title of host publication: Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014
Editors: Ravi Kumar, Hannu Toivonen, Jian Pei, Joshua Zhexue Huang, Xindong Wu
Publisher: IEEE
Pages: 320-329
Number of pages: 10
Edition: January
ISBN (Electronic): 9781479943029
Publication status: Published - 1 Jan 2014
Event: 14th IEEE International Conference on Data Mining, ICDM 2014 - Shenzhen, China
Duration: 14 Dec 2014 to 17 Dec 2014

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Number: January
Volume: 2015-January
ISSN (Print): 1550-4786

Conference

Conference: 14th IEEE International Conference on Data Mining, ICDM 2014
Country/Territory: China
City: Shenzhen
Period: 14/12/14 to 17/12/14

Scopus Subject Areas

  • Engineering(all)

User-Defined Keywords

  • automated diagnosis
  • duplication detection
  • Issue identification
  • metrics
  • performance
