Discretizing numerical attributes in decision tree for big data analysis

Yiqun Zhang, Yiu Ming Cheung

Research output: Chapter in book/report/conference proceeding | Conference proceeding | peer-review

12 Citations (Scopus)

Abstract

Decision tree induction is a typical machine learning approach that has been extensively applied to data mining and knowledge discovery. For numerical and mixed data, discretization is an essential pre-processing step of decision tree learning. However, most existing discretization approaches are not efficient enough in practice when coping with big data. Accordingly, we propose a new discretization method based on windowing and hierarchical clustering to improve the performance of the conventional decision tree for big data analysis. The proposed method not only discretizes numerical attributes faster while retaining competitive classification accuracy, but also reduces the size of the decision tree. Experiments on real data sets show the efficacy of the proposed method.
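To make the idea of combining a window with hierarchical clustering concrete, the following is a minimal illustrative sketch and not the authors' algorithm, whose details are not given in this record. It assumes that a "window" is simply a fixed-size random subset of one numerical attribute, that the clustering is bottom-up with centroid linkage, and that the number of bins is chosen by the caller. The names `hierarchical_cut_points`, `discretize`, `n_bins`, and `window_size` are hypothetical.

```python
import bisect
import random
from collections import Counter


def hierarchical_cut_points(values, n_bins):
    """Bottom-up (centroid-linkage) clustering of 1-D values into n_bins groups.

    Returns the cut points separating neighbouring groups.
    """
    # Each cluster is [sum, count, min, max]; start with one cluster per distinct value.
    clusters = [[v * c, c, v, v] for v, c in sorted(Counter(values).items())]
    while len(clusters) > n_bins:
        # In one dimension the closest pair of centroids is always adjacent.
        gaps = [
            clusters[i + 1][0] / clusters[i + 1][1] - clusters[i][0] / clusters[i][1]
            for i in range(len(clusters) - 1)
        ]
        i = gaps.index(min(gaps))
        left, right = clusters[i], clusters[i + 1]
        clusters[i:i + 2] = [
            [left[0] + right[0], left[1] + right[1], left[2], right[3]]
        ]
    # A cut point is the midpoint between one cluster's max and the next one's min.
    return [(a[3] + b[2]) / 2 for a, b in zip(clusters, clusters[1:])]


def discretize(column, n_bins=3, window_size=200, seed=0):
    """Learn cut points on a random window (assumption), then bin the full column."""
    rng = random.Random(seed)
    window = rng.sample(column, min(window_size, len(column)))
    cuts = hierarchical_cut_points(window, n_bins)
    # bisect_right maps each value to the index of the interval it falls into.
    return cuts, [bisect.bisect_right(cuts, x) for x in column]


if __name__ == "__main__":
    ages = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.7, 10.0, 10.2]
    cuts, labels = discretize(ages, n_bins=3, window_size=100)
    print(cuts, labels)
```

Learning the cut points on a window rather than on the full column keeps the quadratic clustering cost bounded by the window size, while the final binning pass over all records stays cheap. The resulting interval labels can then be fed to a decision tree learner as a categorical attribute.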

Original language: English
Title of host publication: Proceedings - 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
Editors: Zhi-Hua Zhou, Wei Wang, Ravi Kumar, Hannu Toivonen, Jian Pei, Joshua Zhexue Huang, Xindong Wu
Publisher: IEEE Computer Society
Pages: 1150-1157
Number of pages: 8
Edition: January
ISBN (Electronic): 9781479942749
Publication status: Published - 26 Jan 2015
Event: 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014 - Shenzhen, China
Duration: 14 Dec 2014 → …

Publication series

Name: IEEE International Conference on Data Mining Workshops, ICDMW
Number: January
Volume: 2015-January
ISSN (Print): 2375-9232
ISSN (Electronic): 2375-9259

Conference

Conference: 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
Country/Territory: China
City: Shenzhen
Period: 14/12/14 → …

Scopus Subject Areas

  • Computer Science Applications
  • Software

User-Defined Keywords

  • Big Data
  • Discretization
  • Hierarchical Clustering
  • Noise
  • Numerical Attribute
  • Window
