A semi-supervised regression model for mixed numerical and categorical variables

Michael K. Ng*, Elaine Y. Chan, Meko M.C. So, Wai Ki Ching

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

8 Citations (Scopus)

Abstract

In this paper, we develop a semi-supervised regression algorithm for analyzing data sets that contain both categorical and numerical attributes. The algorithm partitions a data set into several clusters and, at the same time, fits a multivariate regression model to each cluster. This framework combines multivariate regression models for the numerical variables (a supervised learning method) with k-modes clustering for the categorical variables (an unsupervised learning method). The regression coefficients and the k-modes parameters are estimated simultaneously by minimizing an objective function that is the weighted sum of the least-squares errors of the multivariate regression models and the dissimilarity measures among the categorical variables. Experiments on both synthetic and real data sets are presented to demonstrate the effectiveness of the proposed method.
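The idea of minimizing a weighted sum of regression error and categorical dissimilarity can be illustrated with a minimal sketch. It is not the authors' published algorithm: the alternating update scheme, the single numerical response, the simple matching (mismatch count) dissimilarity, the weight name `gamma`, and the function name `fit_mixed_cluster_regression` are all assumptions made for illustration.

```python
import numpy as np

def fit_mixed_cluster_regression(X, y, C, k, gamma=1.0, n_iter=20, seed=0):
    """Illustrative sketch (assumed scheme, not the paper's exact method).

    X: (n, p) numerical predictors, y: (n,) numerical response,
    C: (n, q) integer-coded categorical attributes, k: number of clusters,
    gamma: assumed weight on the categorical dissimilarity term.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])          # add an intercept column
    labels = rng.integers(0, k, size=n)           # random initial partition
    betas = np.zeros((k, Xb.shape[1]))            # one coefficient vector per cluster
    modes = C[rng.choice(n, size=k, replace=False)].copy()  # initial modes

    for _ in range(n_iter):
        # Update step: refit each cluster's least-squares model and its modes.
        for j in range(k):
            members = labels == j
            if not members.any():
                continue  # keep the previous parameters for an empty cluster
            betas[j] = np.linalg.lstsq(Xb[members], y[members], rcond=None)[0]
            for a in range(C.shape[1]):
                vals, counts = np.unique(C[members, a], return_counts=True)
                modes[j, a] = vals[np.argmax(counts)]  # most frequent category

        # Assignment step: squared error plus weighted mismatch count.
        sq_err = (y[:, None] - Xb @ betas.T) ** 2                     # (n, k)
        mismatch = (C[:, None, :] != modes[None, :, :]).sum(axis=2)   # (n, k)
        new_labels = (sq_err + gamma * mismatch).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # the partition no longer changes
        labels = new_labels

    return labels, betas, modes

# Usage on a small synthetic data set with two groups (made-up data).
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=(40, 1))
group = np.repeat([0, 1], 20)
y = np.where(group == 0, 2.0 * x[:, 0], 5.0 - 2.0 * x[:, 0]) + rng.normal(0, 0.05, 40)
C = group.reshape(-1, 1)
labels, betas, modes = fit_mixed_cluster_regression(x, y, C, k=2, gamma=1.0)
```

In this sketch a larger `gamma` lets the categorical attributes dominate the partition, while a smaller value lets the regression fit dominate. Like other alternating schemes of this kind, it may stop at a local minimum that depends on the random initialization.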

Original language: English
Pages (from-to): 1745-1752
Number of pages: 8
Journal: Pattern Recognition
Volume: 40
Issue number: 6
Publication status: Published - Jun 2007

Scopus Subject Areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

User-Defined Keywords

  • Categorical variables
  • Clustering
  • Data mining
  • Numerical variables
  • Regression
