On gene selection and classification for cancer microarray data using multi-step clustering and sparse representation

Liping Jing*, Kwok Po NG, Tieyong ZENG

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

1 Citation (Scopus)


Microarray data profile gene expression on a whole-genome scale and provide a good way to study associations between gene expression and the occurrence or progression of cancer. Many researchers have recognized that microarray data are useful for predicting cancer cases. However, the high dimension of the gene expression data, which is significantly larger than the sample size, makes this task very difficult. It is therefore very important to identify the significant genes causing cancer. Many feature selection algorithms have been proposed that focus on improving cancer predictive accuracy at the expense of ignoring the correlations between features. In this work, a novel framework (named SGS) is presented for significant gene selection and efficient cancer case classification. The proposed framework first applies a clustering algorithm to find gene groups in which the genes have higher correlation coefficients with one another, then selects (1) the significant genes in each group using the Bayesian Lasso method and (2) the important gene groups using the group Lasso method, and finally builds a prediction model on the reduced gene space with an efficient classification algorithm (such as support vector machine (SVM), 1NN, or regression). Experimental results on publicly available microarray data show that the proposed framework often outperforms existing feature selection and prediction methods such as SAM, information gain (IG), and Lasso-type prediction models.
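The cluster-then-select-then-classify idea in the abstract can be illustrated with a minimal sketch. It is not the SGS implementation. The abstract does not name the clustering algorithm, so a greedy correlation-threshold grouping is assumed here. A plain Lasso solved by coordinate descent stands in for the Bayesian Lasso, the group Lasso step is omitted, and 1NN serves as the classifier. All function names, parameter values, and the toy data are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cluster_genes(cols, threshold=0.8):
    """Assumed clustering step: a gene joins the first group whose seed gene
    it correlates with (in absolute value) at or above the threshold."""
    groups = []
    for j, col in enumerate(cols):
        for g in groups:
            if abs(pearson(cols[g[0]], col)) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups

def soft_threshold(v, lam):
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def lasso_cd(cols, y, lam, n_iter=200):
    """Plain Lasso (stand-in for the Bayesian Lasso) by coordinate descent on
    centered data: minimize (1/2n)*||y - Xw||^2 + lam*||w||_1."""
    n = len(y)
    my = sum(y) / n
    resid = [v - my for v in y]          # residual starts as centered y (w = 0)
    centered = []
    for c in cols:
        mc = sum(c) / n
        centered.append([v - mc for v in c])
    w = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, c in enumerate(centered):
            z = sum(v * v for v in c) / n
            if z == 0:
                continue
            rho = sum(v * (r + w[j] * v) for v, r in zip(c, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - w[j]
            if delta != 0.0:
                resid = [r - delta * v for r, v in zip(resid, c)]
                w[j] = new
    return w

def select_genes(X, y, threshold=0.8, lam=0.05):
    """Group genes, then keep the genes with nonzero Lasso weight in each group."""
    cols = [list(c) for c in zip(*X)]    # rows are samples, columns are genes
    groups = cluster_genes(cols, threshold)
    selected = []
    for g in groups:
        w = lasso_cd([cols[j] for j in g], y, lam)
        selected += [j for j, wj in zip(g, w) if abs(wj) > 1e-8]
    return groups, sorted(selected)

def predict_1nn(X_train, y_train, x, genes):
    """1NN on the selected genes only, using squared Euclidean distance."""
    def dist(row):
        return sum((row[j] - x[j]) ** 2 for j in genes)
    best = min(range(len(X_train)), key=lambda i: dist(X_train[i]))
    return y_train[best]

# Toy data (hypothetical): genes 0 and 1 track the label, genes 2 and 3 do not.
X = [
    [0.1, 0.0,  1.0,  1.0],
    [0.0, 0.1, -1.0,  1.0],
    [0.2, 0.1,  1.0, -1.0],
    [0.1, 0.2, -1.0, -1.0],
    [2.0, 2.1,  1.0,  1.0],
    [2.1, 2.0, -1.0,  1.0],
    [1.9, 2.0,  1.0, -1.0],
    [2.0, 1.9, -1.0, -1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

groups, selected = select_genes(X, y)
pred_high = predict_1nn(X, y, [2.0, 2.0, 1.0, -1.0], selected)
pred_low = predict_1nn(X, y, [0.0, 0.1, -1.0, 1.0], selected)
```

On this toy data the two label-tracking genes fall into one group and the two uninformative genes are dropped before classification. The correlation threshold and the penalty `lam` are illustrative values and would need tuning on real microarray data.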

Original language: English
Pages (from-to): 127-148
Number of pages: 22
Journal: Advances in Adaptive Data Analysis
Issue number: 1-2
Publication status: Published - Apr 2011

Scopus Subject Areas

  • Information Systems
  • Computer Science Applications

User-Defined Keywords

  • cancer prediction
  • classification
  • clustering
  • Gene selection
  • Lasso

