A novel approach to extracting features from motif content and protein composition for protein sequence classification

Xing Ming Zhao, Yiu Ming CHEUNG, De Shuang Huang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

40 Citations (Scopus)

Abstract

This paper presents a novel approach to extracting features from motif content and protein composition for protein sequence classification. First, we represent each protein sequence as a fixed-dimensional vector built from its motif content and protein composition. We then project these vectors into a low-dimensional space using Principal Component Analysis (PCA), so that each can be represented as a combination of the eigenvectors of the covariance matrix of the vectors. Subsequently, a Genetic Algorithm (GA) is used to extract a subset of biological and functional sequence features from the eigen-space while simultaneously optimizing the regularization parameter of the Support Vector Machine (SVM). Finally, SVM classifiers assign protein sequences to their corresponding families based on the selected feature subsets. Experiments comparing our approach with the existing PSI-BLAST and SVM-pairwise methods show promising results.
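As a rough illustration of the first and third steps described in the abstract, the sketch below encodes a sequence as motif indicators followed by amino-acid composition, and decodes a GA chromosome into a feature mask plus an SVM regularization value. It is not the authors' implementation. The motif patterns, the binary presence encoding, the function names, and the log2 grid for the regularization parameter are all assumptions made for illustration. The PCA projection and the SVM training steps are omitted; in the pipeline described above, the mask would apply to the PCA-projected components rather than to the raw vector.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical motif patterns, written as regular expressions for illustration only.
MOTIFS = [r"N[^P][ST][^P]", r"C..C", r"GK[ST]"]


def composition(seq):
    """Fraction of each standard amino acid in the sequence."""
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def motif_content(seq, motifs=MOTIFS):
    """1.0 if the motif occurs anywhere in the sequence, else 0.0 (assumed encoding)."""
    return [1.0 if re.search(m, seq) else 0.0 for m in motifs]


def encode(seq, motifs=MOTIFS):
    """Fixed-dimensional vector: motif indicators followed by composition."""
    seq = seq.upper()
    return motif_content(seq, motifs) + composition(seq)


def decode_chromosome(bits, n_features, c_min_exp=-5, c_max_exp=15):
    """Split a list of 0/1 genes into a feature mask and a regularization value C.

    The first n_features genes select features. The remaining genes are read as
    an unsigned integer and mapped linearly onto exponents of 2 between
    c_min_exp and c_max_exp (an assumed encoding, not taken from the paper).
    """
    mask = [b == 1 for b in bits[:n_features]]
    tail = bits[n_features:]
    value = int("".join(str(b) for b in tail), 2) if tail else 0
    max_value = (1 << len(tail)) - 1 if tail else 1
    exponent = c_min_exp + (c_max_exp - c_min_exp) * value / max_value
    return mask, 2.0 ** exponent


vector = encode("ACCGKT")
mask, c_value = decode_chromosome([1, 0, 1, 1, 1], n_features=3)
```

Packing the feature mask and the regularization parameter into one chromosome lets a single fitness evaluation, for example cross-validated SVM accuracy, score both choices together, which matches the simultaneous optimization the abstract describes.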

Original language: English
Pages (from-to): 1019-1028
Number of pages: 10
Journal: Neural Networks
Volume: 18
Issue number: 8
Publication status: Published - Oct 2005

Scopus Subject Areas

  • Cognitive Neuroscience
  • Artificial Intelligence

User-Defined Keywords

  • Genetic algorithm
  • Motif content
  • Protein composition
  • Protein sequence classification
  • Support vector machine
