Abstract
Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapid accumulation of large volumes of proteomic and genomic data has driven the development of computational models that predict protein function automatically and at large scale. Recent approaches focus on integrating multiple heterogeneous data sources, and they often achieve better results than methods that use a single data source alone. In this paper, we investigate how to integrate multiple biological data sources with biological knowledge, i.e., the Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogeneous data sources. SimNet first uses the GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on this similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of the individual networks, and aligns this network with the kernel to obtain the weights assigned to the individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experimental results on heterogeneous proteomic data sources of Yeast, Human, Mouse, and Fly show that SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The MATLAB code of SimNet is available at https://sites.google.com/site/guoxian85/simnet.
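The pipeline summarized above (a semantic kernel from GO annotations, a weighted composite of individual networks, weights obtained by aligning networks with the kernel, then a network-based classifier) can be illustrated with a minimal pure-Python sketch. It is not SimNet's MATLAB implementation and rests on stated assumptions: the Jaccard overlap of GO term sets stands in for the paper's semantic similarity measure, the network weights come from a simple normalized kernel-alignment heuristic rather than the paper's optimization, and label propagation is used as a generic network-based classifier. All function names (`jaccard_kernel`, `network_weights`, `composite_network`, `label_propagation`) and the toy data are hypothetical.

```python
import math


def frob(a, b):
    """Frobenius inner product <A, B>_F of two equal-size square matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def jaccard_kernel(annotations):
    """Hypothetical semantic kernel: Jaccard overlap of each pair's GO term sets.

    Stand-in for a GO semantic similarity; an unannotated protein gets 0 everywhere.
    """
    n = len(annotations)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = annotations[i] | annotations[j]
            if union:
                k[i][j] = len(annotations[i] & annotations[j]) / len(union)
    return k


def alignment(a, b):
    """Kernel alignment <A, B>_F / (||A||_F * ||B||_F); 0 if either matrix is zero."""
    denom = math.sqrt(frob(a, a) * frob(b, b))
    return frob(a, b) / denom if denom > 0 else 0.0


def network_weights(networks, kernel):
    """Heuristic weights: non-negative alignment of each network with the kernel,
    normalized to sum to 1 (uniform weights if no network aligns at all)."""
    scores = [max(alignment(w, kernel), 0.0) for w in networks]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(networks)] * len(networks)
    return [s / total for s in scores]


def composite_network(networks, weights):
    """Weighted summation of the individual adjacency matrices."""
    n = len(networks[0])
    return [[sum(w * net[i][j] for w, net in zip(weights, networks))
             for j in range(n)] for i in range(n)]


def label_propagation(w, y, alpha=0.8, iters=50):
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2."""
    n, c = len(w), len(y[0])
    deg = [sum(row) for row in w]
    inv = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    s = [[inv[i] * w[i][j] * inv[j] for j in range(n)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        # The whole new F is built from the previous F before rebinding it.
        f = [[alpha * sum(s[i][k] * f[k][t] for k in range(n)) + (1 - alpha) * y[i][t]
              for t in range(c)] for i in range(n)]
    return f


# Toy data: proteins 0-2 are annotated, protein 3 is not.
annotations = [{"GO:1", "GO:2"}, {"GO:1", "GO:2"}, {"GO:3"}, set()]
terms = ["GO:1", "GO:2", "GO:3"]
net_a = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # links 0-1 and 2-3
net_b = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]  # links 0-2 and 1-3

kernel = jaccard_kernel(annotations)
weights = network_weights([net_a, net_b], kernel)
composite = composite_network([net_a, net_b], weights)
y = [[1.0 if t in a else 0.0 for t in terms] for a in annotations]
scores = label_propagation(composite, y)
print(weights)  # [1.0, 0.0]
```

In this toy example, `net_a` links proteins that share GO terms while `net_b` does not, so `net_a` receives all of the weight, and the unannotated protein 3 inherits its highest score for `GO:3` from its only neighbor, protein 2.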
Original language | English |
---|---|
Article number | 7164278 |
Pages (from-to) | 220-232 |
Number of pages | 13 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 13 |
Issue number | 2 |
DOIs | |
Publication status | Published - 1 Mar 2016 |
Scopus Subject Areas
- Biotechnology
- Genetics
- Applied Mathematics
User-Defined Keywords
- Function prediction
- multiple networks
- network-based classifier
- semantic similarity