TY - JOUR
T1 - MOST
T2 - Most-similar ligand based approach to target prediction
AU - Huang, Tao
AU - Mi, Hong
AU - Lin, Cheng Yuan
AU - Zhao, Ling
AU - ZHONG, Lidan
AU - Liu, Feng bin
AU - ZHANG, Ge
AU - LYU, Aiping
AU - BIAN, Zhaoxiang
AU - LIN, Shuhai
AU - Zhang, Man
AU - Li, Yan hong
AU - Hu, Dong dong
AU - Cheng, Chung Wah
AU - MZRW Group
N1 - Funding Information:
The authors thank other members of MZRW Group, including Shu-hai Lin, Man Zhang, Yan-hong Li, Dong-dong Hu, and Chung-Wah Cheng, for their great help and support.This study was supported by Hong Kong Baptist University [grant number RC-IRMC/1213/01A]. The funding body did not play any role in the design or conclusion of this study.
PY - 2017/3/11
Y1 - 2017/3/11
N2 - Background: Many computational approaches have been used for target prediction, including machine learning, reverse docking, bioactivity spectra analysis, and chemical similarity searching. Recent studies have suggested that chemical similarity searching may be driven by the most-similar ligand. However, the extent of bioactivity of most-similar ligands has been oversimplified or even neglected in these studies, and this has impaired the prediction power. Results: Here we propose the MOst-Similar ligand-based Target inference approach, namely MOST, which uses fingerprint similarity and explicit bioactivity of the most-similar ligands to predict targets of the query compound. Performance of MOST was evaluated by using combinations of different fingerprint schemes, machine learning methods, and bioactivity representations. In sevenfold cross-validation with a benchmark Ki dataset from CHEMBL release 19 containing 61,937 bioactivity data of 173 human targets, MOST achieved high average prediction accuracy (0.95 for pKi ≥ 5, and 0.87 for pKi ≥ 6). Morgan fingerprint was shown to be slightly better than FP2. Logistic Regression and Random Forest methods performed better than Naïve Bayes. In a temporal validation, the Ki dataset from CHEMBL19 were used to train models and predict the bioactivity of newly deposited ligands in CHEMBL20. MOST also performed well with high accuracy (0.90 for pKi ≥ 5, and 0.76 for pKi ≥ 6), when Logistic Regression and Morgan fingerprint were employed. Furthermore, the p values associated with explicit bioactivity were found be a robust index for removing false positive predictions. Implicit bioactivity did not offer this capability. Finally, p values generated with Logistic Regression, Morgan fingerprint and explicit activity were integrated with a false discovery rate (FDR) control procedure to reduce false positives in multiple-target prediction scenario, and the success of this strategy it was demonstrated with a case of fluanisone. In the case of aloe-emodin's laxative effect, MOST predicted that acetylcholinesterase was the mechanism-of-action target; in vivo studies validated this prediction. Conclusions: Using the MOST approach can result in highly accurate and robust target prediction. Integrated with a FDR control procedure, MOST provides a reliable framework for multiple-target inference. It has prospective applications in drug repurposing and mechanism-of-action target prediction.
AB - Background: Many computational approaches have been used for target prediction, including machine learning, reverse docking, bioactivity spectra analysis, and chemical similarity searching. Recent studies have suggested that chemical similarity searching may be driven by the most-similar ligand. However, the extent of bioactivity of most-similar ligands has been oversimplified or even neglected in these studies, and this has impaired the prediction power. Results: Here we propose the MOst-Similar ligand-based Target inference approach, namely MOST, which uses fingerprint similarity and explicit bioactivity of the most-similar ligands to predict targets of the query compound. Performance of MOST was evaluated by using combinations of different fingerprint schemes, machine learning methods, and bioactivity representations. In sevenfold cross-validation with a benchmark Ki dataset from CHEMBL release 19 containing 61,937 bioactivity data of 173 human targets, MOST achieved high average prediction accuracy (0.95 for pKi ≥ 5, and 0.87 for pKi ≥ 6). Morgan fingerprint was shown to be slightly better than FP2. Logistic Regression and Random Forest methods performed better than Naïve Bayes. In a temporal validation, the Ki dataset from CHEMBL19 were used to train models and predict the bioactivity of newly deposited ligands in CHEMBL20. MOST also performed well with high accuracy (0.90 for pKi ≥ 5, and 0.76 for pKi ≥ 6), when Logistic Regression and Morgan fingerprint were employed. Furthermore, the p values associated with explicit bioactivity were found be a robust index for removing false positive predictions. Implicit bioactivity did not offer this capability. Finally, p values generated with Logistic Regression, Morgan fingerprint and explicit activity were integrated with a false discovery rate (FDR) control procedure to reduce false positives in multiple-target prediction scenario, and the success of this strategy it was demonstrated with a case of fluanisone. In the case of aloe-emodin's laxative effect, MOST predicted that acetylcholinesterase was the mechanism-of-action target; in vivo studies validated this prediction. Conclusions: Using the MOST approach can result in highly accurate and robust target prediction. Integrated with a FDR control procedure, MOST provides a reliable framework for multiple-target inference. It has prospective applications in drug repurposing and mechanism-of-action target prediction.
KW - Explicit bioactivity
KW - False discovery rate
KW - Logistic regression
KW - Mechanism-of-action target
KW - Most-similar ligand
KW - Target prediction
UR - http://www.scopus.com/inward/record.url?scp=85014835788&partnerID=8YFLogxK
U2 - 10.1186/s12859-017-1586-z
DO - 10.1186/s12859-017-1586-z
M3 - Journal article
C2 - 28284192
AN - SCOPUS:85014835788
SN - 1471-2105
VL - 18
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 165
ER -