TY - JOUR
T1 - ADME properties evaluation in drug discovery
T2 - Prediction of plasma protein binding using NSGA-II combining PLS and consensus modeling
AU - Wang, Ning Ning
AU - Deng, Zhen Ke
AU - Huang, Chen
AU - Dong, Jie
AU - Zhu, Min Feng
AU - Yao, Zhi Jiang
AU - Chen, Alex F.
AU - LYU, Aiping
AU - Mi, Qi
AU - Cao, Dong Sheng
N1 - Funding Information:
This work is financially supported by the National Key Basic Research Program (2015CB910700), the National Natural Science Foundation of China (Grant No. 81402853), the Central South University Innovation Foundation for Postgraduate (2016zzts498), the Project of Innovation-driven Plan in Central South University, the Postdoctoral Science Foundation of Central South University, and the Chinese Postdoctoral Science Foundation (2014T70794, 2014M562142). The studies meet with the approval of the university's review board.
PY - 2017/11/15
Y1 - 2017/11/15
N2 - The plasma protein binding affinity of a drug compound has a strong influence on its pharmacodynamic behavior because it can affect drug uptake and distribution. In this study, we collected a sizeable dataset of 1830 drug compounds from several accessible sources. A descriptor pool composed of four types of descriptors (2-D, 3-D, E-state and MACCS) was first built, and the non-dominated sorting genetic algorithm (NSGA-II) combined with partial least squares (PLS) regression was applied to select important descriptors for model building. Finally, we obtained a consensus model (for five-fold cross-validation: Q^2 = 0.750; RMSE = 16.151) based on five different predictive models built using random forest (RF), support vector machine (SVM), Cubist, Gaussian process (GP), and Boosting. Further, a test set and two external validation datasets were used to validate its robustness and practicality. For the test set, R_T^2 = 0.787 and RMSE_T = 14.154; for the two external datasets, R_Ex^2 = 0.704 and 0.703, and RMSE_Ex = 18.194 and 17.233, respectively. Additionally, following OECD principles, y-randomization, Williams plot and scaffold analyses were used to validate the reliability and practical application domain of our predictive model. Overall, our consensus model shows good predictive performance and generalization ability in predicting plasma protein binding (PPB). After analyzing the important descriptors selected by NSGA-II and RF, we concluded that the PPB of a drug compound is mainly related to its lipophilicity, aromatic rings, and partial charge properties. In summary, this study developed a robust and practical consensus model for PPB prediction that could be used for distribution evaluation and risk assessment in the early stage of drug development.
KW - ADME
KW - Consensus model
KW - NSGA-II
KW - Plasma protein binding
KW - QSAR
UR - http://www.scopus.com/inward/record.url?scp=85029634157&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2017.09.005
DO - 10.1016/j.chemolab.2017.09.005
M3 - Journal article
AN - SCOPUS:85029634157
SN - 0169-7439
VL - 170
SP - 84
EP - 95
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
ER -