ADME properties evaluation in drug discovery: Prediction of plasma protein binding using NSGA-II combining PLS and consensus modeling

Ning Ning Wang, Zhen Ke Deng, Chen Huang, Jie Dong, Min Feng Zhu, Zhi Jiang Yao, Alex F. Chen, Aiping LYU, Qi Mi, Dong Sheng Cao*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

22 Citations (Scopus)


The plasma protein binding (PPB) affinity of a drug compound strongly influences its pharmacodynamic behavior because it affects drug uptake and distribution. In this study, we collected a sizeable dataset of 1830 drug compounds from several accessible sources. A descriptor pool composed of four types of descriptors (2-D, 3-D, E-state and MACCS) was first built, and the non-dominated sorting genetic algorithm II (NSGA-II) combined with partial least squares (PLS) regression was applied to select the important descriptors for model building. Finally, we obtained a consensus model (five-fold cross-validation: Q² = 0.750; RMSE = 16.151) based on five predictive models built using random forest (RF), support vector machine (SVM), Cubist, Gaussian process (GP), and Boosting. A test set and two external validation datasets were then used to validate its robustness and practicality. For the test set, R²_T = 0.787 and RMSE_T = 14.154; for the two external datasets, R²_Ex = 0.704 and 0.703 and RMSE_Ex = 18.194 and 17.233, respectively. Additionally, following the OECD principles, y-randomization, Williams plot and scaffold analyses were performed to validate the reliability of the model and define its applicability domain. Overall, the consensus model shows good predictive performance and generalization ability for PPB. Analysis of the important descriptors selected by NSGA-II and RF indicates that the PPB of a drug compound is mainly related to its lipophilicity, aromatic rings, and partial charge properties. In summary, this study developed a robust and practical consensus model for PPB prediction that could be used for distribution evaluation and risk assessment in the early stages of drug development.
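The NSGA-II descriptor selection described above ranks candidate descriptor subsets by Pareto dominance. The minimal sketch below shows only that ranking step (fast non-dominated sorting), assuming each subset is scored on two objectives to minimize, here the number of selected descriptors and a PLS cross-validation RMSE. The abstract does not state the exact objectives, and the crowding-distance, selection, crossover and mutation operators of the full algorithm are omitted. The function names and example scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objectives: Sequence[Sequence[float]]) -> List[List[int]]:
    """Fast non-dominated sorting (Deb et al.): returns fronts of candidate indices, best first."""
    n = len(objectives)
    dominated_set: List[List[int]] = [[] for _ in range(n)]  # candidates that p dominates
    domination_count = [0] * n                              # how many candidates dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(objectives[p], objectives[q]):
                dominated_set[p].append(q)
            elif dominates(objectives[q], objectives[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)  # nothing dominates p: first Pareto front
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (subset size, PLS CV RMSE) scores for five descriptor subsets.
scores = [(5, 16.0), (10, 15.0), (10, 17.0), (20, 14.5), (8, 16.5)]
print(non_dominated_sort(scores))  # [[0, 1, 3], [4], [2]]
```

Subsets in the first front are the Pareto-optimal trade-offs between subset size and error; in the full NSGA-II, ties within a front are broken by crowding distance.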
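The consensus model and the cross-validated Q² and RMSE reported above can be illustrated with a short sketch. It assumes the consensus prediction is the unweighted mean of the member models' predictions and that Q² = 1 - PRESS/TSS is computed over pooled out-of-fold predictions; the abstract specifies neither the averaging scheme nor whether Q² is pooled or averaged per fold. The function names and numbers are illustrative only.

```python
import numpy as np


def consensus_predict(model_predictions) -> np.ndarray:
    """Unweighted mean across models; input shape (n_models, n_compounds). Averaging scheme assumed."""
    return np.asarray(model_predictions, dtype=float).mean(axis=0)


def q_squared(y_true, y_pred) -> float:
    """Q^2 = 1 - PRESS / TSS, with PRESS summed over out-of-fold predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - press / tss)


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error in the units of the response (e.g. % bound)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical out-of-fold PPB values (% bound) and predictions from two models.
y = [10.0, 50.0, 90.0]
preds = [[20.0, 50.0, 80.0],  # model A
         [5.0, 60.0, 85.0]]   # model B
y_cons = consensus_predict(preds)
print(y_cons, q_squared(y, y_cons), rmse(y, y_cons))
```

In this toy case the averaged prediction has a lower error than either member model, which is the usual motivation for consensus modeling, although that improvement is not guaranteed.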
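The y-randomization check mentioned above tests for chance correlation: the responses are shuffled, the model is refitted, and the scrambled models should score far worse than the real one. To stay dependency-free, the sketch below swaps in ordinary least squares as a stand-in for the paper's RF, SVM, Cubist, GP and Boosting models; the data are synthetic and the helper names hypothetical.

```python
import numpy as np


def ols_r2(X: np.ndarray, y: np.ndarray) -> float:
    """Fit OLS with an intercept and return the training R^2 (stand-in model, not the paper's)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def y_randomization(X, y, n_rounds: int = 100, seed: int = 0):
    """Return the R^2 of the real model and of n_rounds models fitted to permuted responses."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r2_real = ols_r2(X, y)
    r2_scrambled = np.array([ols_r2(X, rng.permutation(y)) for _ in range(n_rounds)])
    return r2_real, r2_scrambled


# Synthetic data: 200 compounds, 3 descriptors, a genuine linear signal plus small noise.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)
r2_real, r2_scrambled = y_randomization(X_demo, y_demo)
print(f"real R^2 = {r2_real:.3f}, max scrambled R^2 = {r2_scrambled.max():.3f}")
```

A large gap between the real and scrambled scores suggests the fitted relationship is not a chance correlation; in practice the cross-validated Q² of the scrambled models is usually reported as well.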
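The Williams plot used above for the applicability domain places each compound by its leverage and its standardized residual. A common convention, assumed here rather than taken from the paper, computes leverage as h_i = x_iᵀ(XᵀX)⁻¹x_i from the training descriptor matrix X (with an intercept column), sets the warning leverage h* = 3(p + 1)/n for p descriptors and n training compounds, and flags standardized residuals beyond ±3. Dividing residuals by a supplied residual standard deviation is also an assumption, and the helper names are hypothetical.

```python
import numpy as np


def leverages(X_train, X_query) -> np.ndarray:
    """h_i = x_i^T (X^T X)^-1 x_i for each query row, using the training matrix plus an intercept column."""
    Xt = np.column_stack([np.ones(len(X_train)), np.asarray(X_train, dtype=float)])
    Xq = np.column_stack([np.ones(len(X_query)), np.asarray(X_query, dtype=float)])
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)  # pseudo-inverse tolerates collinear descriptors
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)


def williams_flags(X_train, X_query, residuals, residual_sd: float):
    """Return leverages, the warning leverage h*, and boolean flags for the two Williams-plot tests."""
    n, p = np.asarray(X_train).shape
    h_star = 3.0 * (p + 1) / n
    h = leverages(X_train, X_query)
    std_resid = np.asarray(residuals, dtype=float) / residual_sd
    outside_domain = h > h_star                 # structurally extrapolated compounds
    response_outlier = np.abs(std_resid) > 3.0  # poorly predicted compounds
    return h, h_star, outside_domain, response_outlier


rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
X_query = np.array([[0.0, 0.0, 0.0], [100.0, 100.0, 100.0]])  # typical vs far-out compound
h, h_star, outside, outlier = williams_flags(X_train, X_query, residuals=[0.5, 4.0], residual_sd=1.0)
print(h_star, outside, outlier)
```

With the intercept column included, the training leverages sum to p + 1, so h* is three times the average training leverage.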

Original language: English
Pages (from-to): 84-95
Number of pages: 12
Journal: Chemometrics and Intelligent Laboratory Systems
Publication status: Published - 15 Nov 2017

Scopus Subject Areas

  • Analytical Chemistry
  • Software
  • Process Chemistry and Technology
  • Spectroscopy
  • Computer Science Applications

User-Defined Keywords

  • ADME
  • Consensus model
  • Plasma protein binding
  • QSAR

