TY - JOUR
T1 - Systematic Modeling of log D7.4 Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis
AU - Fu, Li
AU - Liu, Lu
AU - Yang, Zhi Jiang
AU - Li, Pan
AU - Ding, Jun Jie
AU - Yun, Yong Huan
AU - LYU, Aiping
AU - Hou, Ting Jun
AU - Cao, Dong Sheng
N1 - Funding Information:
This work was financially supported by the National Key Basic Research Program (2015CB910700), the National Science Foundation of China (21575128 and 81773632), and the Zhejiang Provincial Natural Science Foundation of China (LZ19H300001). The studies meet with the approval of the university’s review board.
PY - 2020/1/27
Y1 - 2020/1/27
N2 - Lipophilicity, as evaluated by the n-octanol/buffer solution distribution coefficient at pH = 7.4 (log D7.4), is a major determinant of various absorption, distribution, metabolism, elimination, and toxicology (ADMET) parameters of drug candidates. In this study, we developed several quantitative structure-property relationship (QSPR) models to predict log D7.4 based on a large and structurally diverse data set. Eight popular machine learning algorithms were employed to build the prediction models with 43 molecular descriptors selected by a wrapper feature selection method. The results demonstrated that XGBoost yielded better prediction performance than any other single model (RT2 = 0.906 and RMSET = 0.395). Moreover, the consensus model from the top three models could continue to improve the prediction performance (RT2 = 0.922 and RMSET = 0.359). The robustness, reliability, and generalization ability of the models were strictly evaluated by the Y-randomization test and applicability domain analysis. Moreover, the group contribution model based on 110 atom types and the local models for different ionization states were also established and compared to the global models. The results demonstrated that the descriptor-based consensus model is superior to the group contribution method, and the local models have no advantage over the global models. Finally, matched molecular pair (MMP) analysis and descriptor importance analysis were performed to extract transformation rules and give some explanations related to log D7.4. In conclusion, we believe that the consensus model developed in this study can be used as a reliable and promising tool to evaluate log D7.4 in drug discovery.
AB - Lipophilicity, as evaluated by the n-octanol/buffer solution distribution coefficient at pH = 7.4 (log D7.4), is a major determinant of various absorption, distribution, metabolism, elimination, and toxicology (ADMET) parameters of drug candidates. In this study, we developed several quantitative structure-property relationship (QSPR) models to predict log D7.4 based on a large and structurally diverse data set. Eight popular machine learning algorithms were employed to build the prediction models with 43 molecular descriptors selected by a wrapper feature selection method. The results demonstrated that XGBoost yielded better prediction performance than any other single model (RT2 = 0.906 and RMSET = 0.395). Moreover, the consensus model from the top three models could continue to improve the prediction performance (RT2 = 0.922 and RMSET = 0.359). The robustness, reliability, and generalization ability of the models were strictly evaluated by the Y-randomization test and applicability domain analysis. Moreover, the group contribution model based on 110 atom types and the local models for different ionization states were also established and compared to the global models. The results demonstrated that the descriptor-based consensus model is superior to the group contribution method, and the local models have no advantage over the global models. Finally, matched molecular pair (MMP) analysis and descriptor importance analysis were performed to extract transformation rules and give some explanations related to log D7.4. In conclusion, we believe that the consensus model developed in this study can be used as a reliable and promising tool to evaluate log D7.4 in drug discovery.
UR - http://www.scopus.com/inward/record.url?scp=85078510177&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.9b00718
DO - 10.1021/acs.jcim.9b00718
M3 - Journal article
C2 - 31869226
AN - SCOPUS:85078510177
SN - 1549-9596
VL - 60
SP - 63
EP - 76
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 1
ER -