TY - JOUR
T1 - Structural Analysis and Identification of Colloidal Aggregators in Drug Discovery
AU - Yang, Zi Yi
AU - Yang, Zhi Jiang
AU - Dong, Jie
AU - Wang, Liang Liang
AU - Zhang, Liu Xia
AU - Ding, Jun Jie
AU - Ding, Xiao Qin
AU - LYU, Aiping
AU - Hou, Ting Jun
AU - Cao, Dong Sheng
N1 - Funding Information: This work was financially supported by the National Key Basic Research Program (2015CB910700), National Science & Technology Major Project of China “Key New Drug Creation and Manufacturing Program” (2018ZX09711002-007), Zhejiang Provincial Natural Science Foundation of China (LZ19H300001), and Hunan Provincial Natural Science Foundation of China (2019JJ51003). The studies meet with the approval of the university’s review board.
PY - 2019/9/23
Y1 - 2019/9/23
AB - Aggregation has been posing a great challenge in drug discovery. Current computational approaches aiming to filter out aggregated molecules based on their similarity to known aggregators, such as Aggregator Advisor, have low prediction accuracy, and therefore development of reliable in silico models to detect aggregators is highly desirable. In this study, we built a data set consisting of 12 119 aggregators and 24 172 drugs or drug candidates and then developed a group of classification models based on the combination of two ensemble learning approaches and five types of molecular representations. The best model yielded an accuracy of 0.950 and an area under the curve (AUC) value of 0.987 for the training set, and an accuracy of 0.937 and an AUC of 0.976 for the test set. The best model also gave reliable predictions to the external validation set with 5681 aggregators since 80% of molecules were predicted to be aggregators with a prediction probability higher than 0.9. More importantly, we explored the relationship between colloidal aggregation and molecular features, and generalized a set of simple rules to detect aggregators. Molecular features, such as log D, the number of hydroxyl groups, the number of aromatic carbons attached to a hydrogen atom, and the number of sulfur atoms in aromatic heterocycles, would be helpful to distinguish aggregators from nonaggregators. A comparison with numerous existing druglikeness and aggregation filtering rules and models used in virtual screening verified the high reliability of the model and rules proposed in this study. We also used the model to screen several curated chemical databases, and almost 20% of molecules in the evaluated databases were predicted as aggregators, highlighting the potential high risk of aggregation in screening. Finally, we developed an online Web server of ChemAGG (http://admet.scbdd.com/ChemAGG/index), which offers a freely available tool to detect aggregators.
UR - http://www.scopus.com/inward/record.url?scp=85072541393&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.9b00541
DO - 10.1021/acs.jcim.9b00541
M3 - Journal article
C2 - 31430151
AN - SCOPUS:85072541393
SN - 1549-9596
VL - 59
SP - 3714
EP - 3726
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 9
ER -