Structural Analysis and Identification of Colloidal Aggregators in Drug Discovery

Zi Yi Yang, Zhi Jiang Yang, Jie Dong, Liang Liang Wang, Liu Xia Zhang, Jun Jie Ding, Xiao Qin Ding, Aiping LYU, Ting Jun Hou*, Dong Sheng Cao

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

37 Citations (Scopus)


Colloidal aggregation poses a major challenge in drug discovery. Current computational approaches that filter out aggregating molecules by their similarity to known aggregators, such as Aggregator Advisor, have low prediction accuracy, so reliable in silico models for detecting aggregators are highly desirable. In this study, we built a data set consisting of 12 119 aggregators and 24 172 drugs or drug candidates and then developed a group of classification models based on the combination of two ensemble learning approaches and five types of molecular representations. The best model yielded an accuracy of 0.950 and an area under the curve (AUC) value of 0.987 for the training set, and an accuracy of 0.937 and an AUC of 0.976 for the test set. The best model also gave reliable predictions for the external validation set of 5681 aggregators: 80% of these molecules were predicted to be aggregators with a prediction probability higher than 0.9. More importantly, we explored the relationship between colloidal aggregation and molecular features and derived a set of simple rules to detect aggregators. Molecular features such as log D, the number of hydroxyl groups, the number of aromatic carbons attached to a hydrogen atom, and the number of sulfur atoms in aromatic heterocycles help distinguish aggregators from nonaggregators. A comparison with numerous existing druglikeness and aggregation filtering rules and models used in virtual screening verified the high reliability of the model and rules proposed in this study. We also used the model to screen several curated chemical databases, and almost 20% of the molecules in the evaluated databases were predicted to be aggregators, highlighting the potentially high risk of aggregation in screening. Finally, we developed an online Web server, ChemAGG, which offers a freely available tool to detect aggregators.
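The three kinds of figures quoted in the abstract (accuracy, AUC, and the share of an external set predicted above a probability cutoff) can be illustrated with a minimal sketch. This is not the authors' code or the ChemAGG implementation: the function names, the 0.5 decision threshold, and the toy labels and probabilities below are all assumptions made for illustration only.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], probs: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of molecules whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


def roc_auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen aggregator receives a
    higher score than a randomly chosen non-aggregator (ties count half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one molecule of each class")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def fraction_confident(probs: Sequence[float], cutoff: float = 0.9) -> float:
    """Share of molecules predicted as aggregators with probability > cutoff."""
    return sum(p > cutoff for p in probs) / len(probs)


# Hypothetical toy data: 1 = aggregator, 0 = non-aggregator.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.95, 0.80, 0.40, 0.60, 0.20, 0.10]

# Hypothetical external set made up of known aggregators only.
toy_external_probs = [0.97, 0.93, 0.91, 0.95, 0.55]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_labels, toy_probs))
    print("AUC:", roc_auc(toy_labels, toy_probs))
    print("fraction > 0.9:", fraction_confident(toy_external_probs))
```

The pairwise AUC is quadratic in the number of molecules, which is fine for a toy example; for data sets the size of the one described, a rank-based computation would be the more practical choice.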

Original language: English
Pages (from-to): 3714-3726
Number of pages: 13
Journal: Journal of Chemical Information and Modeling
Issue number: 9
Publication status: Published - 23 Sept 2019

Scopus Subject Areas

  • Chemistry(all)
  • Chemical Engineering(all)
  • Computer Science Applications
  • Library and Information Sciences

