PySmash: Python package and individual executable program for representative substructure generation and application

Zi Yi Yang, Zhi Jiang Yang, Yue Zhao, Ming Zhu Yin, Ai Ping Lu, Xiang Chen, Shao Liu*, Ting Jun Hou*, Dong Sheng Cao*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review



Background: Substructure screening is widely applied to evaluate the molecular potency and ADMET properties of compounds in drug discovery pipelines, and it can also be used to interpret QSAR models for the design of new compounds with desirable physicochemical and biological properties. As experimental data continue to accumulate, data-driven computational systems that can derive representative substructures from large chemical libraries are attracting increasing attention. An integrated and convenient tool for generating and applying representative substructures is therefore urgently needed.

Results: In this study, PySmash, a user-friendly and powerful tool for generating different types of representative substructures, was developed. The current version of PySmash provides both a Python package and an individual executable program, making it easy to operate and to integrate into existing pipelines. Three types of substructure generation algorithms are provided: circular, path-based and functional group-based. During execution, users can customize their requirements for substructure size, accuracy and coverage, statistical significance and parallel computation. PySmash also provides a function for screening external data.
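As a rough illustration of how circular and path-based substructures differ, the sketch below enumerates both kinds of fragment on a toy molecular graph. It is a minimal sketch, assuming a hypothetical molecule representation (a list of atom symbols plus an adjacency dictionary, hydrogens implicit, bond orders ignored) instead of a real cheminformatics toolkit; it does not reproduce PySmash's own code or API, and `circular_fragments` and `path_fragments` are hypothetical names. The circular part uses a Morgan/ECFP-style iterative refinement of atom identifiers. Functional group-based generation, which depends on a predefined group dictionary, is not shown.

```python
from collections import Counter

# Hypothetical toy representation (not PySmash's API): a molecule is
# (atom_symbols, adjacency), hydrogens implicit, bond orders ignored.


def circular_fragments(mol, radius):
    """Count Morgan/ECFP-style atom environments up to `radius` iterations.

    Iteration 0 is the atom symbol; each further iteration appends the sorted
    identifiers of the atom's neighbours from the previous iteration.
    """
    symbols, adj = mol
    current = {i: symbols[i] for i in range(len(symbols))}
    found = Counter(current.values())
    for _ in range(radius):
        # New identifiers are built from the previous round only.
        current = {
            i: current[i] + "(" + ",".join(sorted(current[j] for j in adj[i])) + ")"
            for i in current
        }
        found.update(current.values())
    return found


def path_fragments(mol, max_bonds):
    """Count linear (simple) paths of 1..max_bonds bonds as symbol strings."""
    symbols, adj = mol
    found = Counter()

    def extend(path):
        # Record each undirected path once: only when start index < end index.
        if len(path) > 1 and path[0] < path[-1]:
            seq = [symbols[i] for i in path]
            # Canonical label: the lexicographically smaller reading direction.
            found["-".join(min(seq, seq[::-1]))] += 1
        if len(path) - 1 == max_bonds:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only, no revisiting atoms
                extend(path + [nxt])

    for start in range(len(symbols)):
        extend([start])
    return found


if __name__ == "__main__":
    # Ethanol heavy atoms: C-C-O
    ethanol = (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
    print(circular_fragments(ethanol, radius=1))
    print(path_fragments(ethanol, max_bonds=2))
```

Ring closures and bond orders are ignored here for brevity; a real implementation would operate on a toolkit's molecule objects and canonicalize fragments properly, and the `radius` and `max_bonds` arguments only loosely mirror the "substructure size" setting described above.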

Conclusion: PySmash, a user-friendly and integrated tool for the automatic generation and implementation of representative substructures, is presented. Three screening examples, including toxicophore derivation, privileged motif detection and the integration of substructures with machine learning (ML) models, are provided to illustrate the utility of PySmash in safety profile evaluation, therapeutic activity exploration and molecular optimization, respectively. Its executable program and Python package are available at
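The toxicophore-derivation and ML-integration examples both rest on scoring candidate substructures against labelled data and then reusing the retained ones as features. The sketch below shows one plausible way to do this. It assumes that accuracy is the fraction of fragment-containing molecules that are positive, coverage is the fraction of all positives that contain the fragment, and significance is a one-sided hypergeometric (Fisher-type) enrichment test. These definitions, the thresholds, and the names `derive_alerts`, `enrichment_p_value` and `to_feature_vector` are illustrative assumptions, not PySmash's documented behaviour.

```python
from math import comb

# Illustrative statistics only; definitions and thresholds are assumptions,
# not PySmash's documented defaults.


def enrichment_p_value(a, k, n_pos, n_total):
    """One-sided hypergeometric tail P(X >= a).

    X is the number of positives in a random subset of k molecules drawn from
    n_total molecules of which n_pos are positive.
    """
    upper = min(k, n_pos)
    tail = sum(
        comb(n_pos, x) * comb(n_total - n_pos, k - x) for x in range(a, upper + 1)
    )
    return tail / comb(n_total, k)


def derive_alerts(fragment_sets, labels, min_accuracy=0.7, min_support=3, alpha=0.05):
    """Keep fragments that are frequent, accurate and significantly enriched in positives.

    fragment_sets: one set of fragment keys per molecule.
    labels: 1 for positive (e.g. toxic), 0 for negative, aligned with fragment_sets.
    Returns {fragment: (accuracy, coverage, p_value)}.
    """
    n_total = len(labels)
    n_pos = sum(labels)
    alerts = {}
    for frag in set().union(*fragment_sets):
        hits = [y for frags, y in zip(fragment_sets, labels) if frag in frags]
        k, a = len(hits), sum(hits)
        if k < min_support:
            continue
        accuracy = a / k                          # precision among molecules containing frag
        coverage = a / n_pos if n_pos else 0.0    # share of positives containing frag
        p_value = enrichment_p_value(a, k, n_pos, n_total)
        if accuracy >= min_accuracy and p_value <= alpha:
            alerts[frag] = (accuracy, coverage, p_value)
    return alerts


def to_feature_vector(frags, alert_keys):
    """Binary fingerprint: 1 if the molecule contains each retained fragment."""
    return [1 if key in frags else 0 for key in alert_keys]


if __name__ == "__main__":
    fragment_sets = [{"X", "Y"}] * 5 + [{"Y"}] * 5
    labels = [1] * 5 + [0] * 5
    alerts = derive_alerts(fragment_sets, labels)
    keys = sorted(alerts)
    features = [to_feature_vector(f, keys) for f in fragment_sets]
    print(alerts)
    print(features)
```

In this toy data, fragment "X" occurs only in positives and is retained, whereas "Y" occurs in every molecule and is discarded; the resulting binary vectors could then be passed to any standard classifier.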

Original language: English
Article number: bbab017
Number of pages: 8
Journal: Briefings in Bioinformatics
Issue number: 5
Publication status: Published - Sept 2021

Scopus Subject Areas

  • Information Systems
  • Molecular Biology

User-Defined Keywords

  • Python package
  • QSAR
  • software
  • substructure screening

