Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge

Yanjing Duan, Xixi Yang, Xiangxiang Zeng, Wenxuan Wang, Youchao Deng, Dongsheng Cao*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

2 Citations (Scopus)

Abstract

Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It can both mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.
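
The dual-level idea described above — one encoder pretrained jointly on a structural, masked-token objective over molecular sequences and a domain objective fit to pseudo-labeled data — can be sketched roughly as below. This is a minimal illustrative sketch in PyTorch, not the authors' released TOML-BERT implementation; the class name, head layout, the `alpha` loss weighting, and the use of a [CLS]-style pooled token are assumptions made purely for illustration.

```python
# Illustrative sketch only: a dual-objective pretraining module combining
# (1) a masked-token loss over tokenized SMILES (structural level) and
# (2) a supervised loss on pseudo-labeled molecules (domain level).
# The encoder is assumed to map token ids to (batch, seq_len, hidden_dim).

import torch
import torch.nn as nn


class DualLevelPretrainer(nn.Module):
    def __init__(self, encoder: nn.Module, vocab_size: int, hidden_dim: int, num_tasks: int):
        super().__init__()
        self.encoder = encoder                              # e.g., a Transformer encoder over SMILES tokens
        self.mlm_head = nn.Linear(hidden_dim, vocab_size)   # recovers masked tokens (structural pretraining)
        self.task_head = nn.Linear(hidden_dim, num_tasks)   # fits pseudo-labels (domain pretraining)

    def forward(self, token_ids, mlm_labels=None, pseudo_labels=None, alpha=1.0):
        hidden = self.encoder(token_ids)                    # (batch, seq_len, hidden_dim)
        loss = torch.tensor(0.0, device=token_ids.device)

        if mlm_labels is not None:                          # structural level: masked-token recovery
            logits = self.mlm_head(hidden)
            loss = loss + nn.functional.cross_entropy(
                logits.view(-1, logits.size(-1)),
                mlm_labels.view(-1),
                ignore_index=-100,                          # positions that were not masked are ignored
            )

        if pseudo_labels is not None:                       # domain level: regress noisy pseudo-labels
            pooled = hidden[:, 0]                           # first ([CLS]-style) token as molecule embedding
            preds = self.task_head(pooled)
            loss = loss + alpha * nn.functional.mse_loss(preds, pseudo_labels)

        return loss
```

In such a setup, the encoder would be pretrained on both objectives (jointly or in stages, depending on design choices not specified here) and then fine-tuned together with a fresh prediction head on the small labeled target dataset, which is the transfer step the abstract refers to.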

Original language: English
Pages (from-to): 9575-9586
Number of pages: 12
Journal: Journal of Medicinal Chemistry
Volume: 67
Issue number: 11
Early online date: 15 May 2024
DOIs
Publication status: Published - 13 Jun 2024
