ifDEEPre: large protein language-based deep learning enables interpretable and fast predictions of enzyme commission numbers

Qingxiong Tan, Jin Xiao, Jiayang Chen, Yixuan Wang, Zeliang Zhang, Tiancheng Zhao, Yu Li*

*Corresponding author for this work

Research output: Contribution to journalJournal articlepeer-review

Abstract

Accurate understanding of the biological functions of enzymes is vital for various tasks in both pathologies and industrial biotechnology. However, the existing methods are usually not fast enough and lack explanations on the prediction results, which severely limits their real-world applications. Following our previous work, DEEPre, we propose a new interpretable and fast version (ifDEEPre) by designing novel self-guided attention and incorporating biological knowledge learned via large protein language models to accurately predict the commission numbers of enzymes and confirm their functions. Novel self-guided attention is designed to optimize the unique contributions of representations, automatically detecting key protein motifs to provide meaningful interpretations. Representations learned from raw protein sequences are strictly screened to improve the running speed of the framework, 50 times faster than DEEPre while requiring 12.89 times smaller storage space. Large language modules are incorporated to learn physical properties from hundreds of millions of proteins, extending biological knowledge of the whole network. Extensive experiments indicate that ifDEEPre outperforms all the current methods, achieving more than 14.22% larger F1-score on the NEW dataset. Furthermore, the trained ifDEEPre models accurately capture multi-level protein biological patterns and infer evolutionary trends of enzymes by taking only raw sequences without label information. Meanwhile, ifDEEPre predicts the evolutionary relationships between different yeast sub-species, which are highly consistent with the ground truth. Case studies indicate that ifDEEPre can detect key amino acid motifs, which have important implications for designing novel enzymes. A web server running ifDEEPre is available at https://proj.cse.cuhk.edu.hk/aihlab/ifdeepre/ to provide convenient services to the public. Meanwhile, ifDEEPre is freely available on GitHub at https://github.com/ml4bio/ifDEEPre/.

Original languageEnglish
Article numberbbae225
Number of pages21
JournalBriefings in Bioinformatics
Volume25
Issue number4
DOIs
Publication statusPublished - Jul 2024

User-Defined Keywords

  • enzyme prediction
  • evolutionary inference
  • interpretability analysis
  • large language model
  • protein motif detection
  • self-guided attentive learning

Fingerprint

Dive into the research topics of 'ifDEEPre: large protein language-based deep learning enables interpretable and fast predictions of enzyme commission numbers'. Together they form a unique fingerprint.

Cite this