ProtPSP: Determination of Protein Phosphorylation Sites with the Protein Large Language Model

  • Bifeng Guan
  • Lei Guo*
  • Thomas Ka-Yam Lam
  • Zhuang Xiong
  • Shangyi Luo
  • Yang Sun
  • Kaizhi Chen*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

1 Citation (Scopus)

Abstract

Accurate identification of phosphorylation sites is fundamental to the study of protein phosphorylation, which plays a critical role in elucidating protein function and facilitating rational drug design. While mass spectrometry-based experimental techniques are considered an effective method for phosphorylation site identification, their widespread application is often hindered by high costs, limited throughput, and the requirement for specialized equipment. Computational prediction methods have become a popular alternative strategy, providing candidate phosphorylation sites by inferring their presence from the entire protein sequence. However, many current methods struggle to extract sufficient contextual features, which limits their predictive reliability in complex biological settings. To address these issues, we present ProtPSP, a novel model for phosphorylation site prediction based on a protein large language model (pLLM). By leveraging a pretrained pLLM, ProtPSP effectively captures complex sequence contexts that conventional models trained on limited phosphorylation data may miss. Comprehensive benchmarking on curated data sets demonstrates that ProtPSP achieves reliable phosphorylation site prediction for both serine/threonine and tyrosine sites, outperforming other commonly used methods across multiple evaluation metrics. Ablation studies substantiate the critical contribution of both the pLLM-driven features and the fusion model architecture to the overall performance improvements. Moreover, case studies demonstrate that ProtPSP consistently identifies all true phosphorylation sites, underscoring its significant potential as a complementary approach to mass spectrometry-based techniques in biomedical research and drug discovery.
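
As a rough illustration of what "providing candidate phosphorylation sites" from a protein sequence involves, the sketch below enumerates every serine, threonine, and tyrosine residue and extracts a fixed-length window of flanking residues around it. This is not the ProtPSP implementation: the function name `candidate_sites`, the default flank of 7 residues, and the `-` padding character are all assumptions chosen for illustration, and the abstract does not describe how ProtPSP builds its inputs.

```python
from typing import List, Tuple

# Residues that can carry a phosphate group in the setting the abstract describes.
PHOSPHO_RESIDUES = {"S", "T", "Y"}


def candidate_sites(
    sequence: str, flank: int = 7, pad: str = "-"
) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for each S/T/Y in `sequence`.

    The window holds `flank` residues on each side of the candidate site.
    Positions that fall off either end of the protein are filled with `pad`.
    Both defaults are illustrative assumptions, not values from the paper.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    sites = []
    for i, residue in enumerate(seq):
        if residue in PHOSPHO_RESIDUES:
            # In `padded`, the residue sits at index i + flank, so the slice
            # starting at i is centred on it.
            window = padded[i : i + 2 * flank + 1]
            sites.append((i + 1, residue, window))
    return sites


if __name__ == "__main__":
    for position, residue, window in candidate_sites("MSTAY", flank=2):
        print(position, residue, window)
```

In a predictor of this general kind, each window (or the full sequence together with the site index) would then be passed to a model that scores how likely the central residue is to be phosphorylated; that scoring step is where a pretrained pLLM would come in, and it is deliberately left out here.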

Original language: English
Pages (from-to): 63525-63535
Number of pages: 11
Journal: ACS Omega
Volume: 10
Issue number: 51
Early online date: 16 Dec 2025
Publication status: Published - 30 Dec 2025
