Abstract
Accurate identification of phosphorylation sites is fundamental for advancing the study of protein phosphorylation, which plays a critical role in elucidating protein function and facilitating rational drug design. While mass spectrometry-based experimental techniques are considered an effective method for phosphorylation site identification, their widespread application is often hindered by high costs, limited throughput, and the requirement for specialized equipment. Computational prediction methods have become a popular alternative strategy, providing candidate phosphorylation sites inferred from the entire protein sequence. However, many current methods struggle to extract sufficient contextual features, limiting their predictive reliability in complex biological settings. To address these issues, we present ProtPSP, a novel model for phosphorylation site prediction built on a protein large language model (pLLM). By leveraging a pretrained pLLM, ProtPSP effectively captures complex sequence contexts that conventional models trained on limited phosphorylation data may miss. Comprehensive benchmarking on curated data sets demonstrates that ProtPSP achieves reliable phosphorylation site prediction for both serine/threonine and tyrosine sites, outperforming other commonly used methods across multiple evaluation metrics. Ablation studies substantiate the critical contribution of both the pLLM-driven features and the fusion model architecture to the overall performance improvements. Moreover, case studies demonstrate that ProtPSP consistently identifies all true phosphorylation sites, underscoring its significant potential as a complementary approach to mass spectrometry-based techniques in biomedical research and drug discovery.
| Original language | English |
|---|---|
| Pages (from-to) | 63525-63535 |
| Number of pages | 11 |
| Journal | ACS Omega |
| Volume | 10 |
| Issue number | 51 |
| Early online date | 16 Dec 2025 |
| Publication status | Published - 30 Dec 2025 |
| Title | ProtPSP: Determination of Protein Phosphorylation Sites with the Protein Large Language Model |