Dynamic Graph Attention Meets Pretrained Language Models: Adaptive K-Mer Decomposition for LncRNA-Protein Interaction Prediction

  • Zeyuan Zeng
  • Jingxian Zeng
  • Defu Li
  • Qinke Peng*
  • Haozhou Li
  • Ruimeng Li
  • Wentong Sun
  • Qingbo Zhang
  • Jinzhi Wang

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Protein-RNA complexes, particularly those involving RNA-binding proteins and long non-coding RNAs (lncRNAs), commonly influence gene expression and mediate fundamental cellular processes. Despite significant advances in representations for these biological sequences, k-mer-based sequence decomposition generally produces fixed-length substrings and therefore fails to capture information from variable-length biological functional regions. In this paper, we develop a concept of expressiveness for k-mer decompositions as a theoretical underpinning for traversing all k-mer decompositions. Based on this concept, we propose BERTDGA-LPI, an approach that uses dynamic graph attention to capture information from variable-length biological functional regions and pretrained language models to capture the influence of RNA and protein context. Experimental results show that BERTDGA-LPI outperforms state-of-the-art methods on two Homo sapiens datasets, one plant species dataset, and two species-unspecific datasets. Furthermore, BERTDGA-LPI is validated as effective in predicting unknown RNA-protein interactions (RPIs), achieving 100% prediction accuracy on six independent validation sets from different species. This study lays a theoretical foundation for traversing all k-mer decompositions and offers a broadly applicable and efficient tool for lncRNA-protein interaction (LPI) prediction and RPI prediction based only on sequences.
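To make the fixed-length limitation concrete, the following is a minimal Python sketch of conventional k-mer decomposition. It is not the BERTDGA-LPI method; the function names `kmers` and `multi_k_counts` and the toy sequence are assumptions introduced here for illustration only. For a single k, every substring has the same length, so a functional region of a different length can only be seen by repeating the decomposition with other values of k.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k (stride 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multi_k_counts(seq: str, ks: tuple[int, ...]) -> dict[int, Counter]:
    """Count k-mers separately for each k (hypothetical helper).

    This naive workaround simply repeats the fixed-length decomposition
    for several values of k; it does not adapt k to the sequence.
    """
    return {k: Counter(kmers(seq, k)) for k in ks}


rna = "AUGGCUAUGG"  # toy sequence, not taken from any dataset in the paper
three_mers = kmers(rna, 3)
counts = multi_k_counts(rna, (2, 3, 4))
print(three_mers)
print(counts[3].most_common(2))
```

Each entry of `counts` is still a bag of fixed-length substrings, which is the limitation the abstract describes for k-mer-based representations.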

Original language: English
Pages (from-to): 3175-3187
Number of pages: 13
Journal: IEEE Transactions on Computational Biology and Bioinformatics
Volume: 22
Issue number: 6
Early online date: 25 Sept 2025
Publication status: Published - Nov 2025

User-Defined Keywords

  • Graph Neural Network
  • K-Mer
  • LncRNA-Protein Interaction
  • Long Non-Coding RNA
  • Pretrained Language Model
