Annotation-guided Protein Design with Multi-Level Domain Alignment

  • Chaohao Yuan
  • Songyou Li
  • Geyan Ye
  • Yikun Zhang
  • Long Kai Huang
  • Wenbing Huang
  • Wei Liu
  • Jianhua Yao
  • Yu Rong*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

1 Citation (Scopus)

Abstract

The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models generate proteins using structural and evolutionary guidance, which provides only indirect conditions concerning functions and properties. However, textual annotations of proteins, especially annotations for protein domains, which directly describe a protein's high-level functionalities, properties, and their correlation with target amino acid sequences, remain unexplored in the context of protein design tasks. In this paper, we propose Protein-Annotation Alignment Generation (PAAG), a multi-modality protein design framework that integrates textual annotations extracted from protein databases for controllable generation in sequence space. Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations, and can even design novel proteins with flexible combinations of different kinds of annotations. Our experimental results underscore the superiority of the aligned protein representations from PAAG across 7 prediction tasks. Furthermore, PAAG demonstrates a significant increase in generation success rate (24.7% vs 4.7% for the zinc finger domain, and 54.3% vs 22.0% for the immunoglobulin domain) in comparison to the existing model. We anticipate that PAAG will broaden the horizons of protein design by leveraging the knowledge shared between textual annotations and proteins.

Original language: English
Title of host publication: KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Editors: Yizhou Sun, Flavio Chierichetti, Hady W. Lauw, Claudia Perlich, WeeHyong Tok, Andrew Tomkins
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 1855-1866
Number of pages: 12
Volume: 1
ISBN (Electronic): 9798400712456
DOIs
Publication status: Published - 20 Jul 2025
Event: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto Convention Centre, Toronto, Canada
Duration: 3 Aug 2025 to 7 Aug 2025
https://dl.acm.org/doi/proceedings/10.1145/3690624 (Conference proceeding)
https://kdd2025.kdd.org/ (Conference website)
https://kdd2025.kdd.org/schedule-at-a-glance/ (Conference schedule)

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
ISSN (Print): 2154-817X

Conference

Conference: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Abbreviated title: KDD 2025
Country/Territory: Canada
City: Toronto
Period: 3/08/25 to 7/08/25
Internet address

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

User-Defined Keywords

  • annotation-guided protein design
  • multi-modality alignment
