Protein Reverse Engineering Based on Evolutionary Information

Project: Research project

Project Details


Proteins are remarkable molecular machines that play essential roles in almost all biological processes. Under physiological conditions, most proteins with stable folded structures respond to perturbations from their environment and exhibit intrinsic dynamics around their native-state structures. These intrinsic dynamics encode the functionality of the proteins. Notably, mutations in the amino acid sequence may alter the folded structure of a protein, which in turn changes its dynamics and function. Such a correspondence between evolution (sequence mutation) and dynamics (PRL, 127: 098103, 2021) is in line with the classical “sequence-structure-function” paradigm of proteins. However, it remains challenging to predict precisely how a mutation will affect the dynamics of a protein and to identify function-related mutations in protein engineering.

Fortunately, recent years have witnessed the remarkable success of AlphaFold (Nature, 596: 583-589, 2021) and other AI-based methods in bridging the sequence and structure of proteins. AI-predicted protein structure databases (AlphaFold Database and ESM Metagenomic Atlas) now provide high-accuracy structural predictions for hundreds of millions of proteins, allowing extensive statistical analyses of protein evolution and dynamics. Previously, we demonstrated how the AlphaFold Database could be applied to uncover general statistical trends in protein evolution (Mol Biol Evol, 39: msac197, 2022). In this research, we will go further, using AI-predicted structures to study the evolution of hundreds of protein families. Despite their different sequences, proteins from the same family have similar folded structures and intrinsic dynamics, especially the slow-mode motions that correspond to large-scale conformational changes. Following the reversed logic chain “dynamics-structure-sequence”, we aim to establish a theoretical framework of protein reverse engineering and to elucidate how changes in dynamics result from mutations in sequence, opening the black box of protein function.

By combining perturbation analysis with statistical analysis of slow modes, we will identify the mutations that significantly affect protein dynamics. Moreover, statistical physics analyses can help us uncover long-range correlations in proteins (PRL, 118: 088102, 2017), i.e., identify key mutations distal from function-related regions and reveal epistatic interactions between pairs of mutated sites. The predicted function-related mutations can be further validated by atomistic molecular simulation. We believe that such a framework will provide a promising approach to elucidating the genotype-phenotype mapping of proteins, advance our understanding of proteins’ functional or dysfunctional behaviours, and inform drug design and protein engineering.
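
As a rough illustration of how perturbation analysis of a slow mode might work, the sketch below uses a Gaussian network model (GNM), a coarse-grained elastic network built from residue coordinates. The project description does not name a specific model, so the GNM, the 7 Å contact cutoff, the toy helix-like coordinates, and the function names `kirchhoff` and `slow_mode_sensitivity` are all assumptions made for illustration. For each residue, the code computes the first-order change in the slowest nonzero eigenvalue when the springs attached to that residue are uniformly stiffened, which gives one simple proxy for how strongly a mutation at that site could affect slow dynamics.

```python
import numpy as np


def kirchhoff(coords, cutoff=7.0):
    """Build the GNM connectivity (Kirchhoff) matrix and the contact map.

    coords: (N, 3) array of residue positions in angstroms (assumed C-alpha).
    cutoff: contact distance in angstroms (an assumed, commonly used value).
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-contacts
    return np.diag(adj.sum(axis=1)) - adj, adj


def slow_mode_sensitivity(coords, cutoff=7.0):
    """Return the slowest nonzero eigenvalue and per-residue sensitivities.

    The sensitivity of residue i is the first-order derivative of the
    eigenvalue with respect to a uniform scaling of all springs attached
    to i: s_i = sum_j A_ij * (v_i - v_j)^2, with v the slow-mode vector.
    """
    gamma, adj = kirchhoff(coords, cutoff)
    evals, evecs = np.linalg.eigh(gamma)  # eigenvalues in ascending order
    # Index 0 is the trivial zero mode of a connected network.
    lam, v = evals[1], evecs[:, 1]
    diff2 = (v[:, None] - v[None, :]) ** 2
    sens = (adj * diff2).sum(axis=1)
    return lam, sens


# Toy input: an idealized helix-like chain of 30 residues (not real data).
t = np.arange(30)
toy_coords = np.column_stack(
    [2.3 * np.cos(1.745 * t), 2.3 * np.sin(1.745 * t), 1.5 * t]
)

lam, sens = slow_mode_sensitivity(toy_coords)
ranked = np.argsort(sens)[::-1]  # residues ordered by decreasing sensitivity
print("slowest nonzero eigenvalue:", lam)
print("five most sensitive residues:", ranked[:5])
```

In a real study the coordinates would come from predicted or experimental structures, and the sensitivities could be compared across members of a protein family. A uniform spring model is only a first approximation to the effect of an actual amino acid substitution.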
Status: Not started
Effective start/end date: 1/01/24 to 31/12/26

