TY - JOUR
T1 - The Statistical Trends of Protein Evolution
T2 - A Lesson from AlphaFold Database
AU - Tang, Qian Yuan
AU - Ren, Weitong
AU - Wang, Jun
AU - Kaneko, Kunihiko
N1 - Funding information:
We gratefully thank Taro Toyoizumi, Xiangze Zeng, Hisao Moriya, Haobo Wang, Haiguang Liu, Zhengqi He, Sida Chen, Weiyi Qiu, Wenfei Li, Zeke Xie, Yingnan Li, Xinhong Liu, Lei-Han Tang, and Xuefei Li for participating in stimulating discussions. This work was supported by Brain/MINDS from the Japan Agency for Medical Research and Development (grant number JP21dm0207001), the National Natural Science Foundation of China (grant number 11774157), a Grant-in-Aid for Scientific Research on Innovative Areas (grant number 17H06386) from the Ministry of Education, Culture, Sports, Science and Technology of Japan, a Grant-in-Aid for Scientific Research (A) (grant number 20H00123) from the Japan Society for the Promotion of Science, and Novo Nordisk Fonden (to K.K.).
Publisher Copyright:
© The Author(s) 2022.
PY - 2022/10
Y1 - 2022/10
N2 - The recent development of artificial intelligence provides us with new and powerful tools for studying the mysterious relationship between organism evolution and protein evolution. In this work, based on the AlphaFold Protein Structure Database (AlphaFold DB), we perform comparative analyses of the proteins of different organisms. The statistics of AlphaFold-predicted structures show that, for organisms with higher complexity, the constituent proteins statistically have larger radii of gyration, higher coil fractions, and slower vibrations. By conducting normal mode analysis and scaling analyses, we demonstrate that higher organismal complexity correlates with lower fractal dimensions in both the structure and dynamics of the constituent proteins, suggesting that higher functional specialization is associated with higher organismal complexity. We also uncover the topological and sequence bases of these correlations. As organismal complexity increases, the residue contact networks of the constituent proteins become more assortative, and these proteins have a higher degree of hydrophilic-hydrophobic segregation in their sequences. Furthermore, by comparing the statistical structural proximity across the proteomes with the phylogenetic tree of homologous proteins, we show that statistical structural proximity across the proteomes may indirectly reflect phylogenetic proximity, indicating a statistical trend of protein evolution in parallel with organism evolution. This study provides new insights into how the diversity in the functionality of proteins increases and how the dimensionality of the manifold of protein dynamics decreases during evolution, contributing to the understanding of the origin and evolution of life.
KW - evolution of plasticity and complexity
KW - normal mode analysis
KW - protein evolution
KW - protein structure and dynamics
KW - scaling analysis
UR - http://www.scopus.com/inward/record.url?scp=85139573314&partnerID=8YFLogxK
U2 - 10.1093/molbev/msac197
DO - 10.1093/molbev/msac197
M3 - Journal article
C2 - 36108094
AN - SCOPUS:85139573314
SN - 0737-4038
VL - 39
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 10
M1 - msac197
ER -