TY - JOUR
T1 - Evolutionary tinkering enriches the hierarchical and nested structures in amino acid sequences
AU - Zhang, Zecheng
AU - Liu, Chunxiuzi
AU - Zhu, Yingjun
AU - Peng, Lu
AU - Qiu, Weiyi
AU - Tang, Qianyuan
AU - Liu, He
AU - Zhang, Ke
AU - Di, Zengru
AU - Liu, Yu
N1 - This work is supported by the National Natural Science Foundation of China (Grants No. 12205012, No. 12305052, and No. 32000720), the Early Career Scheme (ECS) from Research Grants Council (RGC) of Hong Kong (Grant No. 22302723), Beijing Normal University via the Youth Talent Strategic Program (Grant No. 28705-310432106), and Guangdong Scientific Research Projects for the Higher-Educational Institution (Grant No. 2021ZDZX1011). We thank the Self-Generation Reading Group and the Life-Complexity Reading Group for their support, both organized by the Swarma Club. We also thank Philip Gerlee, Fan Jin, Yoshitsugu Oono, and Da Zhou for their fruitful discussions and insightful comments.
Publisher Copyright:
© 2024 authors. Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.
PY - 2024/5/28
Y1 - 2024/5/28
AB - Genetic information often exhibits hierarchical and nested relationships, achieved through the reuse of repetitive subsequences such as duplicons and transposable elements, a concept termed "evolutionary tinkering" by François Jacob. Current bioinformatics tools often struggle to capture these relationships, particularly the nested ones. To address this, we used ladderpath, an approach within the broader category of algorithmic information theory, which provides two key measures: the order rate η, which characterizes repetitions and regularities in sequence patterns, and the ladderpath-complexity κ, which assesses hierarchical and nested richness. Our analysis of amino acid sequences revealed that humans have more sequences with higher κ values, and that proteins with many intrinsically disordered regions exhibit increased η values. We also found that extremely long sequences with low η are rare. We hypothesize that this arises from varied duplication and mutation frequencies across different evolutionary stages, which in turn suggests a zigzag pattern for the evolution of protein complexity. This hypothesis is supported by simulations and by studies of protein families such as ubiquitin and NBPF, implying species-specific or environment-influenced protein elongation strategies. The ladderpath approach offers a quantitative lens for understanding evolutionary tinkering and reuse, shedding light on the generative aspects of biological structures.
UR - http://www.scopus.com/inward/record.url?scp=85190349594&partnerID=8YFLogxK
U2 - 10.1103/PhysRevResearch.6.023215
DO - 10.1103/PhysRevResearch.6.023215
M3 - Journal article
AN - SCOPUS:85190349594
SN - 2643-1564
VL - 6
JO - Physical Review Research
JF - Physical Review Research
IS - 2
M1 - 023215
ER -