Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models

  • Ziyang Luo
  • Kaixin Li
  • Hongzhan Lin
  • Yuchen Tian
  • Mohan Kankanhalli
  • Jing Ma*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

1 Citation (Scopus)

Abstract

Data synthesis has become a crucial research area in large language models (LLMs), especially for generating high-quality instruction fine-tuning data to enhance downstream performance. In code generation, a key application of LLMs, manual annotation of code instruction data is costly. Recent methods, such as Code Evol-Instruct and OSS-Instruct, leverage LLMs to synthesize large-scale code instruction data, significantly improving LLM coding capabilities. However, these approaches are limited by unidirectional synthesis and randomness-driven generation, which restrict data quality and diversity. To overcome these challenges, we introduce Tree-of-Evolution (ToE), a novel framework that models the code instruction synthesis process as a tree, exploring multiple evolutionary paths to alleviate the constraints of unidirectional generation. Additionally, we propose optimization-driven evolution, which refines each generation step based on the quality of the previous iteration. Experimental results across five widely used coding benchmarks (HumanEval, MBPP, EvalPlus, LiveCodeBench, and BigCodeBench) demonstrate that base models fine-tuned on just 75k samples synthesized by our method achieve performance comparable or superior to that of the state-of-the-art open-weight Code LLM, Qwen2.5-Coder-Instruct, which was fine-tuned on millions of samples.
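
The two ideas in the abstract, tree-structured exploration and evolution guided by the quality of the previous step, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Node`, `grow_tree`, `collect`, `toy_evolve`, and `toy_score` are hypothetical, and the `evolve` and `score` callables are stand-ins for an LLM call and a quality measure that the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One instruction in the evolution tree (hypothetical structure)."""
    instruction: str
    score: float
    depth: int = 0
    children: List["Node"] = field(default_factory=list)


def grow_tree(
    seed: str,
    evolve: Callable[[str, float, int], str],  # assumed: (parent text, parent score, branch index) -> child text
    score: Callable[[str], float],             # assumed: quality measure for an instruction
    branching: int = 2,
    max_depth: int = 2,
) -> Node:
    """Expand a seed instruction along several paths instead of a single chain."""
    root = Node(seed, score(seed))
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        if parent.depth >= max_depth:
            continue
        for branch in range(branching):
            # The parent's score is passed in, so each step can react to the
            # quality of the previous iteration.
            text = evolve(parent.instruction, parent.score, branch)
            child = Node(text, score(text), parent.depth + 1)
            parent.children.append(child)
            frontier.append(child)
    return root


def collect(root: Node) -> List[Node]:
    """Flatten the tree into a pre-order list of nodes."""
    nodes = [root]
    for child in root.children:
        nodes.extend(collect(child))
    return nodes


# Toy stand-ins so the sketch runs without an LLM (purely illustrative).
def toy_evolve(instruction: str, parent_score: float, branch: int) -> str:
    extra = 2 if parent_score < 5 else 1  # push harder when the parent scored low
    return f"{instruction} [path {branch}]" + " +constraint" * extra


def toy_score(instruction: str) -> float:
    return float(instruction.count("+constraint"))


if __name__ == "__main__":
    tree = grow_tree("Write a function that reverses a string.", toy_evolve, toy_score)
    print(len(collect(tree)))  # 7 nodes: 1 seed + 2 children + 4 grandchildren
```

A single-chain method would correspond to `branching=1`. With a larger branching factor, every node is a candidate training instruction, and a selection step over `collect(tree)` could keep only the highest-scoring ones.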
Original language: English
Title of host publication: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Place of Publication: Vienna
Publisher: Association for Computational Linguistics (ACL)
Pages: 297–316
Number of pages: 20
Volume: 1
ISBN (Electronic): 9798891762510
DOIs
Publication status: Published - Jul 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Austria Center Vienna, Vienna, Austria
Duration: 27 Jul 2025 – 1 Aug 2025
https://2025.aclweb.org/ (Conference Website)
https://docs.google.com/spreadsheets/d/1O-n3HPvv8vY0L_kjyP5AtRTcWWjqLk2deCYtrMgCGw4/edit?usp=drive_link (Conference Program)
https://aclanthology.org/events/acl-2025/ (Conference Proceedings)

Publication series

Name: Proceedings of Annual Meeting of the Association for Computational Linguistics
Publisher: Association for Computational Linguistics

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/Territory: Austria
City: Vienna
Period: 27/07/25 – 1/08/25
Internet address
