
Learning to Evolve: Scaling Open-Ended Discovery with Relative-Progress RL

  • Xuan Li
  • Zhanke Zhou
  • Zongze Li
  • Jiangchao Yao
  • Bo Han*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Evolution is a promising way for Large Language Models (LLMs) to tackle open-ended problems such as molecular optimization. Existing training-free evolution methods rely on context engineering, which cannot reliably yield the desired solutions. Reinforcement Learning with Verifiable Rewards (RLVR) is a learning-centric alternative, but it prioritizes final solutions over the multi-turn process of evolution and therefore fails to deliver stable improvement. To address this, we propose Learning to Evolve (LtE), which learns a policy for iterative refinement by turning per-turn evaluator scores into turn-wise and trajectory-wise credit assignments. LtE uses (i) a turn-level advantage based on the score improvement over the initial solution and (ii) a trajectory-level advantage that accumulates these improvements over the entire trajectory. The two signals are combined for credit assignment across turns and across trajectories, aligning learning with progress across evolution turns. In experiments on molecular optimization tasks, LtE produces higher-quality solutions than training-free and RLVR methods under the same budgets and enables test-time scale-up.
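The credit-assignment scheme described above can be sketched in a few lines. Note this is an illustrative reconstruction from the abstract alone: the function name, the use of a plain sum to accumulate trajectory-level progress, and the additive combination of the two signals are assumptions, not the paper's exact formulation (which may normalize advantages across a group of trajectories, as is common in RLVR-style training).

```python
def relative_progress_advantages(scores):
    """Illustrative sketch of LtE-style relative-progress credit assignment.

    scores: evaluator scores per turn; scores[0] is the initial solution.
    Returns (turn-level advantages, trajectory-level advantage, combined).
    """
    s0 = scores[0]
    # (i) turn-level advantage: score improvement of each refined
    #     solution over the initial solution
    turn_adv = [s - s0 for s in scores[1:]]
    # (ii) trajectory-level advantage: accumulated improvement over
    #      the entire evolution trajectory
    traj_adv = sum(turn_adv)
    # combined credit: each turn receives its own progress plus the
    # shared trajectory-level signal (a simple sum stands in for the
    # paper's actual combination rule)
    combined = [a + traj_adv for a in turn_adv]
    return turn_adv, traj_adv, combined
```

With integer scores `[1, 3, 2, 5]` over four turns, the turn-level advantages are `[2, 1, 4]`, the trajectory-level advantage is `7`, and every turn's combined credit is lifted by that trajectory-wide progress, so turns that improve on the initial solution are rewarded even when a later turn regresses.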
Original language: English
Title of host publication: ICLR 2026 Workshop on AI with Recursive Self-Improvement
Publisher: International Conference on Learning Representations, ICLR
Pages: 1-18
Number of pages: 18
Publication status: Published - 26 Apr 2026
Event: ICLR 2026 Workshop on AI with Recursive Self-Improvement - Rio de Janeiro, Brazil
Duration: 26 Apr 2026 → 26 Apr 2026
https://openreview.net/group?id=ICLR.cc/2026/Workshop/RSI

Publication series

Name: International Conference on Learning Representations Workshop

Workshop

Workshop: ICLR 2026 Workshop on AI with Recursive Self-Improvement
Country/Territory: Brazil
City: Rio de Janeiro
Period: 26/04/26 → 26/04/26

