
AlphaApollo: A System for Deep Agentic Reasoning

  • Zhanke Zhou
  • Chentao Cao
  • Xiao Feng
  • Xuan Li
  • Zongze Li
  • Xiangyu Lu
  • Jiangchao Yao
  • Weikai Huang
  • Tian Cheng
  • Jianghangfan Zhang
  • Tangyu Jiang
  • Linrui Xu
  • Yiming Zheng
  • Brando Miranda
  • Tongliang Liu
  • Sanmi Koyejo
  • Masashi Sugiyama
  • Bo Han

Research output: Chapter in book/report/conference proceeding, Conference proceeding, peer-reviewed

Abstract

We present AlphaApollo, a self-evolving agentic reasoning system that targets two bottlenecks in foundation-model reasoning: (1) limited capacity for long-horizon, multi-step problem solving and (2) unreliable test-time refinement without trustworthy verification. AlphaApollo orchestrates models and tools via three components: (i) multi-turn agentic reasoning, which formalizes model-environment interaction with structured tool calls and responses; (ii) multi-turn agentic learning, which applies turn-level reinforcement learning to optimize tool-use decisions while decoupling actions from tool responses for stable training; and (iii) multi-round agentic evolution, which refines solutions through a propose-judge-update loop with tool-assisted verifications and long-horizon memory. Across seven math reasoning benchmarks and multiple model scales, AlphaApollo improves performance through reliable tool use (>85% tool-call success), substantial gains from multi-turn RL (Avg@32: Qwen2.5-1.5B-Instruct 1.07% → 9.64%, Qwen2.5-7B-Instruct 8.77% → 20.35%), and improvements from evolution (e.g., Qwen2.5-3B-Instruct 5.27% → 7.70%, Qwen2.5-14B-Instruct 16.53% → 21.08%). The code is available at https://github.com/tmlr-group/AlphaApollo.
Original language: English
Title of host publication: ICLR 2026 Workshop on AI with Recursive Self-Improvement
Publisher: International Conference on Learning Representations, ICLR
Pages: 1-44
Number of pages: 44
Publication status: Published - 26 Apr 2026
Event: ICLR 2026 Workshop on AI with Recursive Self-Improvement - Rio de Janeiro, Brazil
Duration: 26 Apr 2026 to 26 Apr 2026
https://openreview.net/group?id=ICLR.cc/2026/Workshop/RSI

Publication series

Name: International Conference on Learning Representations Workshop

Workshop

Workshop: ICLR 2026 Workshop on AI with Recursive Self-Improvement
Country/Territory: Brazil
City: Rio de Janeiro
Period: 26/04/26 to 26/04/26
Internet address
