Abstract
We present AlphaApollo, a self-evolving agentic reasoning system that targets two bottlenecks in foundation-model reasoning: (1) limited capacity for long-horizon, multi-step problem solving and (2) unreliable test-time refinement in the absence of trustworthy verification. AlphaApollo orchestrates models and tools through three components: (i) multi-turn agentic reasoning, which formalizes model-environment interaction as structured tool calls and responses; (ii) multi-turn agentic learning, which applies turn-level reinforcement learning to optimize tool-use decisions while decoupling actions from tool responses for stable training; and (iii) multi-round agentic evolution, which refines solutions through a propose-judge-update loop with tool-assisted verification and long-horizon memory. Across seven math reasoning benchmarks and multiple model scales, AlphaApollo achieves reliable tool use (>85% tool-call success), substantial gains from multi-turn RL (Avg@32: Qwen2.5-1.5B-Instruct 1.07% → 9.64%, Qwen2.5-7B-Instruct 8.77% → 20.35%), and further improvements from evolution (e.g., Qwen2.5-3B-Instruct 5.27% → 7.70%, Qwen2.5-14B-Instruct 16.53% → 21.08%). The code is available at https://github.com/tmlr-group/AlphaApollo.
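The abstract reports results as Avg@32. A common reading of Avg@k, assumed here rather than taken from the paper, is: sample k responses per problem, compute the fraction judged correct, then average that fraction over all problems. The function name `avg_at_k` is hypothetical.

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Mean per-problem accuracy over k samples, averaged across problems.

    correct[i][j] is True if the j-th sampled response to problem i was
    judged correct. Returns a percentage in [0, 100].
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # Fraction of this problem's k samples that were correct.
        per_problem.append(sum(samples) / len(samples))
    return 100.0 * sum(per_problem) / len(per_problem)


# Two problems with k = 4 samples each: accuracies 0.75 and 0.25.
print(avg_at_k([[True, True, True, False], [False, False, True, False]]))  # 50.0
```

Unlike pass@k, which credits a problem if any sample is correct, this averaged form rewards consistency across samples, which is why small absolute values such as 1.07% are possible.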
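One standard way to "decouple actions from tool responses" during RL on multi-turn trajectories is loss masking: tokens the policy generated count toward the policy-gradient loss, while tokens returned by tools are excluded. The sketch below assumes that technique, a hypothetical `Segment` schema, and a plain REINFORCE-style objective; it is an illustration, not AlphaApollo's training code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical schema: "model" = tokens generated by the policy,
    # "tool" = tokens returned by the environment/tool.
    role: str
    token_ids: List[int]


def build_loss_mask(segments: List[Segment]) -> List[int]:
    """Return 1 for policy-generated tokens and 0 for tool-response tokens."""
    mask: List[int] = []
    for seg in segments:
        flag = 1 if seg.role == "model" else 0
        mask.extend([flag] * len(seg.token_ids))
    return mask


def masked_pg_loss(logprobs: List[float], advantages: List[float], mask: List[int]) -> float:
    """REINFORCE-style loss averaged over action (mask == 1) tokens only."""
    if not (len(logprobs) == len(advantages) == len(mask)):
        raise ValueError("length mismatch")
    n_action = sum(mask)
    if n_action == 0:
        return 0.0
    # Tool tokens are multiplied by 0, so they contribute no gradient signal.
    total = sum(-a * lp * m for lp, a, m in zip(logprobs, advantages, mask))
    return total / n_action


segments = [Segment("model", [11, 12]), Segment("tool", [90, 91, 92]), Segment("model", [13])]
mask = build_loss_mask(segments)
print(mask)  # [1, 1, 0, 0, 0, 1]
loss = masked_pg_loss([-0.5, -1.0, -9.0, -9.0, -9.0, -0.5], [1.0] * 6, mask)
```

Masking keeps the loss from pushing the policy to "predict" tool outputs it never chose, which is one plausible source of the training stability the abstract mentions.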
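The propose-judge-update loop can be pictured as follows: a proposer drafts a candidate conditioned on accumulated memory, a judge checks it (here with a programmatic check standing in for tool-assisted verification), and the feedback is appended to memory for the next round. All names (`evolve`, `propose`, `judge`) and the toy problem are hypothetical; this is a minimal sketch of the loop structure only.

```python
from typing import Callable, List, Optional, Tuple

# propose(problem, memory) -> candidate; judge(problem, candidate) -> (accepted, feedback)
Propose = Callable[[str, List[str]], str]
Judge = Callable[[str, str], Tuple[bool, str]]


def evolve(problem: str, propose: Propose, judge: Judge,
           max_rounds: int = 4) -> Tuple[Optional[str], bool, List[str]]:
    """Run propose-judge-update rounds until the judge accepts or rounds run out."""
    memory: List[str] = []
    latest: Optional[str] = None
    for round_idx in range(max_rounds):
        latest = propose(problem, memory)          # propose
        accepted, feedback = judge(problem, latest)  # judge (verification)
        memory.append(f"round {round_idx}: {latest!r} -> {feedback}")  # update memory
        if accepted:
            return latest, True, memory
    return latest, False, memory


# Toy stand-ins: the proposer tries successive integers; the judge verifies by computation.
def toy_propose(problem: str, memory: List[str]) -> str:
    return str(5 + len(memory))


def toy_judge(problem: str, candidate: str) -> Tuple[bool, str]:
    x = int(candidate)
    return x * x == 49, f"{x}^2 = {x * x}"


answer, ok, memory = evolve("find x > 0 with x^2 = 49", toy_propose, toy_judge)
print(answer, ok)  # 7 True
```

In a real system the proposer would be a model call and the judge might execute code or query a solver; the point of the sketch is that memory carries verified feedback across rounds rather than discarding it.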
| Original language | English |
|---|---|
| Title of host publication | ICLR 2026 Workshop on AI with Recursive Self-Improvement |
| Publisher | International Conference on Learning Representations, ICLR |
| Pages | 1-44 |
| Number of pages | 44 |
| Publication status | Published - 26 Apr 2026 |
| Event | ICLR 2026 Workshop on AI with Recursive Self-Improvement, Rio de Janeiro, Brazil. Duration: 26 Apr 2026 → 26 Apr 2026. https://openreview.net/group?id=ICLR.cc/2026/Workshop/RSI |
Publication series
| Name | International Conference on Learning Representations Workshop |
|---|---|
Workshop
| Workshop | ICLR 2026 Workshop on AI with Recursive Self-Improvement |
|---|---|
| Country/Territory | Brazil |
| City | Rio de Janeiro |
| Period | 26/04/26 → 26/04/26 |
| Internet address | https://openreview.net/group?id=ICLR.cc/2026/Workshop/RSI |
Fingerprint
Dive into the research topics of 'AlphaApollo: A System for Deep Agentic Reasoning'. Together they form a unique fingerprint.