Abstract
Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated remarkable performance in various code-related tasks. However, unlike in general language modeling, instruction fine-tuning remains relatively under-researched in this domain. In this paper, we present Code Evol-Instruct, a novel approach that adapts the Evol-Instruct method to the realm of code and uses it to enhance Code LLMs, producing a new family of models called WizardCoder. Through comprehensive experiments on five prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, DS-1000, and MultiPL-E, our models showcase outstanding performance, consistently outperforming all other open-source Code LLMs by a significant margin. Remarkably, WizardCoder 15B even surpasses well-known closed-source LLMs, including Anthropic's Claude and Google's Bard, on the HumanEval and HumanEval+ benchmarks. Additionally, WizardCoder 34B not only achieves a HumanEval score comparable to GPT-3.5 (ChatGPT) but also surpasses it on the HumanEval+ benchmark. Furthermore, our preliminary exploration highlights the pivotal role of instruction complexity in achieving exceptional coding performance.
Original language | English |
---|---|
Title of host publication | Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024 |
Publisher | International Conference on Learning Representations |
Pages | 1-21 |
Number of pages | 21 |
Publication status | Published - May 2024 |
Event | 12th International Conference on Learning Representations, ICLR 2024 - Messe Wien Exhibition and Congress Center, Vienna, Austria. Duration: 7 May 2024 → 11 May 2024. https://iclr.cc/Conferences/2024 (Conference website), https://iclr.cc/virtual/2024/calendar (Conference schedule), https://openreview.net/group?id=ICLR.cc/2024/Conference#tab-accept-oral (Conference proceedings) |
Publication series
Name | Proceedings of the International Conference on Learning Representations, ICLR |
---|---|
Conference
Conference | 12th International Conference on Learning Representations, ICLR 2024 |
---|---|
Country/Territory | Austria |
City | Vienna |
Period | 7/05/24 → 11/05/24 |
Scopus Subject Areas
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language