WizardCoder: Empowering Code Large Language Models With Evol-Instruct

Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma*, Qingwei Lin, Daxin Jiang*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

6 Citations (Scopus)

Abstract

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated remarkable performance on various code-related tasks. However, unlike their counterparts in the general language modeling field, the technique of instruction fine-tuning remains relatively under-explored in this domain. In this paper, we present Code Evol-Instruct, a novel approach that adapts the Evol-Instruct method to the realm of code, enhancing Code LLMs to create the novel WizardCoder models. Through comprehensive experiments on five prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, DS-1000, and MultiPL-E, our models showcase outstanding performance, consistently outperforming all other open-source Code LLMs by a significant margin. Remarkably, WizardCoder 15B even surpasses well-known closed-source LLMs, including Anthropic's Claude and Google's Bard, on the HumanEval and HumanEval+ benchmarks. Additionally, WizardCoder 34B not only achieves a HumanEval score comparable to GPT-3.5 (ChatGPT) but also surpasses it on the HumanEval+ benchmark. Furthermore, our preliminary exploration highlights the pivotal role of instruction complexity in achieving exceptional coding performance.
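The core idea of Code Evol-Instruct is to iteratively rewrite seed coding instructions into harder variants using a pool of evolution heuristics, then fine-tune the base model on the evolved data. The sketch below is an illustrative reconstruction of the prompt-construction step only: the heuristic wordings, the `evolve_instruction` helper, and the prompt template are paraphrased assumptions, not the authors' exact prompts, and the real pipeline would send each prompt to an LLM and feed the rewritten instruction into the next round.

```python
import random

# Illustrative difficulty-raising heuristics in the spirit of Code Evol-Instruct.
# These are paraphrased assumptions; the paper's exact prompt wording may differ.
EVOL_HEURISTICS = [
    "Add new constraints and requirements to the original problem.",
    "Replace a commonly used requirement with a less common, more specific one.",
    "If the problem is solvable in only a few logical steps, add more reasoning steps.",
    "Provide a piece of erroneous code as a reference to increase misdirection.",
    "Propose a version with higher time or space complexity requirements.",
]


def evolve_instruction(seed: str, rounds: int = 1, rng=None) -> list:
    """Build one evolution prompt per round from a seed coding instruction.

    In the actual pipeline, each prompt would be answered by an LLM and the
    model's rewritten instruction would seed the next round; here we only
    construct the prompts, so every round starts from the same seed.
    """
    rng = rng or random.Random(0)
    prompts = []
    for _ in range(rounds):
        heuristic = rng.choice(EVOL_HEURISTICS)
        prompts.append(
            "Please increase the difficulty of the given programming question.\n"
            f"Method: {heuristic}\n"
            f"#Given Question#:\n{seed}\n"
            "#Rewritten Question#:"
        )
    return prompts


prompts = evolve_instruction("Write a function that reverses a string.", rounds=2)
```

Each evolved instruction (paired with a model-generated solution) then becomes a fine-tuning example; the abstract's finding that instruction complexity drives coding performance is what motivates running several evolution rounds rather than one.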

Original language: English
Title of host publication: Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024
Publisher: International Conference on Learning Representations
Pages: 1-21
Number of pages: 21
Publication status: Published - May 2024
Event: 12th International Conference on Learning Representations, ICLR 2024 - Messe Wien Exhibition and Congress Center, Vienna, Austria
Duration: 7 May 2024 – 11 May 2024
https://iclr.cc/Conferences/2024 (Conference website)
https://iclr.cc/virtual/2024/calendar (Conference schedule)
https://openreview.net/group?id=ICLR.cc/2024/Conference#tab-accept-oral (Conference proceedings)

Publication series

Name: Proceedings of the International Conference on Learning Representations, ICLR

Conference

Conference: 12th International Conference on Learning Representations, ICLR 2024
Country/Territory: Austria
City: Vienna
Period: 7/05/24 – 11/05/24
Internet address: https://iclr.cc/Conferences/2024

Scopus Subject Areas

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language

