CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification

Yuchen Tian, Weixiang Yan*, Qian Yang, Xuandong Zhao, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma*, Dawn Song*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

Abstract

Large Language Models (LLMs) have made significant progress in code generation, offering developers groundbreaking automated programming support. However, LLMs often generate code that is syntactically correct and even semantically plausible, but may not execute as expected or fulfill specified requirements. This phenomenon of hallucinations in the code domain has not been systematically explored. To advance the community’s understanding and research on this issue, we introduce the concept of code hallucinations and propose a classification method for code hallucination based on execution verification. We categorize code hallucinations into four main types: mapping, naming, resource, and logic hallucinations, with each category further divided into different subcategories to understand and address the unique challenges faced by LLMs in code generation with finer granularity. Additionally, we present a dynamic detection algorithm called CodeHalu designed to detect and quantify code hallucinations. We also introduce the CodeHaluEval benchmark, which includes 8,883 samples from 699 tasks, to systematically and quantitatively evaluate code hallucinations. By evaluating 17 popular LLMs using this benchmark, we reveal significant differences in their accuracy and reliability in code generation, offering detailed insights for further improving the code generation capabilities of LLMs. The CodeHalu benchmark and code are publicly available at https://github.com/yuchen814/CodeHalu.

Original languageEnglish
Title of host publicationProceedings of the 39th AAAI Conference on Artificial Intelligence, AAAI 2025
PublisherAAAI press
Pages25300-25308
Number of pages9
ISBN (Print)157735897X, 9781577358978
DOIs
Publication statusPublished - 11 Apr 2025
Event39th AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 20254 Mar 2025
https://ojs.aaai.org/index.php/AAAI/issue/archive (Conference Proceedings)

Publication series

NameProceedings of the AAAI Conference on Artificial Intelligence
PublisherAssociation for the Advancement of Artificial Intelligence
Number24
Volume39
ISSN (Print)2159-5399
ISSN (Electronic)2374-3468

Conference

Conference39th AAAI Conference on Artificial Intelligence, AAAI 2025
Country/TerritoryUnited States
CityPhiladelphia
Period25/02/254/03/25
Internet address

Fingerprint

Dive into the research topics of 'CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification'. Together they form a unique fingerprint.

Cite this