MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang*, Jing Ma*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

Abstract

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, there is little work on investigating whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multimodal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts. MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges harvested from 10 code competition websites, presenting significant challenges due to the extreme demand for reasoning abilities. Our experiment results show that current state-of-the-art models struggle to solve these problems. The results highlight the lack of powerful vision-code models, and we hope MMCode can serve as an inspiration for future works in this domain. The data and code are publicly available.

Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics: EMNLP 2024
EditorsYaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
PublisherAssociation for Computational Linguistics (ACL)
Pages736-783
Number of pages48
ISBN (Electronic)9798891761681
DOIs
Publication statusPublished - Nov 2024
Event2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 202416 Nov 2024
https://aclanthology.org/volumes/2024.emnlp-main/ (Conference proceedings)

Publication series

NameConference on Empirical Methods in Natural Language Processing, Findings of EMNLP

Conference

Conference2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Country/TerritoryUnited States
CityMiami
Period12/11/2416/11/24
Internet address

Scopus Subject Areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems'. Together they form a unique fingerprint.

Cite this