
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning

  • Yunhao Gou
  • Hansi Yang
  • Zhili Liu
  • Kai Chen
  • Yihan Zeng
  • Lanqing Hong
  • Zhenguo Li
  • Qun Liu
  • Bo Han
  • James Kwok
  • Yu Zhang*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Visual Instruction Tuning (VIT) aims to enhance Multimodal Large Language Models (MLLMs), yet its effectiveness is often compromised by corrupted datasets with issues such as hallucinated content, incorrect responses, and poor OCR quality. Previous approaches to address these challenges have focused on refining datasets through high-quality data collection or rule-based filtering that can be costly or limited in scope. In this paper, we conduct a systematic investigation into the impact of corrupted data on MLLMs and discover that, although corrupted data degrade model performance, such adverse effects are largely reversible, and MLLMs are corrupted but not broken. Specifically, we find that disabling a small subset of parameters can almost fully restore performance. Moreover, corrupted MLLMs inherently possess the capability to differentiate between clean and corrupted samples, facilitating dataset cleaning without external intervention. Building on these insights, we introduce a corruption-robust training paradigm that significantly surpasses existing strategies for mitigating the effects of corrupted data.
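The abstract's self-filtering insight — that a corrupted-tuned MLLM assigns systematically different scores to clean and corrupted samples — can be illustrated with a minimal sketch. The clustering routine below is a hypothetical illustration, not the paper's method: it assumes per-sample losses (which in practice would come from scoring each instruction-response pair with the tuned model; here they are simulated) and separates the two groups with a simple one-dimensional two-means split.

```python
# Hypothetical sketch: separating clean from corrupted training samples using
# the model's own per-sample loss. Assumption (not from the paper): corrupted
# samples tend to receive higher loss under the corrupted-tuned model.

def split_by_loss(losses, n_iter=10):
    """1-D two-means clustering on per-sample losses.

    Returns (clean_idx, corrupted_idx): indices of samples assigned to the
    low-loss ("clean") and high-loss ("corrupted") clusters.
    """
    c_clean, c_corrupt = min(losses), max(losses)  # initial centroids
    for _ in range(n_iter):
        clean = [x for x in losses if abs(x - c_clean) <= abs(x - c_corrupt)]
        corrupt = [x for x in losses if abs(x - c_clean) > abs(x - c_corrupt)]
        if clean:
            c_clean = sum(clean) / len(clean)
        if corrupt:
            c_corrupt = sum(corrupt) / len(corrupt)
    thresh = (c_clean + c_corrupt) / 2  # midpoint between the two centroids
    clean_idx = [i for i, x in enumerate(losses) if x <= thresh]
    corrupt_idx = [i for i, x in enumerate(losses) if x > thresh]
    return clean_idx, corrupt_idx

# Simulated per-sample losses: low values ~ clean, high values ~ corrupted.
losses = [0.9, 1.1, 1.0, 3.8, 4.2, 0.8, 4.0]
clean, corrupt = split_by_loss(losses)
# clean  -> [0, 1, 2, 5]
# corrupt -> [3, 4, 6]
```

In a real pipeline the losses would be computed with the MLLM itself (e.g., per-sample cross-entropy with reduction disabled), after which training would proceed on the retained clean subset.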
Original language: English
Title of host publication: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Place of publication: Kerrville
Publisher: Association for Computational Linguistics (ACL)
Pages: 25934-25960
Number of pages: 27
ISBN (Print): 9798891763326
Publication status: Published - 1 Nov 2025
Event: 30th Conference on Empirical Methods in Natural Language Processing - Suzhou, China
Duration: 4 Nov 2025 - 9 Nov 2025
https://aclanthology.org/volumes/2025.findings-emnlp/ (Conference Proceedings)
https://underline.io/events/502/reception (Conference website)

Conference

Conference: 30th Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2025
Country/Territory: China
City: Suzhou
Period: 4/11/25 - 9/11/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure
