A Two-Stage Multi-Modal LLM Fine-Tuning Framework for Analyzing Building Surface Defects

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Building surface defect detection plays a crucial role in structural health monitoring, helping to ensure the safety and aesthetics of buildings. Recently, Visual Question Answering (VQA) has shown promise in architecture, especially for inspection automation and employee training. However, Large Multi-modal Models (LMMs) receive insufficient pre-training on architectural knowledge and achieve limited defect detection accuracy, which results in poor performance in multi-modal building surface defect analysis. This paper therefore proposes a two-stage fine-tuning framework to improve LMMs' performance on this task. Experimental results show that our framework significantly enhances VQA performance in building surface defect analysis. Furthermore, our framework improves defect detection accuracy compared with conventional fine-tuning approaches, which leads to more accurate and reliable multi-modal analysis responses from the LMMs.
Original language: English
Title of host publication: 2025 33rd European Signal Processing Conference (EUSIPCO)
Publisher: IEEE
Pages: 706-710
Number of pages: 5
ISBN (Electronic): 9789464593624
ISBN (Print): 9798350391831
Publication status: Published - Sept 2025
Event: 33rd European Signal Processing Conference, EUSIPCO 2025 - Palermo, Italy
Duration: 8 Sept 2025 to 12 Sept 2025
https://ieeexplore.ieee.org/xpl/conhome/11225917/proceeding (Conference proceedings)

Publication series

Name: European Signal Processing Conference (EUSIPCO)

Conference

Conference: 33rd European Signal Processing Conference, EUSIPCO 2025
Country/Territory: Italy
City: Palermo
Period: 8/09/25 to 12/09/25

User-Defined Keywords

  • Computer Vision
  • Large Multi-Modal Model
  • Fine-Tuning
  • Prompt Engineering
  • Defect Detection

