
JailbreakLoRA: Your Downloaded LoRA from Sharing Platforms might be Unsafe

  • Fanjunduo Wei (Co-first author)
  • Zhenheng Tang (Co-first author)
  • Rongfei Zeng* (Co-first author)
  • Tongliang Liu
  • Chengqi Zhang
  • Xiaowen Chu
  • Bo Han

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Low-Rank Adaptation (LoRA), thanks to its plug-and-play nature, enables large language models (LLMs) to achieve significant performance gains at low cost, which has driven the development of LoRA-sharing platforms. However, the jailbreak and backdoor concerns associated with LoRA-sharing platforms remain underexplored. Existing LoRA-based attacks primarily focus on achieving high attack success rates while neglecting the core reason why users adopt LoRA, i.e., to gain downstream task capabilities. Achieving effective attacks while preserving strong multi-task performance remains challenging, because these largely unrelated objectives tend to interfere with each other during optimization. In this paper, we propose JailbreakLoRA, a multi-task jailbreak LoRA training method that balances task utility and attack capability. It resolves training interference by uncertainty-weighting the losses and mitigating gradient conflicts. Additionally, JailbreakLoRA is designed to generate an affirmative prefix upon trigger activation, exploiting inference-time hallucinations to enhance the effectiveness of the jailbreak. Experimental results demonstrate that our method outperforms state-of-the-art (SOTA) LoRA-based attacks, achieving a 16.0% improvement in attack success rate while also enhancing performance on multiple downstream tasks by 16.5% on average.
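The abstract names two ingredients, uncertainty-weighted losses and gradient-conflict mitigation, without giving their exact form. The following is only a minimal sketch of the generic ideas, assuming a homoscedastic-uncertainty weighting of the form exp(-s_i) * L_i + s_i and a PCGrad-style projection as stand-ins; it is not the paper's formulation. All function names and the toy numbers are hypothetical.

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    s_i is a per-task log-variance that would be learned alongside the
    model parameters (assumed form, not taken from the paper).
    """
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))

def dot(a, b):
    """Dot product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))

def project_conflicting(g_task, g_other):
    """PCGrad-style step (assumed stand-in for conflict mitigation).

    If the two gradients conflict (negative dot product), remove from
    g_task its component along g_other; otherwise return it unchanged.
    """
    d = dot(g_task, g_other)
    if d >= 0:
        return list(g_task)
    scale = d / dot(g_other, g_other)  # g_other is nonzero whenever d < 0
    return [x - scale * y for x, y in zip(g_task, g_other)]

# Toy illustration with made-up values: two task losses with neutral
# log-variances, and two gradients that point in conflicting directions.
total = uncertainty_weighted_loss([2.0, 4.0], [0.0, 0.0])
g_fixed = project_conflicting([1.0, 0.0], [-1.0, 1.0])
```

After projection, `g_fixed` is orthogonal to the second gradient, so a step along it no longer directly opposes the other objective. How the actual method combines the two mechanisms is described in the paper itself, not here.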
Original language: English
Title of host publication: The Fourteenth International Conference on Learning Representations, ICLR 2026
Publisher: International Conference on Learning Representations, ICLR
Pages: 1-27
Number of pages: 27
Publication status: Published - 26 Jan 2026
Event: 14th International Conference on Learning Representations, ICLR 2026 - Rio de Janeiro, Brazil
Duration: 23 Apr 2026 to 27 Apr 2026
https://iclr.cc/Conferences/2026 (Conference website)
https://openreview.net/group?id=ICLR.cc/2026 (Conference proceedings)
https://iclr.cc/virtual/2026/calendar (Conference schedule)

Publication series

Name: International Conference on Learning Representations
Publisher: International Conference on Learning Representations, ICLR

Conference

Conference: 14th International Conference on Learning Representations, ICLR 2026
Abbreviated title: ICLR 2026
Country/Territory: Brazil
City: Rio de Janeiro
Period: 23/04/26 to 27/04/26

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure

User-Defined Keywords

  • Jailbreak
  • LoRA
  • Large Language Models
