
Safety in Large Reasoning Models: A Survey

  • Cheng Wang
  • Yue Liu
  • Baolong Bi
  • Duzhen Zhang
  • Zhong Zhi Li
  • Yingwei Ma
  • Yufei He
  • Shengju Yu
  • Xinfeng Li
  • Junfeng Fang*
  • Jiaheng Zhang
  • Bryan Hooi

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding; Conference proceeding; peer-reviewed

1 Citation (Scopus)

Abstract

Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks like mathematics and coding, leveraging their advanced reasoning capabilities. Nevertheless, as these capabilities progress, significant concerns regarding their vulnerabilities and safety have arisen, which can pose challenges to their deployment and application in real-world settings. This paper presents the first comprehensive survey of LRMs, meticulously exploring and summarizing the newly emerged safety risks, attacks, and defense strategies specific to these powerful reasoning-enhanced models. By organizing these elements into a detailed taxonomy, this work aims to offer a clear and structured understanding of the current safety landscape of LRMs, facilitating future research and development to enhance the security and reliability of these powerful models.

Original language: English
Title of host publication: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Publisher: Association for Computational Linguistics (ACL)
Pages: 3468-3482
Number of pages: 15
ISBN (Electronic): 9798891763357
Publication status: Published - Nov 2025
Event: 30th Conference on Empirical Methods in Natural Language Processing - Suzhou, China
Duration: 4 Nov 2025 to 9 Nov 2025
https://aclanthology.org/volumes/2025.findings-emnlp/ (Conference Proceedings)
https://underline.io/events/502/reception (Conference website)

Publication series

Name: EMNLP - Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP

Conference

Conference: 30th Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2025
Country/Territory: China
City: Suzhou
Period: 4/11/25 to 9/11/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure
