TY - JOUR
T1 - Can Large Language Models Be Query Optimizer for Relational Databases?
AU - Tan, Jie
AU - Zhao, Kangfei
AU - Li, Rui
AU - Yu, Jeffrey Xu
AU - Piao, Chengzhi
AU - Cheng, Hong
AU - Meng, Helen
AU - Zhao, Deli
AU - Rong, Yu
N1 - Funding information:
This work is supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (No. CUHK 14217622, 14206625), in part by the Centre for Perceptual and Interactive Intelligence, a CUHK-led InnoCentre under the InnoHK initiative of the Innovation and Technology Commission of the Hong Kong Special Administrative Region Government, and by Damo Academy (Hupan Laboratory) through Damo Academy (Hupan Laboratory) Innovative Research Program. Chengzhi Piao is supported by grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (No. 12202324).
Publisher copyright:
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/12/5
Y1 - 2025/12/5
N2 - Query optimization is a complex planning and decision-making problem within the exponentially growing plan space in database management systems (DBMS). Traditional optimization techniques have been extensively studied over decades, leaving limited room for further improvement along this track. Recent developments of Large Language Models (LLMs) have demonstrated their potential in solving complex planning and decision-making problems, such as arithmetic and programmatic tasks. In this paper, we try to explore the potential of LLMs in handling query optimization and propose a tentative LLM-based query optimizer dubbed LLM-QO, established on PostgreSQL's execution engine. In LLM-QO, we formulate query optimization in an autoregressive fashion which directly generates the execution plan without explicit plan enumeration. To investigate the essential input of LLM-QO, we design a customized data recipe named QInstruct to collect the training data from various optimizers and serialize the database's meta data, queries and corresponding plans into a textual format. Based on QInstruct, we implement a two-stage fine-tuning pipeline, Query Instruction Tuning (QIT) and Query Direct Preference Optimization (QDPO), to empower the capability of general-purpose LLMs in handling query optimization. In our experiments, LLM-QO can generate valid and high-quality plans and consistently outperforms both traditional and learned optimizers on three query workloads. Our findings verify that LLMs can be derived as query optimizers where generalization, efficiency and adaptivity deserve further research efforts.
AB - Query optimization is a complex planning and decision-making problem within the exponentially growing plan space in database management systems (DBMS). Traditional optimization techniques have been extensively studied over decades, leaving limited room for further improvement along this track. Recent developments of Large Language Models (LLMs) have demonstrated their potential in solving complex planning and decision-making problems, such as arithmetic and programmatic tasks. In this paper, we try to explore the potential of LLMs in handling query optimization and propose a tentative LLM-based query optimizer dubbed LLM-QO, established on PostgreSQL's execution engine. In LLM-QO, we formulate query optimization in an autoregressive fashion which directly generates the execution plan without explicit plan enumeration. To investigate the essential input of LLM-QO, we design a customized data recipe named QInstruct to collect the training data from various optimizers and serialize the database's meta data, queries and corresponding plans into a textual format. Based on QInstruct, we implement a two-stage fine-tuning pipeline, Query Instruction Tuning (QIT) and Query Direct Preference Optimization (QDPO), to empower the capability of general-purpose LLMs in handling query optimization. In our experiments, LLM-QO can generate valid and high-quality plans and consistently outperforms both traditional and learned optimizers on three query workloads. Our findings verify that LLMs can be derived as query optimizers where generalization, efficiency and adaptivity deserve further research efforts.
KW - Learned Query Optimization
KW - Machine Learning for Systems
U2 - 10.1145/3769771
DO - 10.1145/3769771
M3 - Journal article
SN - 2836-6573
VL - 3
JO - Proceedings of the ACM on Management of Data
JF - Proceedings of the ACM on Management of Data
IS - 6
M1 - 306
ER -