BurstGPT: A Real-World Workload Dataset to Optimize LLM Serving Systems

  • Yuxin Wang
  • Yuhan Chen
  • Zeyu Li
  • Xueze Kang
  • Yuchu Fang
  • Yeju Zhou
  • Yang Zheng
  • Zhenheng Tang
  • Xin He
  • Rui Guo
  • Xin Wang
  • Qiang Wang
  • Amelie Chi Zhou
  • Xiaowen Chu*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding > Conference proceeding > peer-review

4 Citations (Scopus)

Abstract

Despite efforts to improve the quality of service (QoS) and throughput in Large Language Model (LLM) serving systems, progress is often limited by the lack of publicly available real-world workloads. Consequently, evaluations usually depend on synthetic or oversimplified load patterns, and systems that appear promising in testing frequently underperform once deployed.

This work presents BurstGPT, an LLM serving workload with 10.31 million traces from regional Azure OpenAI GPT services over 213 days. BurstGPT captures LLM serving characteristics from user, model and system perspectives: (1) User request concurrency: burstiness variations of requests in Azure OpenAI GPT services, revealing diversified concurrency patterns in different services and model types. (2) User conversation patterns: counts and intervals within conversations for service optimizations. (3) Model response lengths: auto-regressive serving processes of GPT models, showing statistical relations between requests and their responses. (4) System response failures: failures of conversation and API services, showing intensive resource needs and limited availability of LLM services in Azure. The details of the characteristics can serve multiple purposes in LLM serving optimizations, such as system evaluation and trace provisioning. In our demo evaluation with BurstGPT, frequent variations in BurstGPT reveal declines in efficiency, stability, or reliability in realistic LLM serving. We identify that the generalization of KV cache management, scheduling and disaggregation optimizations can be improved under realistic workload evaluations. BurstGPT is publicly available now at https://github.com/HPMLL/BurstGPT and is widely used to develop prototypes of LLM serving frameworks in the industry.
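To illustrate the request-concurrency characteristic described above, one common way to summarise the burstiness of an arrival trace is the coefficient of variation (CV) of its inter-arrival times: a Poisson process has a CV of about 1, and larger values indicate burstier traffic. The following is a minimal sketch under stated assumptions, not the authors' analysis method. The function names and the sample timestamps are hypothetical and are not taken from the BurstGPT dataset.

```python
from statistics import mean, pstdev


def inter_arrival_times(timestamps):
    """Return the gaps (in seconds) between consecutive request arrivals."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def coefficient_of_variation(gaps):
    """Return std / mean of the gaps; values above 1 suggest bursty arrivals."""
    average = mean(gaps)
    if average == 0:
        raise ValueError("mean inter-arrival time is zero")
    return pstdev(gaps) / average


def requests_per_window(timestamps, window_seconds):
    """Count requests in fixed-size windows, starting at the earliest arrival."""
    ordered = sorted(timestamps)
    start = ordered[0]
    counts = {}
    for t in ordered:
        index = int((t - start) // window_seconds)
        counts[index] = counts.get(index, 0) + 1
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


# Hypothetical arrival times in seconds (illustrative only, not BurstGPT data).
steady = [0, 10, 20, 30, 40, 50]
bursty = [0, 1, 2, 3, 50, 51]

steady_cv = coefficient_of_variation(inter_arrival_times(steady))
bursty_cv = coefficient_of_variation(inter_arrival_times(bursty))
bursty_counts = requests_per_window(bursty, window_seconds=10)

print(f"steady CV: {steady_cv:.2f}")
print(f"bursty CV: {bursty_cv:.2f}")
print(f"bursty requests per 10 s window: {bursty_counts}")
```

Both sample traces contain the same number of requests over a similar time span, so the average request rate alone would not separate them. The CV and the per-window counts do, which is why a serving system tuned against a steady synthetic load can behave differently under a bursty real trace.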

Original language: English
Title of host publication: KDD '25
Subtitle of host publication: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 5831-5841
Number of pages: 11
Volume: 2
ISBN (Electronic): 9798400714542
DOIs
Publication status: Published - 3 Aug 2025
Event: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto Convention Centre, Toronto, Canada
Duration: 3 Aug 2025 to 7 Aug 2025
https://dl.acm.org/doi/proceedings/10.1145/3690624 (Conference proceeding)
https://kdd2025.kdd.org/ (Conference website)
https://kdd2025.kdd.org/schedule-at-a-glance/ (Conference schedule)

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ISSN (Print): 2154-817X

Conference

Conference: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Abbreviated title: KDD 2025
Country/Territory: Canada
City: Toronto
Period: 3/08/25 to 7/08/25
Internet address

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure

User-Defined Keywords

  • LLM Serving
  • Workload Trace
  • Workload Management
  • System Scheduling
