Abstract
Despite efforts to improve the quality of service (QoS) and throughput in Large Language Model (LLM) serving systems, progress is often limited by the lack of publicly available real-world workloads. Consequently, evaluations usually depend on synthetic or oversimplified load patterns, and systems that appear promising in testing frequently underperform once deployed.
This work presents BurstGPT, an LLM serving workload with 10.31 million traces from regional Azure OpenAI GPT services over 213 days. BurstGPT captures LLM serving characteristics from user, model and system perspectives: (1) User request concurrency: burstiness variations of requests in Azure OpenAI GPT services, revealing diversified concurrency patterns in different services and model types. (2) User conversation patterns: counts and intervals within conversations for service optimizations. (3) Model response lengths: auto-regressive serving processes of GPT models, showing statistical relations between requests and their responses. (4) System response failures: failures of conversation and API services, showing intensive resource needs and limited availability of LLM services in Azure. The details of the characteristics can serve multiple purposes in LLM serving optimizations, such as system evaluation and trace provisioning. In our demo evaluation with BurstGPT, frequent variations in BurstGPT reveal declines in efficiency, stability, or reliability in realistic LLM serving. We identify that the generalization of KV cache management, scheduling and disaggregation optimizations can be improved under realistic workload evaluations. BurstGPT is publicly available now at https://github.com/HPMLL/BurstGPT and is widely used to develop prototypes of LLM serving frameworks in the industry.
| Original language | English |
|---|---|
| Title of host publication | KDD '25 |
| Subtitle of host publication | Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining |
| Place of Publication | New York |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 5831-5841 |
| Number of pages | 11 |
| Volume | 2 |
| ISBN (Electronic) | 9798400714542 |
| Publication status | Published - 3 Aug 2025 |
| Event | 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025; Toronto Convention Centre, Toronto, Canada; Duration: 3 Aug 2025 → 7 Aug 2025; Conference proceeding: https://dl.acm.org/doi/proceedings/10.1145/3690624; Conference website: https://kdd2025.kdd.org/; Conference schedule: https://kdd2025.kdd.org/schedule-at-a-glance/ |
Publication series
| Name | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
|---|---|
| ISSN (Print) | 2154-817X |
Conference
| Conference | 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 |
|---|---|
| Abbreviated title | KDD 2025 |
| Country/Territory | Canada |
| City | Toronto |
| Period | 3/08/25 → 7/08/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 9 Industry, Innovation, and Infrastructure
User-Defined Keywords
- LLM Serving
- Workload Trace
- Workload Management
- System Scheduling
Fingerprint
Dive into the research topics of 'BurstGPT: A Real-World Workload Dataset to Optimize LLM Serving Systems'. Together they form a unique fingerprint.