Adaptive key partitioning in distributed stream processing

Gang Liu, Zeting Wang, Amelie Chi Zhou*, Rui Mao

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

1 Citation (Scopus)

Abstract

In stream processing systems, key grouping is a commonly used partitioning scheme for distributing input tuples among the parallel instances of stateful operators. With key grouping, tuples that share the same key are assigned to the specific instance responsible for that key. Key grouping is typically implemented with a hash function. While hashing is convenient and deterministic, it is also known to cause load imbalance between parallel instances, especially in the presence of skewed data streams. Key-Splitting is an effective technique that distributes the tasks associated with a key across downstream operator instances, facilitating load balancing at a relatively low cost. However, increasing the number of parallel instances too far can lead to excessive aggregation costs, which become a system bottleneck. In this paper, we show the high aggregation cost incurred by the Key-Splitting partitioner at different levels of key separation. To address this challenge, we introduce an adaptive Key-Splitting method that controls the degree of key separation. We propose a partitioner named FlexD, which aims to dynamically adapt the key separation limits for streaming data. The partitioner employs key grouping to distribute rare keys and dynamic expansion of processing instances to distribute hot keys. We implemented our method on Apache Storm and evaluated it using real-world and synthetic datasets. Experimental results show that our method achieves a good balance between load balancing and aggregation cost. Moreover, it outperforms existing methods, achieving higher throughput.
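The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the FlexD algorithm: the function names, the fixed `hot_threshold`, and the fixed split degree `d` are assumptions made for illustration, whereas the paper adapts the degree of key separation dynamically. Rare keys follow plain hash-based key grouping; once a key's count exceeds the threshold, its tuples go to the least-loaded of `d` candidate instances.

```python
import zlib
from collections import Counter


def key_grouping(key, n):
    """Plain key grouping: a deterministic hash maps each key to one instance."""
    return zlib.crc32(key.encode()) % n


def split_route(stream, n, hot_threshold, d):
    """Route tuples to n instances (illustrative only, not FlexD itself).

    Keys seen at most hot_threshold times use key grouping. Hotter keys are
    split across up to d candidate instances, picking the least loaded one.
    Returns the per-instance load and the set of instances used by each key.
    """
    load = [0] * n
    freq = Counter()
    instances_per_key = {}
    for key in stream:
        freq[key] += 1
        base = key_grouping(key, n)
        if freq[key] <= hot_threshold:
            target = base
        else:
            # Hypothetical candidate scheme: d consecutive instances from base.
            candidates = [(base + i) % n for i in range(min(d, n))]
            target = min(candidates, key=lambda i: load[i])
        load[target] += 1
        instances_per_key.setdefault(key, set()).add(target)
    return load, instances_per_key


def aggregation_cost(instances_per_key):
    """Number of partial results that must be merged downstream."""
    return sum(len(s) for s in instances_per_key.values())
```

With `d = 1` the sketch reduces to plain key grouping. Raising `d` flattens the load of a skewed stream but increases the number of partial results per hot key that must be merged, which is the aggregation cost the abstract refers to.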
Original language: English
Pages (from-to): 164-178
Number of pages: 15
Journal: CCF Transactions on High Performance Computing
Volume: 6
Issue number: 2
Early online date: 12 Jan 2024
Publication status: Published - Apr 2024

Scopus Subject Areas

  • Computer Science (miscellaneous)
  • Information Systems
  • Hardware and Architecture
  • Computer Science Applications

User-Defined Keywords

  • Key grouping
  • Load balancing
  • Stream partitioning
  • Stream processing
