Dynamic Regret of Quantized Distributed Online Bandit Optimization in Zero-Sum Games

Lan Liao, Daniel W.C. Ho, Deming Yuan*, Zhan Yu, Baoyong Zhang, Shengyuan Xu

*Corresponding author for this work

Research output: Contribution to journalJournal articlepeer-review

Abstract

This article investigates the distributed online optimization problem in a zero-sum game between two distinct time-varying multiagent networks. At each iteration, the agents not only communicate with their neighbors but also gather information about agents in the opposing network through a time-varying network, assigning weights accordingly. Moreover, we consider quantized communication and bandit feedback mechanisms, with agents transmitting quantized information and adopting one-point estimators. At each iteration, agents make and submit decisions and then receive the cost function values near their decision points rather than the full cost function information. To guarantee the payoff of each network, we design an algorithm named quantized distributed online bandit optimization in two-network (QDOBO-TN). We use dynamic Nash equilibrium regret to measure the positive payoff discrepancy between the decision sequence produced by Algorithm QDOBO-TN and the Nash equilibrium sequence. Furthermore, we propose a multiepoch version of Algorithm QDOBO-TN. The regret bounds for both algorithms are sublinear with respect to the iteration count T. Finally, we conduct a series of simulation experiments that further validate the effectiveness of the algorithms.

Original languageEnglish
Number of pages8
JournalIEEE Transactions on Cybernetics
DOIs
Publication statusE-pub ahead of print - 16 Sept 2025

User-Defined Keywords

  • Bandit feedback
  • distributed online optimization
  • Nash equilibrium
  • random quantization
  • zero-sum game

Fingerprint

Dive into the research topics of 'Dynamic Regret of Quantized Distributed Online Bandit Optimization in Zero-Sum Games'. Together they form a unique fingerprint.

Cite this