TY - JOUR
T1 - Dynamic Regret of Quantized Distributed Online Bandit Optimization in Zero-Sum Games
AU - Liao, Lan
AU - Ho, Daniel W.C.
AU - Yuan, Deming
AU - Yu, Zhan
AU - Zhang, Baoyong
AU - Xu, Shengyuan
N1 - This work was supported in part by the National Natural Science Foundation of China under Grant 62373190, Grant 62273181, Grant 62221004, and Grant 12401123; in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grant KYCX24_0673; in part by the Research Grants Council of Hong Kong Special Administrative Region, China, under Grant CityU 11205724, Grant CityU 11206825, and Grant HKBU 12301424.
Publisher Copyright:
© 2013 IEEE.
PY - 2025/9/16
Y1 - 2025/9/16
N2 - This article investigates the distributed online optimization problem in a zero-sum game between two distinct time-varying multiagent networks. At each iteration, the agents not only communicate with their neighbors but also gather information about agents in the opposing network through a time-varying network, assigning weights accordingly. Moreover, we consider quantized communication and bandit feedback mechanisms, with agents transmitting quantized information and adopting one-point estimators. At each iteration, agents make and submit decisions and then receive the cost function values near their decision points rather than the full cost function information. To guarantee the payoff of each network, we design an algorithm named quantized distributed online bandit optimization in two-network (QDOBO-TN). We use dynamic Nash equilibrium regret to measure the positive payoff discrepancy between the decision sequence produced by Algorithm QDOBO-TN and the Nash equilibrium sequence. Furthermore, we propose a multiepoch version of Algorithm QDOBO-TN. The regret bounds for both algorithms are sublinear with respect to the iteration count T. Finally, we conduct a series of simulation experiments that further validate the effectiveness of the algorithms.
AB - This article investigates the distributed online optimization problem in a zero-sum game between two distinct time-varying multiagent networks. At each iteration, the agents not only communicate with their neighbors but also gather information about agents in the opposing network through a time-varying network, assigning weights accordingly. Moreover, we consider quantized communication and bandit feedback mechanisms, with agents transmitting quantized information and adopting one-point estimators. At each iteration, agents make and submit decisions and then receive the cost function values near their decision points rather than the full cost function information. To guarantee the payoff of each network, we design an algorithm named quantized distributed online bandit optimization in two-network (QDOBO-TN). We use dynamic Nash equilibrium regret to measure the positive payoff discrepancy between the decision sequence produced by Algorithm QDOBO-TN and the Nash equilibrium sequence. Furthermore, we propose a multiepoch version of Algorithm QDOBO-TN. The regret bounds for both algorithms are sublinear with respect to the iteration count T. Finally, we conduct a series of simulation experiments that further validate the effectiveness of the algorithms.
KW - Bandit feedback
KW - distributed online optimization
KW - Nash equilibrium
KW - random quantization
KW - zero-sum game
UR - http://www.scopus.com/inward/record.url?scp=105016754783&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2025.3604774
DO - 10.1109/TCYB.2025.3604774
M3 - Journal article
AN - SCOPUS:105016754783
SN - 2168-2267
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
ER -