Abstract
We consider online convex optimization (OCO) with multi-slot feedback delay. An agent selects a sequence of online decisions to minimize the accumulation of time-varying convex loss functions, subject to short-term and long-term constraints that may be time-varying. Both the convex loss function and the long-term constraint function may experience multiple time slots of feedback delay to be received by the agent. Existing works on OCO under this general setting has focused on the static regret, which measures the gap of losses between an online decision sequence and a time-invariant static offline benchmark. In this work, besides the static regret, we also consider a more practically meaningful metric, the dynamic regret, where the benchmark is a time-varying online optimal decision sequence. We propose an efficient algorithm, termed Delay-Tolerant Constrained-OCO (DTC-OCO), which uses a novel double regularization together with a new penalty mechanism on the long-term constraint violation, to tackle the asynchrony between information feedback and decision updates. We obtain upper bounds for its static regret, dynamic regret, and constraint violation, proving that they are sublinear under mild conditions. Furthermore, we consider a variation of DTC-OCO with multi-step gradient descent, and show it provides improved dynamic regret and constraint violation bounds for strongly convex loss functions. For numerical demonstration, we apply DTC-OCO to a general network resource allocation problem. Our simulation results suggest substantial performance gain by DTC-OCO over the current best alternative.
Original language | English |
---|---|
Pages (from-to) | 147-163 |
Number of pages | 17 |
Journal | IEEE/ACM Transactions on Networking |
Volume | 31 |
Issue number | 1 |
Early online date | 15 Jul 2022 |
DOIs | |
Publication status | Published - Feb 2023 |
Scopus Subject Areas
- Software
- Electrical and Electronic Engineering
- Computer Networks and Communications
- Computer Science Applications
User-Defined Keywords
- Online convex optimization
- long-term constraint
- multi-slot delay
- dynamic regret
- constraint violation
- online network resource allocation