Malicious data manipulation reduces the effectiveness of machine learning techniques, which rely on accurate knowledge of the input data. Motivated by real-world applications in network flow classification, we address the problem of robust online learning with delayed feedback in the presence of malicious data generators that attempt to gain a favorable classification outcome by manipulating the data features. For the case of static feedback delay, we propose two online algorithms, ROLC-NC and ROLC-C, for non-clairvoyant and clairvoyant malicious data generators, respectively. For the case of dynamic delay, we propose ROLC-NC-D and ROLC-C-D, again for non-clairvoyant and clairvoyant generators, respectively. We derive regret bounds for these four algorithms and show that the bounds are sub-linear under mild conditions. We further evaluate the proposed algorithms in network flow classification via extensive experiments using real-world data traces. Our experimental results demonstrate that the proposed algorithms can approach the performance of an optimal static offline classifier that is not under attack, while outperforming the same offline classifier when tested with a mixture of normal and manipulated data.
Scopus Subject Areas
- Computer Networks and Communications
- Electrical and Electronic Engineering
Keywords
- feedback delay
- malicious manipulation
- network flow classification
- robust online learning