TY - JOUR
T1 - Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation
AU - Zhou, Shuang
AU - Huang, Xiao
AU - Liu, Ninghao
AU - Zhou, Huachi
AU - Chung, Fu Lai
AU - Huang, Long Kai
N1 - Publisher Copyright:
© 2023 IEEE.
Funding Information:
This work was supported by the grant of DaSAIL under Grant P0030970 funded by PolyU (UGC).
PY - 2023/12/1
Y1 - 2023/12/1
N2 - Graph anomaly detection (GAD) has wide applications in real-world networked systems. In many scenarios, people need to identify anomalies on new (sub)graphs, but they may lack labels to train an effective detection model. Since recent semi-supervised GAD methods, which can leverage the available labels as prior knowledge, have achieved better performance than unsupervised methods, one natural idea is to directly apply a trained semi-supervised GAD model to the new (sub)graphs for testing. However, we find that existing semi-supervised GAD methods suffer from poor generalization, i.e., well-trained models do not perform well on an unseen area (one not accessible in training) of the graph. Motivated by this, we formally define the problem of generalized graph anomaly detection that aims to effectively identify anomalies on both the training-domain graph(s) and the unseen test graph(s). Nevertheless, it is a challenging task since only limited labels are available, and the normal data distribution may differ between training and testing data. Accordingly, we propose a data augmentation method named AugAN (Augmentation for Anomaly and Normal distributions) to enrich training data and adopt a customized episodic training strategy for learning with the augmented data. Extensive experiments verify the effectiveness of AugAN in improving model generalizability.
AB - Graph anomaly detection (GAD) has wide applications in real-world networked systems. In many scenarios, people need to identify anomalies on new (sub)graphs, but they may lack labels to train an effective detection model. Since recent semi-supervised GAD methods, which can leverage the available labels as prior knowledge, have achieved better performance than unsupervised methods, one natural idea is to directly apply a trained semi-supervised GAD model to the new (sub)graphs for testing. However, we find that existing semi-supervised GAD methods suffer from poor generalization, i.e., well-trained models do not perform well on an unseen area (one not accessible in training) of the graph. Motivated by this, we formally define the problem of generalized graph anomaly detection that aims to effectively identify anomalies on both the training-domain graph(s) and the unseen test graph(s). Nevertheless, it is a challenging task since only limited labels are available, and the normal data distribution may differ between training and testing data. Accordingly, we propose a data augmentation method named AugAN (Augmentation for Anomaly and Normal distributions) to enrich training data and adopt a customized episodic training strategy for learning with the augmented data. Extensive experiments verify the effectiveness of AugAN in improving model generalizability.
KW - Graph anomaly detection
KW - model generalizability
KW - data augmentation
UR - https://www.scopus.com/pages/publications/85159815173
U2 - 10.1109/TKDE.2023.3271771
DO - 10.1109/TKDE.2023.3271771
M3 - Journal article
AN - SCOPUS:85159815173
SN - 1041-4347
VL - 35
SP - 12721
EP - 12735
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 12
ER -