Abstract
In open-world environments, classification models should be adept at identifying out-of-distribution (OOD) data whose semantics differ from in-distribution (ID) data, leading to the emerging research in OOD detection. As a promising learning scheme, outlier exposure (OE) enables the models to learn from auxiliary OOD data, enhancing model representations in discerning between ID and OOD patterns. However, these auxiliary OOD data often do not fully represent real OOD scenarios, potentially biasing our models in practical OOD detection. Hence, we propose a novel OE-based learning method termed Wasserstein Distribution-agnostic Outlier Exposure (W-DOE), which is both theoretically sound and experimentally superior to previous works. The intuition is that by expanding the coverage of training-time OOD data, the models will encounter fewer unseen OOD cases upon deployment. In W-DOE, we achieve additional OOD data to enlarge the OOD coverage, based on a new data synthesis approach called implicit data synthesis (IDS). It is driven by our new insight that perturbing model parameters can lead to implicit data transformation, which is simple to implement yet effective to realize. Furthermore, we suggest a general learning framework to search for the synthesized OOD data that can benefit the models most, ensuring the OOD performance for the enlarged OOD coverage measured by the Wasserstein metric. Our approach comes with provable guarantees for open-world settings, demonstrating that broader OOD coverage ensures reduced estimation errors and thereby improved generalization for real OOD cases. We conduct extensive experiments across a series of representative OOD detection setups, further validating the superiority of W-DOE against state-of-the-art counterparts in the field.
Original language | English |
---|---|
Article number | 10844561 |
Pages (from-to) | 1-14 |
Number of pages | 14 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
DOIs | |
Publication status | E-pub ahead of print - 17 Jan 2025 |
Scopus Subject Areas
- Software
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
- Artificial Intelligence
- Applied Mathematics
User-Defined Keywords
- Data models
- Learning systems
- Machine learning
- Open-set Learning
- Out-of-distribution Detection
- Perturbation methods
- Predictive models
- Random variables
- Reliability
- Reliable Machine Learning
- Semantics
- Systematics
- Training