Abstract
Statistical data analysis and machine learning rely heavily on error measures for regression, classification, and forecasting. Bregman divergence (BD) is a widely used family of error measures, but it is not robust to outlying observations or high-leverage points in large- and high-dimensional datasets. In this paper, we propose a new family of robust Bregman divergences, called “robust-BD”, that are less sensitive to data outliers. We explore their suitability for sparse large-dimensional regression models with incompletely specified response-variable distributions and propose a new estimate, the “penalized robust-BD estimate”, which achieves the same oracle property as ordinary non-robust penalized least-squares and penalized-likelihood estimates. Extensive numerical experiments show that the penalized robust-BD estimate improves on classical approaches, and an analysis of a real dataset illustrates its practicality. Our findings suggest that the proposed method is a useful tool for robust statistical data analysis and machine learning in the presence of outliers and large-dimensional data.
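To give a concrete flavor of penalized robust estimation of this kind, the minimal sketch below combines a Huber-type robustification of the squared-error loss (the simplest member of the BD family) with an L1 penalty, fitted by proximal gradient descent. This is an illustrative stand-in, not the paper's estimator: the paper's algorithm is based on coordinate descent (see the keywords below), and the tuning constant `c`, the step-size rule, and all function names here are assumptions made for the example.

```python
import numpy as np

def huber_rho(r, c=1.345):
    """Huber loss: quadratic near zero, linear in the tails, so gross
    outliers contribute only linearly (c is an assumed tuning constant)."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def huber_psi(r, c=1.345):
    """Derivative of huber_rho: clips large residuals, bounding the
    influence of outliers on the gradient."""
    return np.clip(r, -c, c)

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_robust_fit(X, y, lam, n_iter=1000):
    """Minimize (1/n) * sum_i huber_rho(y_i - x_i' beta) + lam * ||beta||_1
    by proximal gradient descent (ISTA)."""
    n, p = X.shape
    beta = np.zeros(p)
    # huber_psi is 1-Lipschitz, so the smooth part of the objective has
    # Lipschitz constant ||X||_2^2 / n; its inverse is a safe step size.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = -X.T @ huber_psi(y - X @ beta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Toy check: sparse truth plus a few gross outliers in the response.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y[:5] += 20.0  # outliers that would badly distort a squared-error fit
beta_hat = penalized_robust_fit(X, y, lam=0.1)
print(np.round(beta_hat[:5], 2))
```

With the plain squared-error loss in place of `huber_rho`, the same five contaminated responses would pull the fitted coefficients far from the sparse truth; the clipped gradient keeps their influence bounded, which is the behavior the robust-BD family is designed to deliver for general Bregman divergences.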
| Original language | English |
|---|---|
| Pages (from-to) | 3361-3411 |
| Number of pages | 51 |
| Journal | Machine Learning |
| Volume | 112 |
| Issue number | 9 |
| Early online date | 5 Jul 2023 |
| DOIs | |
| Publication status | Published - Sept 2023 |
Scopus Subject Areas
- Software
- Artificial Intelligence
User-Defined Keywords
- Coordinate descent
- Hypothesis testing
- Loss functions
- Penalization
- Quasi-likelihood