Abstract
"Knowledge should not be accessible only to those
who can pay," said Robert May, chair of UC's faculty Academic Senate.
Similarly, machine learning should not be accessible only to those who
can pay. Thus, machine learning should benefit the whole world,
especially developing countries in Africa and Asia. As datasets
grow larger, obtaining clean supervision becomes laborious and
expensive, especially for developing countries. As a result, the
volume of noisy supervision becomes enormous, e.g., web-scale image and
speech data with noisy labels. However, standard machine learning
assumes that the supervised information is fully clean and intact.
Therefore, noisy data harms the performance of most standard
learning algorithms and sometimes even causes existing algorithms to
break down.
Many theories and approaches have been proposed to deal with noisy
data. As far as we know, learning and mining with noisy labels spans
two important eras in the machine learning, data mining, and knowledge
management communities: statistical learning (i.e., shallow learning)
and deep learning. In the era of statistical learning, learning and
mining with noisy labels focused on designing noise-tolerant losses or
unbiased risk estimators. In the era of deep learning, by contrast,
there are more options for combating noisy labels, such as designing
biased risk estimators or leveraging the memorization effects of deep
networks. In this tutorial, we summarize the foundations and go through
the most recent noisy-label-tolerant techniques. By participating in the
tutorial, the audience will gain broad knowledge of learning and
mining with noisy labels from the viewpoints of statistical learning
theory and deep learning, together with detailed analysis of typical
algorithms and frameworks and their real-world data mining applications.
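As a small illustration of the unbiased risk estimators mentioned in the abstract, the sketch below corrects a loss for binary, class-conditional label noise. It assumes labels in {-1, +1} and flip rates that are known in advance. The names `logistic_loss`, `unbiased_loss`, `expected_corrected_loss`, `rho_pos`, and `rho_neg` are hypothetical and do not come from the tutorial, and this is one well-known construction rather than necessarily the one the tutorial presents.

```python
import math


def logistic_loss(score: float, label: int) -> float:
    """Logistic loss for a real-valued score and a label in {-1, +1}."""
    return math.log1p(math.exp(-label * score))


def unbiased_loss(score: float, noisy_label: int,
                  rho_pos: float, rho_neg: float,
                  loss=logistic_loss) -> float:
    """Loss corrected for class-conditional label noise (illustrative).

    Assumed noise model (not from the source):
      rho_pos = P(noisy = -1 | clean = +1)
      rho_neg = P(noisy = +1 | clean = -1)
    The corrected loss, averaged over the noise, equals the clean loss.
    """
    if rho_pos + rho_neg >= 1:
        raise ValueError("flip rates must satisfy rho_pos + rho_neg < 1")
    # Flip rate of the observed class and of the opposite class.
    rho_same = rho_pos if noisy_label == 1 else rho_neg
    rho_other = rho_neg if noisy_label == 1 else rho_pos
    numerator = ((1 - rho_other) * loss(score, noisy_label)
                 - rho_same * loss(score, -noisy_label))
    return numerator / (1 - rho_pos - rho_neg)


def expected_corrected_loss(score: float, clean_label: int,
                            rho_pos: float, rho_neg: float) -> float:
    """Average the corrected loss over the two possible noisy labels."""
    flip = rho_pos if clean_label == 1 else rho_neg
    kept = unbiased_loss(score, clean_label, rho_pos, rho_neg)
    flipped = unbiased_loss(score, -clean_label, rho_pos, rho_neg)
    return (1 - flip) * kept + flip * flipped


if __name__ == "__main__":
    score, rho_pos, rho_neg = 0.7, 0.2, 0.3
    for y in (1, -1):
        clean = logistic_loss(score, y)
        corrected = expected_corrected_loss(score, y, rho_pos, rho_neg)
        print(y, clean, corrected)  # the two values should agree
```

Because the correction subtracts a scaled loss term, the corrected loss can be negative on individual examples even when the underlying loss is not.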
Original language | English |
---|---|
Title of host publication | CIKM 2022 - Proceedings of the 31st ACM International Conference on Information and Knowledge Management |
Editors | Mohammad Al Hasan, Li Xiong |
Publisher | Association for Computing Machinery (ACM) |
Pages | 5152-5155 |
Number of pages | 4 |
ISBN (Print) | 9781450392365 |
Publication status | Published - 17 Oct 2022 |
Event | 31st ACM International Conference on Information and Knowledge Management, CIKM 2022, Atlanta, United States. Duration: 17 Oct 2022 → 21 Oct 2022. https://dl.acm.org/doi/proceedings/10.1145/3511808 |
Publication series
Name | International Conference on Information and Knowledge Management, Proceedings |
---|
Conference
Conference | 31st ACM International Conference on Information and Knowledge Management, CIKM 2022 |
---|---|
Country/Territory | United States |
City | Atlanta |
Period | 17/10/22 → 21/10/22 |
Scopus Subject Areas
- General Business, Management and Accounting
- General Decision Sciences
User-Defined Keywords
- data mining
- machine learning
- noisy labels