Learning and Mining with Noisy Labels

Masashi Sugiyama, Tongliang Liu, Bo Han, Yang Liu, Gang Niu

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

1 Citation (Scopus)

Abstract

"Knowledge should not be accessible only to those who can pay," said Robert May, chair of UC's faculty Academic Senate. Similarly, machine learning should not be accessible only to those who can pay. Machine learning should therefore benefit the whole world, especially developing countries in Africa and Asia. As datasets grow larger, obtaining clean supervision becomes laborious and expensive, especially for developing countries. As a result, the volume of noisy supervision becomes enormous, e.g., web-scale image and speech data with noisy labels. However, standard machine learning assumes that the supervised information is fully clean and intact. Noisy data therefore harms the performance of most standard learning algorithms, and sometimes even causes existing algorithms to break down.

Many theories and approaches have been proposed to deal with noisy data. As far as we know, learning and mining with noisy labels spans two important eras in the machine learning, data mining, and knowledge management communities: statistical learning (i.e., shallow learning) and deep learning. In the era of statistical learning, learning and mining with noisy labels focused on designing noise-tolerant losses or unbiased risk estimators. In the era of deep learning, by contrast, there are more options for combating noisy labels, such as designing biased risk estimators or leveraging the memorization effects of deep networks. In this tutorial, we summarize the foundations and go through the most recent noisy-label-tolerant techniques. By participating in the tutorial, the audience will gain broad knowledge of learning and mining with noisy labels from the viewpoints of statistical learning theory and deep learning, detailed analyses of typical algorithms and frameworks, and their real-world data mining applications.
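As one illustration of the "unbiased risk estimators" mentioned in the abstract, the following is a minimal sketch of a corrected loss for binary labels under class-conditional noise, assuming the flip rates are known. The abstract does not specify any particular estimator; the function names, the logistic base loss, and the parameters `rho_pos` and `rho_neg` are illustrative assumptions, not taken from the tutorial.

```python
import numpy as np


def logistic_loss(score, label):
    """Logistic loss log(1 + exp(-label * score)) for labels in {-1, +1}."""
    return np.logaddexp(0.0, -np.asarray(label) * np.asarray(score))


def unbiased_loss(score, noisy_label, rho_pos, rho_neg, base_loss=logistic_loss):
    """Corrected loss whose expectation over label noise equals the clean loss.

    Assumed noise model (hypothetical parameter names):
      rho_pos = P(noisy = -1 | clean = +1)
      rho_neg = P(noisy = +1 | clean = -1)
    """
    denom = 1.0 - rho_pos - rho_neg
    if denom <= 0:
        raise ValueError("flip rates must satisfy rho_pos + rho_neg < 1")
    noisy_label = np.asarray(noisy_label)
    # Flip rate of the observed class and of the opposite class.
    rho_same = np.where(noisy_label == 1, rho_pos, rho_neg)
    rho_other = np.where(noisy_label == 1, rho_neg, rho_pos)
    return (
        (1.0 - rho_other) * base_loss(score, noisy_label)
        - rho_same * base_loss(score, -noisy_label)
    ) / denom


def expected_corrected_loss(score, clean_label, rho_pos, rho_neg):
    """Average the corrected loss over the two possible noisy labels."""
    flip = rho_pos if clean_label == 1 else rho_neg
    kept = unbiased_loss(score, clean_label, rho_pos, rho_neg)
    flipped = unbiased_loss(score, -clean_label, rho_pos, rho_neg)
    return (1.0 - flip) * kept + flip * flipped
```

Under these assumptions, `expected_corrected_loss(score, y, rho_pos, rho_neg)` matches `logistic_loss(score, y)` for either clean label `y`, which is the sense in which the corrected loss is unbiased. In practice the flip rates are usually unknown and have to be estimated, which this sketch does not address.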
Original language: English
Title of host publication: CIKM 2022 - Proceedings of the 31st ACM International Conference on Information and Knowledge Management
Editors: Mohammad Al Hasan, Li Xiong
Publisher: Association for Computing Machinery (ACM)
Pages: 5152-5155
Number of pages: 4
ISBN (Print): 9781450392365
Publication status: Published - 17 Oct 2022
Event: 31st ACM International Conference on Information and Knowledge Management, CIKM 2022 - Atlanta, United States
Duration: 17 Oct 2022 to 21 Oct 2022
https://dl.acm.org/doi/proceedings/10.1145/3511808

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 31st ACM International Conference on Information and Knowledge Management, CIKM 2022
Country/Territory: United States
City: Atlanta
Period: 17/10/22 to 21/10/22

Scopus Subject Areas

  • General Business, Management and Accounting
  • General Decision Sciences

User-Defined Keywords

  • data mining
  • machine learning
  • noisy labels
