
Unveiling Hidden Gems: Enhancing Entity Resolution with a Data Perspective

  • Hongtao Song
  • Shuang Zhang
  • Yuhan Zhao
  • Qilong Han*
  • *Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › Peer-review

Abstract

Entity Resolution (ER), a crucial task in data integration and cleaning, identifies records from different sources that refer to the same entity. Despite advances in existing state-of-the-art (SOTA) methods, a significant challenge remains largely unaddressed: the quality of the data used for training these models. This paper shifts the focus from model-centric improvements to data-centric enhancements, identifying two primary data quality issues: (1) low-hardness positive samples and (2) the scarcity of positive samples relative to negative samples. To tackle these challenges, we propose a novel Data Quality Enhancement Framework (DQEF) comprising an adversarial hardness enhancement method and a noise perturbation-based positive sample generation method. DQEF generates informative and hard positive samples, improving the model’s training effectiveness. As a model-agnostic approach, it can seamlessly integrate with mainstream ER models to enhance performance. Experiments on multiple ER models and real-world datasets show that DQEF significantly boosts performance, surpassing current SOTA results.
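The abstract names DQEF's two components but does not describe how they work. As a rough, hypothetical illustration only, the sketch below shows two generic techniques those names suggest. The first adds Gaussian noise to the embedding of one record in a known matching pair to produce extra positive pairs. The second applies an FGSM-style sign-gradient step that lowers the cosine similarity of a positive pair, making it a "harder" positive. Representing records as embedding vectors, using cosine similarity, and the names `noisy_positives`, `harden_positive`, `sigma` and `epsilon` are all assumptions for illustration; this is not the authors' DQEF implementation.

```python
import math
import random
from typing import List, Tuple

# Hypothetical setting: each record is already encoded as a non-zero
# embedding vector; a "positive pair" is two records of the same entity.
Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def noisy_positives(left: Vector, right: Vector, k: int, sigma: float,
                    rng: random.Random) -> List[Tuple[Vector, Vector]]:
    """Assumed noise-perturbation scheme: create k extra positive pairs by
    adding i.i.d. Gaussian noise (std sigma) to the right-hand embedding."""
    pairs = []
    for _ in range(k):
        noisy = [x + rng.gauss(0.0, sigma) for x in right]
        pairs.append((left, noisy))
    return pairs


def harden_positive(left: Vector, right: Vector, epsilon: float) -> Vector:
    """Assumed adversarial-hardness step (FGSM-style): move `right` by at most
    epsilon per coordinate in the direction that lowers cos(left, right)."""
    dot = sum(x * y for x, y in zip(left, right))
    nl = math.sqrt(sum(x * x for x in left))
    nr = math.sqrt(sum(y * y for y in right))
    # Analytic gradient of cosine similarity w.r.t. right_i:
    #   d cos / d r_i = l_i / (|l| |r|) - dot * r_i / (|l| |r|^3)
    grad = [l / (nl * nr) - dot * r / (nl * nr ** 3)
            for l, r in zip(left, right)]
    # Step against the sign of the gradient (descent on similarity);
    # coordinates with zero gradient are left unchanged.
    return [r - epsilon * math.copysign(1.0, g) if g != 0 else r
            for r, g in zip(right, grad)]
```

In a training loop, the generated and hardened pairs would presumably be added to the positive set with the "match" label; `sigma` and `epsilon` would need tuning so that perturbed pairs stay plausible matches rather than turning into noise.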
Original language: English
Title of host publication: 2025 International Joint Conference on Neural Networks (IJCNN)
Publisher: IEEE
Pages: 1-8
Number of pages: 8
ISBN (Electronic): 9798331510428
ISBN (Print): 9798331510435
Publication status: Published - Jun 2025
Event: 2025 International Joint Conference on Neural Networks, IJCNN 2025 - Rome, Italy
Duration: 30 Jun 2025 to 5 Jul 2025
https://ieeexplore.ieee.org/xpl/conhome/11227166/proceeding (Conference proceedings)

Publication series

Name: International Joint Conference on Neural Networks (IJCNN)
ISSN (Print): 2161-4393
ISSN (Electronic): 2161-4407

Conference

Conference: 2025 International Joint Conference on Neural Networks, IJCNN 2025
Country/Territory: Italy
City: Rome
Period: 30/06/25 to 5/07/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure

User-Defined Keywords

  • entity resolution
  • data quality
  • hardness enhancement
  • positive sample generation
  • noise perturbation
