Abstract
Entity Resolution (ER), a crucial task in data integration and cleaning, identifies records from different sources that refer to the same entity. Despite advances in existing state-of-the-art (SOTA) methods, a significant challenge remains largely unaddressed: the quality of the data used for training these models. This paper shifts the focus from model-centric improvements to data-centric enhancements, identifying two primary data quality issues: (1) low-hardness positive samples and (2) the scarcity of positive samples relative to negative samples. To tackle these challenges, we propose a novel Data Quality Enhancement Framework (DQEF) comprising an adversarial hardness enhancement method and a noise perturbation-based positive sample generation method. DQEF generates informative and hard positive samples, improving the model’s training effectiveness. As a model-agnostic approach, it can seamlessly integrate with mainstream ER models to enhance performance. Experiments on multiple ER models and real-world datasets show that DQEF significantly boosts performance, surpassing current SOTA results.
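The abstract does not say how the noise perturbation is applied, so the following is only a minimal sketch of the general idea under stated assumptions: each record is already encoded as a numeric embedding vector, and extra positive pairs are made by adding small zero-mean Gaussian noise to one side of a known matching pair. The names `perturb` and `augment_positives` and the parameters `sigma` and `n_new` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]


def perturb(vec: Vector, sigma: float, rng: random.Random) -> Vector:
    """Return a copy of vec with zero-mean Gaussian noise added to each component."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def augment_positives(
    positives: List[Pair],
    n_new: int,
    sigma: float = 0.05,  # assumed noise scale, not a value from the paper
    seed: int = 0,
) -> List[Pair]:
    """Keep the original positive pairs and append n_new noisy variants of each.

    Only the right-hand record of a pair is perturbed, so every synthetic
    pair still refers to the same left-hand entity.
    """
    rng = random.Random(seed)
    augmented = list(positives)
    for left, right in positives:
        for _ in range(n_new):
            augmented.append((left, perturb(right, sigma, rng)))
    return augmented
```

In this sketch, `n_new` controls how far the positive-to-negative ratio is rebalanced, and `sigma` trades off diversity of the synthetic positives against the risk of drifting away from a true match. The adversarial hardness enhancement step is not covered here.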
| Original language | English |
|---|---|
| Title of host publication | 2025 International Joint Conference on Neural Networks (IJCNN) |
| Publisher | IEEE |
| Pages | 1-8 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798331510428 |
| ISBN (Print) | 9798331510435 |
| Publication status | Published - Jun 2025 |
| Event | 2025 International Joint Conference on Neural Networks, IJCNN 2025, Rome, Italy. Duration: 30 Jun 2025 → 5 Jul 2025. Conference proceedings: https://ieeexplore.ieee.org/xpl/conhome/11227166/proceeding |
Publication series
| Name | International Joint Conference on Neural Networks (IJCNN) |
|---|---|
| ISSN (Print) | 2161-4393 |
| ISSN (Electronic) | 2161-4407 |
Conference
| Conference | 2025 International Joint Conference on Neural Networks, IJCNN 2025 |
|---|---|
| Country/Territory | Italy |
| City | Rome |
| Period | 30/06/25 → 5/07/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 9 Industry, Innovation, and Infrastructure
User-Defined Keywords
- entity resolution
- data quality
- hardness enhancement
- positive sample generation
- noise perturbation
Fingerprint
Dive into the research topics of 'Unveiling Hidden Gems: Enhancing Entity Resolution with a Data Perspective'. Together they form a unique fingerprint.