Mining Local Data Sources For Learning Global Cluster Models Via Local Model Exchange

Xiaofeng Zhang, Chak Man Lam, William K. Cheung

Research output: Contribution to journal › Journal article › peer-review

Abstract

Distributed data mining has recently attracted considerable attention, as there are many cases where pooling distributed data for mining is prohibited, due to either huge data volume or data privacy. In this paper, we address the issue of learning a global cluster model, known as the latent class model, by mining distributed data sources. Most existing model learning algorithms (e.g., EM) require access to all the available training data. Instead, we study a methodology based on periodic model exchange and merging, and apply it to Web structure modeling. In addition, we have tested a number of variations of the basic idea, including confining the exchange to some privacy-friendly parameters and varying the number of distributed sources. Experimental results show that the proposed distributed learning scheme is effective, with accuracy close to that obtained when all the data are physically shared for learning. Our results also show empirically that sharing fewer model parameters, as a further mechanism for privacy control, does not result in significant performance degradation for our application.
Original language: English
Pages (from-to): 16-22
Number of pages: 7
Journal: IEEE Intelligent Informatics Bulletin
Volume: 4
Issue number: 2
Publication status: Published - Dec 2004

User-Defined Keywords

  • Distributed data mining
  • model-based learning
  • latent class model
  • privacy preservation
