Mining local data sources for learning global cluster models

Chak Man Lam*, Xiao Feng Zhang, Kwok Wai CHEUNG

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding > Conference proceeding > peer-review

4 Citations (Scopus)

Abstract

Distributed data mining is becoming increasingly important, as there are many cases where physically sharing data is prohibited, e.g., because of huge data volumes or data privacy concerns. In this paper, we are interested in learning a global cluster model by exploring data held in distributed sources. A methodology based on periodic model exchange and merging is proposed and applied to the analysis of hyperlinked Web pages. In addition, we have tested a number of variations of the basic idea, including placing more emphasis on privacy and varying the number of distributed sources. Experimental results show that the proposed distributed learning scheme is effective, achieving accuracy close to that obtained when all the data are physically shared for learning.
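The abstract does not specify the local model or the merge rule, so the following is only a minimal Python sketch of the general idea of periodic model exchange and merge. It assumes k-means as the local clustering algorithm (a swapped-in technique; the paper's own model for hyperlinked Web pages may differ) and count-weighted averaging of matching centroids as the merge step. All function names are hypothetical, and the initial centroids are assumed to be supplied by the caller. Only centroids and cluster sizes leave each site, never the raw data points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Tuple[List[Point], List[int]]  # (centroids, points assigned to each)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_update(data: List[Point], centroids: List[Point], steps: int) -> Model:
    """Hypothetical local step: run `steps` (>= 1) k-means iterations on one
    site's private data and return only the resulting model."""
    k, dim = len(centroids), len(centroids[0])
    centroids = [tuple(c) for c in centroids]
    counts = [0] * k
    for _ in range(steps):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in data:
            # Assign the point to its nearest centroid (ties go to the lower index).
            j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Recompute means; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
            for j in range(k)
        ]
    return centroids, counts


def merge_models(models: List[Model], fallback: List[Point]) -> List[Point]:
    """Hypothetical merge: count-weighted average of matching centroids."""
    k, dim = len(fallback), len(fallback[0])
    merged = []
    for j in range(k):
        total = sum(counts[j] for _, counts in models)
        if total == 0:
            merged.append(tuple(fallback[j]))  # no site populated cluster j
            continue
        merged.append(tuple(
            sum(cents[j][d] * counts[j] for cents, counts in models) / total
            for d in range(dim)
        ))
    return merged


def distributed_kmeans(sites: List[List[Point]], init: List[Point],
                       rounds: int = 10, local_steps: int = 1) -> List[Point]:
    """Periodically exchange local models and merge them into a global one."""
    global_model = [tuple(c) for c in init]
    for _ in range(rounds):
        # Each site refines the current global model on its own data...
        models = [local_update(data, global_model, local_steps) for data in sites]
        # ...and only the (centroids, counts) pairs are exchanged and merged.
        global_model = merge_models(models, global_model)
    return global_model


if __name__ == "__main__":
    site_a = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]
    site_b = [(0.1, 0.3), (5.2, 4.9), (4.8, 5.1)]
    print(distributed_kmeans([site_a, site_b], init=[(0.0, 0.0), (1.0, 1.0)]))
```

With one local iteration per round, this count-weighted merge reproduces centralized k-means up to floating-point rounding, because the weighted average of the local cluster means equals the mean of the pooled points. Running more local iterations between exchanges reduces communication but gives up that exact equivalence.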
Original language: English
Title of host publication: Proceedings of IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004
Editors: N. Zhong, H. Tirri, Y. Yao, L. Zhou
Publisher: IEEE
Pages: 748-751
Number of pages: 4
ISBN (Print): 9780769521008, 0769521002
DOIs
Publication status: Published - Sept 2004
Event: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004 - Beijing, China
Duration: 20 Sept 2004 to 24 Sept 2004

Publication series

Name: Proceedings - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004

Conference

Conference: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004
Country/Territory: China
City: Beijing
Period: 20/09/04 to 24/09/04

Scopus Subject Areas

  • General Engineering

User-Defined Keywords

  • Data mining
  • Machine learning
  • Data analysis
  • Data privacy
  • Web pages
  • Testing
  • Frequency
  • Computer science
  • Machine learning algorithms
  • Training data
