Mining Web site's clusters from link topology and site hierarchy

Kwok Wai Cheung, Yuxiang Sun

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

4 Citations (Scopus)

Abstract

Foraging for information in large and complex Web sites using keyword search alone usually results in an unpleasant experience because of overloaded search results. To support more effective information search, descriptive abstractions of Web sites (e.g., sitemaps) are needed. However, their creation and maintenance normally require recurrent manual effort because Web content changes quickly. We extend the HITS algorithm and integrate hyperlink topology and Web site hierarchy to identify a hierarchy of Web page clusters as the abstraction of a Web site. As the algorithm is based on HITS, each identified cluster follows the bipartite graph structure, with an authority and hub pair as the cluster summary. The effectiveness of the algorithm has been evaluated using three different Web sites (containing ∼6000-14000 Web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
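
For readers unfamiliar with the HITS algorithm that the abstract builds on, the following is a minimal sketch of the basic hub/authority iteration only. It does not implement the authors' extension (the integration of site hierarchy and the extraction of a cluster hierarchy), which the abstract does not specify. The function name `hits`, the `iterations` parameter, and the toy link graph are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def hits(links, iterations=50):
    """Basic HITS iteration (not the paper's extended algorithm).

    links: dict mapping each page to the list of pages it links to.
    Returns (authority, hub): two dicts of L2-normalized scores.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    authority = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        authority = {p: v / norm for p, v in new_auth.items()}

        # Hub update: sum the authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += authority[dst]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return authority, hub


# Hypothetical toy site: three hub-like pages pointing at two content pages.
toy_links = {
    "index": ["courses", "staff"],
    "sitemap": ["courses", "staff"],
    "news": ["courses"],
    "courses": [],
    "staff": [],
}
authority, hub = hits(toy_links)
```

In this toy graph the linking pages and the linked-to pages form a bipartite structure of the kind the abstract describes: "courses" receives the highest authority score, while "index" and "sitemap" receive the highest hub scores.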

Original language: English
Title of host publication: Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003
Editors: Jiming Liu, Nick Cercone, Matthias Klusch, Chunnian Liu, Ning Zhong
Publisher: IEEE
Pages: 271-277
Number of pages: 7
ISBN (Electronic): 0769519326, 9780769519326
Publication status: Published - 2003
Event: IEEE/WIC International Conference on Web Intelligence, WI 2003 - Halifax, Canada
Duration: 13 Oct 2003 to 17 Oct 2003

Publication series

Name: Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003

Conference

Conference: IEEE/WIC International Conference on Web Intelligence, WI 2003
Country/Territory: Canada
City: Halifax
Period: 13/10/03 to 17/10/03

Scopus Subject Areas

  • Artificial Intelligence
  • Information Systems
  • Computer Networks and Communications
  • Human-Computer Interaction
  • Information Systems and Management

User-Defined Keywords

  • Algorithm design and analysis
  • Bipartite graph
  • Clustering algorithms
  • Computer science
  • Iterative algorithms
  • Keyword search
  • Search engines
  • Sun
  • Topology
  • Web pages
