Identifying a hierarchy of bipartite subgraphs for web site abstraction

Kwok Wai Cheung*, Yuxiang Sun

*Corresponding author for this work

Research output: Contribution to journal, Journal article, peer-review

11 Citations (Scopus)


The Web is transforming from a mere information dissemination platform into a distributed knowledge-based platform for supporting complex problem solving. However, the existing Web contains a large amount of knowledge that is tagged only with layout-related markup, making it hard to discover and use. In this paper, we propose to model semantic-rich and self-contained knowledge units embedded in a web site as a mixture of bipartite sub-graphs and to extract these sub-graphs as the web site abstraction via hyperlink structure and file hierarchy analysis. A recursive algorithm, named ReHITS, is derived which can identify bipartite sub-graphs with a hierarchical organization. Each identified sub-graph contains a set of associated authorities and hubs as its summarized semantic description. The effectiveness of the algorithm has been evaluated using three real web sites (containing ∼10000 web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
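This record does not reproduce ReHITS itself. For orientation only, the standard HITS hub/authority iteration named in the keywords can be sketched as below. The function name `hits`, the edge-list input format, and the fixed iteration count are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Plain HITS iteration on a directed graph given as (source, target) pairs.

    Illustrative sketch of standard HITS only; it is not the recursive
    ReHITS procedure described in the abstract.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]

        # Hub score: sum of the authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]

        # Normalise both vectors to unit Euclidean length.
        a_norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}

    return hub, auth
```

On a small bipartite-looking link structure, for example two listing pages that both point to the same content pages, the listing pages receive the high hub scores and the content pages the high authority scores, which is the kind of hub/authority description the abstract attaches to each sub-graph.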

Original language: English
Pages (from-to): 343-355
Number of pages: 13
Journal: Web Intelligence and Agent Systems
Issue number: 3
Publication status: Published - 2007

Scopus Subject Areas

  • Software
  • Computer Networks and Communications
  • Artificial Intelligence

User-Defined Keywords

  • HITS algorithm
  • Knowledge discovery
  • Web site abstraction
  • Web structure mining

