Abstract
The Web is transforming from a mere information-dissemination platform into a distributed knowledge-based platform that supports complex problem solving. However, much of the knowledge on the existing Web is tagged only with layout-related markup, making it hard to discover and use. In this paper, we propose to model semantic-rich and self-contained knowledge units embedded in a web site as a mixture of bipartite sub-graphs, and to extract these sub-graphs as the web site abstraction via hyperlink structure and file hierarchy analysis. A recursive algorithm, named ReHITS, is derived which can identify bipartite sub-graphs with a hierarchical organization. Each identified sub-graph contains a set of associated authorities and hubs as its summarized semantic description. The effectiveness of the algorithm has been evaluated using three real web sites (containing ∼10,000 web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
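For readers unfamiliar with the hub and authority notions used in the abstract, the following is a minimal sketch of the classic HITS iteration on which ReHITS is named. It is not the paper's ReHITS algorithm: the recursion and the file hierarchy analysis are not reproduced here. The function name `hits`, the edge-list input format, and the toy page names are assumptions made for illustration only.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Classic HITS power iteration over (source, target) hyperlinks.

    Illustrative only: this is the standard hub/authority update,
    not the recursive ReHITS procedure described in the paper.
    """
    nodes = sorted({page for edge in edges for page in edge})
    hub = {page: 1.0 for page in nodes}
    auth = {page: 1.0 for page in nodes}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {page: 0.0 for page in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {page: v / norm for page, v in new_auth.items()}

        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {page: 0.0 for page in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {page: v / norm for page, v in new_hub.items()}

    return hub, auth


# Hypothetical toy site: two index pages (h1, h2) both link to two
# content pages (a1, a2), forming a complete bipartite sub-graph,
# plus one unrelated link x -> y.
edges = [
    ("h1", "a1"), ("h1", "a2"),
    ("h2", "a1"), ("h2", "a2"),
    ("x", "y"),
]
hub, auth = hits(edges)
```

In this toy graph the densely linked bipartite core dominates the scores: `h1` and `h2` emerge as hubs and `a1` and `a2` as authorities, while the stray pages `x` and `y` decay towards zero. This is the sense in which a bipartite sub-graph can be summarized by its hubs and authorities.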
| Original language | English |
|---|---|
| Pages (from-to) | 343-355 |
| Number of pages | 13 |
| Journal | Web Intelligence and Agent Systems |
| Volume | 5 |
| Issue number | 3 |
| Publication status | Published - 2007 |
User-Defined Keywords
- HITS algorithm
- Knowledge discovery
- Web site abstraction
- Web structure mining
Fingerprint
Dive into the research topics of 'Identifying a hierarchy of bipartite subgraphs for web site abstraction'. Together they form a unique fingerprint.