TY - JOUR
T1 - Identifying a hierarchy of bipartite subgraphs for web site abstraction
AU - Cheung, Kwok Wai
AU - Sun, Yuxiang
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2007
Y1 - 2007
N2 - The Web is transforming from a mere information-dissemination platform into a distributed knowledge-based platform for supporting complex problem solving. However, the existing Web contains a large amount of knowledge that is tagged only with layout-related markup, making it hard to discover and use. In this paper, we propose to model semantic-rich and self-contained knowledge units embedded in a web site as a mixture of bipartite subgraphs and to extract the subgraphs as the web site abstraction via hyperlink structure and file hierarchy analysis. We derive a recursive algorithm, named ReHITS, that identifies bipartite subgraphs with a hierarchical organization. Each identified subgraph contains a set of associated authorities and hubs as its summarized semantic description. The effectiveness of the algorithm has been evaluated using three real web sites (containing ∼10,000 web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
AB - The Web is transforming from a mere information-dissemination platform into a distributed knowledge-based platform for supporting complex problem solving. However, the existing Web contains a large amount of knowledge that is tagged only with layout-related markup, making it hard to discover and use. In this paper, we propose to model semantic-rich and self-contained knowledge units embedded in a web site as a mixture of bipartite subgraphs and to extract the subgraphs as the web site abstraction via hyperlink structure and file hierarchy analysis. We derive a recursive algorithm, named ReHITS, that identifies bipartite subgraphs with a hierarchical organization. Each identified subgraph contains a set of associated authorities and hubs as its summarized semantic description. The effectiveness of the algorithm has been evaluated using three real web sites (containing ∼10,000 web pages) with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
KW - HITS algorithm
KW - Knowledge discovery
KW - Web site abstraction
KW - Web structure mining
UR - https://content.iospress.com/articles/web-intelligence-and-agent-systems-an-international-journal/wia00120
UR - http://www.scopus.com/inward/record.url?scp=35348927558&partnerID=8YFLogxK
M3 - Journal article
AN - SCOPUS:35348927558
SN - 1570-1263
VL - 5
SP - 343
EP - 355
JO - Web Intelligence and Agent Systems
JF - Web Intelligence and Agent Systems
IS - 3
ER -