TY - JOUR
T1 - A Data Cube Model for Prediction-Based Web Prefetching
AU - Yang, Qiang
AU - Huang, Joshua Zhexue
AU - Ng, Michael
N1 - Publisher Copyright: © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.
PY - 2003/1
Y1 - 2003/1
AB - Reducing web latency is one of the primary concerns of Internet research. Web caching and web prefetching are two effective techniques for latency reduction. A primary method for intelligent prefetching is to rank potential web documents using prediction models trained on past web server and proxy server log data, and to prefetch the highly ranked objects. For this method to work well, the prediction model must be updated constantly, and different queries must be answered efficiently. In this paper we present a data-cube model that represents web access sessions for data mining, in support of prediction model construction. The cube model organizes session data into three dimensions. With the data cube in place, we apply efficient data mining algorithms for clustering and correlation analysis. The resulting web page clusters can then be used to guide the prefetching system. We also propose an integrated web-caching and web-prefetching model, in which the issues of prefetching aggressiveness, replacement policy and increased network traffic are addressed together. The core of our integrated solution is a prediction model based on statistical correlation between web objects. This model can be frequently updated by querying the data cube of web server logs. To our knowledge, this integrated data cube and prediction-based prefetching framework is the first such effort.
KW - data cube
KW - data mining
KW - clustering
KW - transition probability matrices
KW - web prefetching
UR - http://www.scopus.com/inward/record.url?scp=0037261108&partnerID=8YFLogxK
U2 - 10.1023/A:1020990805004
DO - 10.1023/A:1020990805004
M3 - Journal article
AN - SCOPUS:0037261108
SN - 0925-9902
VL - 20
SP - 11
EP - 30
JO - Journal of Intelligent Information Systems
JF - Journal of Intelligent Information Systems
IS - 1
ER -