TY - GEN
T1 - DigestJoin
T2 - 2009 10th International Conference on Mobile Data Management: Systems, Services and Middleware, MDM 2009
AU - Li, Yu
AU - On, Sai Tung
AU - Xu, Jianliang
AU - Choi, Koon Kau
AU - Hu, Haibo
N1 - Copyright 2009 Elsevier B.V., All rights reserved.
PY - 2009
Y1 - 2009
N2 - Flash disks are an emerging secondary storage medium. In particular, some portable devices, multimedia players and laptop computers are now configured with flash disks instead of magnetic disks. It is envisioned that some RDBMSs will operate on flash disks in the near future. However, the I/O characteristics of flash disks differ from those of magnetic disks. Thus, in this paper, we study the core of query processing in RDBMSs, join processing, on flash disks. Specifically, we propose a new join method, called DigestJoin, to exploit the fast random reads of flash disks. DigestJoin consists of two phases: (1) projecting the join attributes followed by a join on the projected attributes; and (2) fetching the full tuples that satisfy the join to produce the final join results. Because the problem of tuple/page fetching with minimum I/O cost (in the second phase) is intractable, we propose three heuristic fetching strategies. We have implemented DigestJoin on a real flash disk for performance evaluation. Experiments on TPC-H datasets show that DigestJoin clearly outperforms the traditional sort-merge join under various system configurations.
AB - Flash disks are an emerging secondary storage medium. In particular, some portable devices, multimedia players and laptop computers are now configured with flash disks instead of magnetic disks. It is envisioned that some RDBMSs will operate on flash disks in the near future. However, the I/O characteristics of flash disks differ from those of magnetic disks. Thus, in this paper, we study the core of query processing in RDBMSs, join processing, on flash disks. Specifically, we propose a new join method, called DigestJoin, to exploit the fast random reads of flash disks. DigestJoin consists of two phases: (1) projecting the join attributes followed by a join on the projected attributes; and (2) fetching the full tuples that satisfy the join to produce the final join results. Because the problem of tuple/page fetching with minimum I/O cost (in the second phase) is intractable, we propose three heuristic fetching strategies. We have implemented DigestJoin on a real flash disk for performance evaluation. Experiments on TPC-H datasets show that DigestJoin clearly outperforms the traditional sort-merge join under various system configurations.
UR - http://www.scopus.com/inward/record.url?scp=70349525347&partnerID=8YFLogxK
U2 - 10.1109/MDM.2009.26
DO - 10.1109/MDM.2009.26
M3 - Conference contribution
AN - SCOPUS:70349525347
SN - 9780769536507
T3 - Proceedings - IEEE International Conference on Mobile Data Management
SP - 152
EP - 161
BT - Proceedings - 2009 10th International Conference on Mobile Data Management
Y2 - 18 May 2009 through 20 May 2009
ER -