TY - GEN
T1 - Distributed Publish/Subscribe Query Processing on the Spatio-Textual Data Stream
AU - Chen, Zhida
AU - Cong, Gao
AU - Zhang, Zhenjie
AU - Fu, Tom Z.J.
AU - Chen, Lisi
N1 - This work was carried out at the Rapid-Rich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore. The ROSE Lab is supported by the National Research Foundation, Singapore, under its Interactive Digital Media (IDM) Strategic Research Programme. This work is also supported in part by a Tier-1 grant (RG 22/15) and a Tier-2 grant (MOE-2016-T2-1-137) awarded by the Ministry of Education Singapore, and a grant awarded by Microsoft. Tom Fu and Zhenjie Zhang are supported by the research grant for the Human-Centered Cyber-physical Systems Programme at the Advanced Digital Sciences Center from Singapore's Agency for Science, Technology and Research (A*STAR). They are also partially supported by the Science and Technology Planning Project of Guangdong under grant No. 2015B010131015.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/4/19
Y1 - 2017/4/19
N2 - A huge amount of data with both spatial and textual information, e.g., geo-tagged tweets, is flooding the Internet. Such a spatio-textual data stream contains valuable information for millions of users with various interests in different keywords and locations. Publish/subscribe systems enable efficient and effective information distribution by allowing users to register continuous queries with both spatial and textual constraints. However, the explosive growth of data scale and user base has posed challenges to existing centralized publish/subscribe systems for spatio-textual data streams. In this paper, we propose a distributed publish/subscribe system, called PS2Stream, which digests a massive spatio-textual data stream and directs the stream to target users with registered interests. Compared with existing systems, PS2Stream achieves a better workload distribution in terms of both minimizing the total amount of workload and balancing the load of workers. To achieve this, we propose a new workload distribution algorithm that considers both the spatial and textual properties of the data. Additionally, PS2Stream supports dynamic load adjustments that adapt to changes in the workload, which makes PS2Stream adaptive. Extensive empirical evaluation on a commercial cloud computing platform with real data validates the superiority of our system design and the advantages of our techniques in improving system performance.
UR - http://www.scopus.com/inward/record.url?scp=85021220628&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2017.154
DO - 10.1109/ICDE.2017.154
M3 - Conference proceeding
AN - SCOPUS:85021220628
SN - 9781509065448
T3 - Proceedings - International Conference on Data Engineering
SP - 1095
EP - 1106
BT - Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017
PB - IEEE Computer Society
T2 - 33rd IEEE International Conference on Data Engineering, ICDE 2017
Y2 - 19 April 2017 through 22 April 2017
ER -