TY - GEN
T1 - Towards accurate Histogram publication under differential privacy
AU - Zhang, Xiaojian
AU - Ghent, Rui
AU - XU, Jianliang
AU - Meng, Xiaofeng
AU - Xie, Yingtao
N1 - Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2014
Y1 - 2014
N2 - Histograms arc the workhorse of data mining and analysis. This paper considers the problem of publishing histograms under differential privacy, one of the strongest privacy models. Existing differentially private histogram publication schemes have shown that clustering (or grouping) is a promising idea to improve the accuracy of sanitized histograms. However, none of them fully exploits the benefit of clustering. In this paper, we introduce a new clustering framework. It features a sophisticated evaluation of the trade-off between the approximation error due to clustering and the Laplace error due to Laplace noise injected, which is normally overlooked in prior work. In particular, we propose three clustering strategies with different orders of run-time complexitics. We prove the superiority of our approach by theoretical utility comparisons with the competitors. Our extensive experiments over various standard real-life and synthetic datasets confirm that our technique consistently outperforms existing competitors.
AB - Histograms arc the workhorse of data mining and analysis. This paper considers the problem of publishing histograms under differential privacy, one of the strongest privacy models. Existing differentially private histogram publication schemes have shown that clustering (or grouping) is a promising idea to improve the accuracy of sanitized histograms. However, none of them fully exploits the benefit of clustering. In this paper, we introduce a new clustering framework. It features a sophisticated evaluation of the trade-off between the approximation error due to clustering and the Laplace error due to Laplace noise injected, which is normally overlooked in prior work. In particular, we propose three clustering strategies with different orders of run-time complexitics. We prove the superiority of our approach by theoretical utility comparisons with the competitors. Our extensive experiments over various standard real-life and synthetic datasets confirm that our technique consistently outperforms existing competitors.
UR - http://www.scopus.com/inward/record.url?scp=84926315326&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973440.68
DO - 10.1137/1.9781611973440.68
M3 - Conference contribution
AN - SCOPUS:84926315326
T3 - SIAM International Conference on Data Mining 2014, SDM 2014
SP - 587
EP - 595
BT - SIAM International Conference on Data Mining 2014, SDM 2014
A2 - Ning-Tan, Pang
A2 - Banerjee, Arindam
A2 - Parthasarathy, Srinivasan
A2 - Obradovic, Zoran
A2 - Kamath, Chandrika
A2 - Zaki, Mohammed
PB - Society for Industrial and Applied Mathematics Publications
T2 - 14th SIAM International Conference on Data Mining, SDM 2014
Y2 - 24 April 2014 through 26 April 2014
ER -