Differentially private high-dimensional data publication via sampling-based inference

Rui Chen, Qian Xiao, Yu ZHANG, Jianliang XU

Research output: Chapter in book/report/conference proceeding > Conference contribution > peer-review

73 Citations (Scopus)

Abstract

Releasing high-dimensional data enables a wide spectrum of data mining tasks. Yet, individual privacy has been a major obstacle to data sharing. In this paper, we consider the problem of releasing high-dimensional data with differential privacy guarantees. We propose a novel solution to preserve the joint distribution of a high-dimensional dataset. We first develop a robust sampling-based framework to systematically explore the dependencies among all attributes and subsequently build a dependency graph. This framework is coupled with a generic threshold mechanism to significantly improve accuracy. We then identify a set of marginal tables from the dependency graph to approximate the joint distribution based on the solid inference foundation of the junction tree algorithm while minimizing the resultant error. We prove that selecting the optimal marginals with the goal of minimizing error is NP-hard and, thus, design an approximation algorithm using an integer programming relaxation and the constrained concave-convex procedure. Extensive experiments on real datasets demonstrate that our solution substantially outperforms the state-of-the-art competitors.
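As a rough illustration of the building block the abstract refers to as a "marginal table", the sketch below releases a noisy count table over a small set of attributes using the standard Laplace mechanism. It is not the paper's algorithm: the sampling-based dependency exploration, the threshold mechanism, and the junction-tree-based marginal selection are not reproduced here. The function names `noisy_marginal` and `to_distribution`, the record layout (a list of dicts), and the `domains` argument are all hypothetical, and the noise scale assumes neighbouring datasets differ by adding or removing one record (L1 sensitivity 1).

```python
import random
from collections import Counter
from itertools import product


def noisy_marginal(records, attrs, domains, epsilon, rng=None):
    """Return a Laplace-noised count table over the attributes in `attrs`.

    Hypothetical helper: `records` is a list of dicts and `domains` maps
    each attribute name to its list of possible values.
    """
    rng = rng or random.Random()
    # Assumption: add/remove-one-record neighbours, so L1 sensitivity is 1.
    scale = 1.0 / epsilon
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    table = {}
    for cell in product(*(domains[a] for a in attrs)):
        # The difference of two exponentials with rate 1/scale
        # is distributed as Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        table[cell] = counts.get(cell, 0) + noise
    return table


def to_distribution(table):
    """Clamp negative counts to zero and normalise to a probability table."""
    clipped = {cell: max(v, 0.0) for cell, v in table.items()}
    total = sum(clipped.values())
    if total == 0:
        # Degenerate case: fall back to the uniform distribution.
        return {cell: 1.0 / len(clipped) for cell in clipped}
    return {cell: v / total for cell, v in clipped.items()}
```

Every cell of the attribute domain receives noise, including cells with a true count of zero, which is why the table grows exponentially with the number of attributes and why approaches like the one described in the abstract work with several low-dimensional marginals rather than the full joint table.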

Original language: English
Title of host publication: KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 129-138
Number of pages: 10
ISBN (Electronic): 9781450336642
Publication status: Published - 10 Aug 2015
Event: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, Australia
Duration: 10 Aug 2015 - 13 Aug 2015

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 2015-August

Conference

Conference: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Country/Territory: Australia
City: Sydney
Period: 10/08/15 - 13/08/15

Scopus Subject Areas

  • Software
  • Information Systems

User-Defined Keywords

  • Dependency graph
  • Differential privacy
  • High-dimensional data
  • Joint distribution
  • Junction tree algorithm
