This article examines a new problem of k-anonymity with respect to a reference dataset in privacyaware location data publishing: given a user dataset and a sensitive event dataset, we want to generalize the user dataset such that by joining it with the event dataset through location, each event is covered by at least k users. Existing k-anonymity algorithms generalize every k user locations to the same vague value, regardless of the events. Therefore, they tend to overprotect against the privacy compromise and make the published data less useful. In this article, we propose a new generalization paradigm called local enlargement, as opposed to conventional hierarchy- or partition-based generalization. Local enlargement guarantees that user locations are enlarged just enough to cover all events k times, and thus maximize the usefulness of the published data. We develop an O(Hn)-approximate algorithm under the local enlargement paradigm, where n is the maximum number of events a user could possibly cover and Hn is the Harmonic number of n. With strong pruning techniques and mathematical analysis, we show that it runs efficiently and that the generalized user locations are up to several orders of magnitude smaller than those by the existing algorithms. In addition, it is robust enough to protect against various privacy attacks.
Scopus Subject Areas
- Information Systems