Project Details
Description
The recent rapid rise of information networks in various application domains has generated an enormous amount of network data, which is typically represented as graphs with nodes corresponding to a set of individuals and edges corresponding to connections between them. While network data has led to a wide spectrum of data analysis tasks, it makes individual privacy subject to more intrusions than ever before. For example, the privacy breach of the social networking site LivingSocial this year affected more than 50 million users.1 As a precaution, the Hong Kong Office of the Privacy Commissioner for Personal Data has recently organized a series of studies of personal privacy on social networking sites. Unfortunately, past experience has shown that policies and guidelines alone are unable to provide either adequate privacy protection or meaningful utility for data analysis. Technical efforts are indispensable to make published network data both private and useful.
Early research was mainly driven by specific types of privacy attacks on network data. The privacy guarantees derived from early research rely heavily on assumptions about the background knowledge of adversaries and have been shown to be vulnerable to more powerful adversaries. It is therefore imperative to develop new techniques with provable privacy guarantees. The recent emergence of differential privacy opens up a new possibility of providing such guarantees that are independent of adversaries’ background knowledge. Differential privacy requires that the outcome of any analysis over a database should not overly depend on any single user and thus assures that any privacy breach will not be a result of participating in the database. There has been much research on differentially private mechanisms for relational (tabular) data. However, only little work has been done on applying differential privacy to non-interactive network data publication, the goal of which is to release a sanitized network dataset. One reason is that network data poses new challenges due not only to the inherent complexity of processing structural information in network data, but also to the fact that network metrics are very sensitive to relatively small changes in the network structure. This poses major challenges to both utility and scalability.
In this project, we propose to investigate novel practical techniques for non-interactive network data publication under differential privacy. Our key observation is that this challenging problem could be converted into a density-based exploration and reconstruction process over the adjacency matrix of an input network dataset. This conversion is intrinsically suitable for addressing the challenges of sanitizing network data. More specifically, our proposed project includes the following research problems under differential privacy: (1) identify a vertex labeling scheme that forms dense regions in its corresponding adjacency matrix; (2) design an adaptive top-down partitioning process to explore dense regions in the adjacency matrix; (3) develop a biclustering framework that allows for extraction of dense regions from nonadjacent rows and columns; (4) design efficient mechanisms for reconstructing the identified dense regions while minimizing utility loss. Finally, a general sanitization framework will be developed. Its effectiveness will be verified for various data analysis tasks over different real-life network datasets. Based on our extensive research experience, we expect that, upon completion, our framework will be the first that is capable of sanitizing large-scale, real-life network data under differential privacy and that it will be able to facilitate the dissemination of network data by easing different stakeholders’ anxieties about privacy risks.
Early research was mainly driven by specific types of privacy attacks on network data. The privacy guarantees derived from early research rely heavily on assumptions about the background knowledge of adversaries and have been shown to be vulnerable to more powerful adversaries. It is therefore imperative to develop new techniques with provable privacy guarantees. The recent emergence of differential privacy opens up a new possibility of providing such guarantees that are independent of adversaries’ background knowledge. Differential privacy requires that the outcome of any analysis over a database should not overly depend on any single user and thus assures that any privacy breach will not be a result of participating in the database. There has been much research on differentially private mechanisms for relational (tabular) data. However, only little work has been done on applying differential privacy to non-interactive network data publication, the goal of which is to release a sanitized network dataset. One reason is that network data poses new challenges due not only to the inherent complexity of processing structural information in network data, but also to the fact that network metrics are very sensitive to relatively small changes in the network structure. This poses major challenges to both utility and scalability.
In this project, we propose to investigate novel practical techniques for non-interactive network data publication under differential privacy. Our key observation is that this challenging problem could be converted into a density-based exploration and reconstruction process over the adjacency matrix of an input network dataset. This conversion is intrinsically suitable for addressing the challenges of sanitizing network data. More specifically, our proposed project includes the following research problems under differential privacy: (1) identify a vertex labeling scheme that forms dense regions in its corresponding adjacency matrix; (2) design an adaptive top-down partitioning process to explore dense regions in the adjacency matrix; (3) develop a biclustering framework that allows for extraction of dense regions from nonadjacent rows and columns; (4) design efficient mechanisms for reconstructing the identified dense regions while minimizing utility loss. Finally, a general sanitization framework will be developed. Its effectiveness will be verified for various data analysis tasks over different real-life network datasets. Based on our extensive research experience, we expect that, upon completion, our framework will be the first that is capable of sanitizing large-scale, real-life network data under differential privacy and that it will be able to facilitate the dissemination of network data by easing different stakeholders’ anxieties about privacy risks.
Status | Finished |
---|---|
Effective start/end date | 1/07/14 → 30/06/17 |
UN Sustainable Development Goals
In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):
Fingerprint
Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.