Federated Data Analytics using Hardware Enclaves

Project: Research project

Project Details


Following the success of federated learning, there is growing interest in using federated technologies to enable collaborative analytics on decentralized sensitive data without invading privacy. For example, Google has used federated analytics to power Gboard’s word suggestions and the Now Playing feature on Pixel phones [1], and Intel uses federated analytics to help banks fight cross-border money laundering [2]. However, existing federated data analytics systems are either confined to simple aggregate queries or incur extremely high computation and communication overheads.

This project aims to develop efficient federated techniques for advanced data analytics using hardware enclaves (e.g., Intel SGX). In our system, multiple mutually distrustful data providers securely share their sensitive data with an honest-but-curious data broker to answer analytics queries from users. Both the data providers and the data broker leverage hardware enclaves to access data and process queries. Although all data in the system are fully encrypted and only visible inside enclaves, any party can monitor the access patterns of the system memory, local storage, and network communication, making the system susceptible to side-channel attacks. To prevent side-channel information leakage, we propose a novel analytics framework that supports oblivious and efficient data access in query processing. In comparison with prior solutions, our system has three advantages. First, instead of designing a dedicated algorithm for each specific analytics query, our system works by combining existing data analytics algorithms with data-oblivious primitives, making the system easier to adopt. Second, our system can achieve a higher degree of efficiency through using indexes for query processing. Third, our system supports an adjustable tradeoff between performance and privacy by using differential privacy to perturb data access patterns.
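To illustrate what "oblivious" data access means here, the following is a minimal sketch of the simplest data-oblivious primitive: a read that scans every slot in a fixed order, so the sequence of touched addresses does not depend on the secret index. It is not the project's framework, which aims to be far more efficient than a linear scan. The function name `oblivious_read` and the `trace` parameter are hypothetical, and Python itself gives no constant-time guarantees, so the sketch only demonstrates the access-pattern property.

```python
def oblivious_read(table, secret_index, trace=None):
    """Return table[secret_index] while touching every slot in the same order.

    Assumes `table` holds integers. `trace` is a hypothetical hook that
    records which slots were accessed, standing in for what an observer
    of memory accesses would see.
    """
    result = 0
    for i in range(len(table)):
        if trace is not None:
            trace.append(i)  # the observer sees slot i being read
        value = table[i]
        # mask is -1 (all bits set) for the wanted slot and 0 otherwise,
        # so the selection is done arithmetically rather than by skipping slots.
        mask = -int(i == secret_index)
        result |= value & mask
    return result
```

Calling the function with two different secret indexes yields different results but identical traces, which is the property an enclave needs against an observer of access patterns. The cost is linear in the table size, which is why index-based and oblivious-RAM techniques matter for efficiency.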

To develop our system, we propose new data-oblivious techniques for various federated analytics queries, including joins, SQL queries, and graph queries. First, to boost overall system performance, we will design data-oblivious yet highly efficient data structures, such as stacks, queues, and priority queues, to support common basic operations. Second, building on these primitive oblivious data structures, we will develop index-based data access and query processing algorithms to collaboratively answer analytics queries. Third, although we can make data access oblivious by storing the data inside an oblivious RAM, the result size of each access would still leak sensitive information. Thus, efficient padding schemes will be developed to mask such information. Finally, we plan to design novel differentially private query techniques to further optimize system performance with bounded privacy loss.
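The result-size leakage and the two masking ideas mentioned above (padding and differentially private perturbation) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the schemes the project will develop: `pad_to_power_of_two`, `dp_padded_size`, the `DUMMY` marker, and the `shift` parameter are all hypothetical names, and the noisy variant assumes that adding or removing one record changes the true result size by at most one.

```python
import math
import random

DUMMY = None  # hypothetical placeholder for a dummy (filler) record


def pad_to_power_of_two(rows):
    """Pad a result list with dummies up to the next power of two.

    An observer then learns only which power-of-two bucket the size
    falls into, not the exact number of matching records.
    """
    n = len(rows)
    target = 1 if n <= 1 else 1 << (n - 1).bit_length()
    return rows + [DUMMY] * (target - n)


def dp_padded_size(true_size, epsilon, shift, rng=random):
    """Return a noisy padded size that is never below the true size.

    The difference of two exponentials with rate epsilon is Laplace
    noise with scale 1/epsilon. The noise is shifted by `shift` and
    clamped at zero so real records are never dropped. Clamping means
    this is not a pure epsilon guarantee; a larger `shift` makes
    clamping rarer at the cost of more dummy records.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    padding = max(0, math.ceil(shift + noise))
    return true_size + padding
```

The deterministic scheme at most doubles the result size, while the noisy scheme adds roughly `shift` dummy records on average, trading a bounded privacy loss for less padding on large results.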

With our rich research experience in secure and privacy-aware computing, we expect the outcome of this project to accelerate the growth and adoption of federated analytics technologies and decentralized services in the pertinent industries.

Effective start/end date: 1/01/22 to 31/12/24

