Abstract
Summary: Scalability and data privacy are two of the main challenges that keep distributed data analysis from being widely applied in collaborative projects. In this chapter, we first review a recently proposed scalable and privacy-preserving approach to distributed data analysis. The approach computes abstractions of the distributed data, which are then used to mine global data patterns. We then describe a service-oriented realization of the approach for data clustering and explain in detail how the analysis process is deployed for execution on a BPEL platform. Finally, we discuss lessons learned from the implementation exercise and future research directions on how distributed data analysis platforms can be built with even higher scalability and better support for privacy preservation.
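The abstract does not spell out the chapter's algorithm, so the following is only a minimal sketch of the general idea of sharing data abstractions instead of raw records. It assumes (hypothetically) that each site summarizes its local data as a one-dimensional Gaussian mixture model (GMM) and shares only the component parameters and its record count. A coordinator pools these into a global mixture, reweighting each component by its site's share of the records. The moment-preserving merge of two components is a standard technique added here for illustration, not necessarily the chapter's method. All names (`Component`, `site_abstraction`, `combine`, `merge_pair`, `density`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One weighted 1-D Gaussian component (hypothetical local abstraction)."""
    weight: float  # mixing weight
    mean: float
    var: float


def site_abstraction(components, n_records):
    """What a site shares: its local GMM parameters and record count, never raw data."""
    return {"n": n_records, "components": components}


def combine(abstractions):
    """Pool local mixtures into one global mixture.

    Each component's weight is scaled by its site's share of all records,
    so the global weights still sum to 1.
    """
    total = sum(a["n"] for a in abstractions)
    pooled = []
    for a in abstractions:
        share = a["n"] / total
        for c in a["components"]:
            pooled.append(Component(c.weight * share, c.mean, c.var))
    return pooled


def merge_pair(a, b):
    """Moment-preserving merge: the result keeps the pair's total weight, mean and variance."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    var = (a.weight * (a.var + (a.mean - mean) ** 2)
           + b.weight * (b.var + (b.mean - mean) ** 2)) / w
    return Component(w, mean, var)


def density(mixture, x):
    """Evaluate the mixture's probability density at x."""
    return sum(
        c.weight * math.exp(-(x - c.mean) ** 2 / (2 * c.var)) / math.sqrt(2 * math.pi * c.var)
        for c in mixture
    )
```

For example, a site with 100 records and two equal components at means 0 and 10, plus a site with 300 records and one component at mean 0.2, yield a global mixture with weights 0.125, 0.125 and 0.75. Merging the two nearly overlapping components at means 0 and 0.2 then shrinks the global model, and only parameters ever leave each site.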
Field | Value |
---|---|
Original language | English |
Title of host publication | Data Mining Techniques in Grid Computing Environments |
Publisher | John Wiley & Sons Ltd. |
Pages | 105-118 |
Number of pages | 14 |
ISBN (Print) | 9780470512586 |
Publication status | Published - 22 Jun 2009 |
Scopus Subject Areas
- Computer Science(all)
User-Defined Keywords
- BPEL process creation tools: Oracle BPEL Designer or ActiveBPEL
- Business Process Execution Language (BPEL)
- Data analysis challenges
- Extensible Stylesheet Language Transformation (XSLT)
- Gaussian mixture model (GMM)
- Modelling distributed data mining and workflow processes
- Scalability and data privacy
- Scalable and privacy preserving data mining paradigm