Distributed and Ubiquitous Data Mining
From DDMWiki
Introduction
Advances in computing and communication over wired and wireless networks have resulted in many pervasive distributed computing environments. The Internet, intranets, local area networks, mobile ad hoc wireless networks, peer-to-peer networks, and sensor networks are some examples. These environments often come with distributed sources of data and computation, and mining in such environments naturally calls for proper use of these distributed resources. Moreover, in many privacy-sensitive applications, data sets collected by different, possibly multiple, parties at different sites must be processed in a distributed fashion without collecting everything at a single central site. However, most off-the-shelf data mining systems are designed to work as monolithic centralized applications: they normally download the relevant data to a central location and then perform the data mining operations there. This centralized approach does not work well for many emerging distributed, ubiquitous, and possibly privacy-sensitive data mining applications.
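As a minimal sketch of how several parties can compute an aggregate without pooling their raw values at one site, the Python below simulates the classic ring-based secure-sum idea in a single process. It assumes semi-honest, non-colluding sites and non-negative integer inputs whose true total is below the modulus; the function name `secure_sum` and the modulus value are illustrative choices, not part of any particular DDM system.

```python
import secrets


def secure_sum(local_values, modulus=2**61 - 1):
    """Simulate a ring-based secure sum over one private value per site.

    Illustrative assumptions: at least one site, semi-honest and
    non-colluding sites, non-negative integers whose total is < modulus.
    """
    # Site 0 draws a random mask, so the running total it forwards
    # reveals nothing about its own value.
    mask = secrets.randbelow(modulus)
    running = (mask + local_values[0]) % modulus
    # Each remaining site adds its private value to the masked total it
    # receives and forwards the result to the next site in the ring.
    for value in local_values[1:]:
        running = (running + value) % modulus
    # The total returns to site 0, which removes its mask.
    return (running - mask) % modulus


if __name__ == "__main__":
    print(secure_sum([120, 45, 300]))  # 465
```

Each intermediate site sees only a uniformly masked running total, so no single site learns another site's value; collusion between a site's two ring neighbours would break this guarantee, which is why the non-collusion assumption matters.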
Distributed and Ubiquitous Data Mining (DDM) offers an alternative approach to this problem by mining data with distributed resources. DDM pays careful attention to the distributed resources of data, computing, communication, and human factors in order to use them in a near-optimal fashion.
DDM applications come in different flavors. When data can be transported freely and efficiently from one node to another without significant overhead, DDM algorithms may offer better scalability and response time by (1) properly redistributing the data across partitions, (2) distributing the computation, or (3) a combination of both. These algorithms often rely on fast communication between participating nodes. However, when the data sources are distributed and the data cannot be transmitted freely over the network because of privacy constraints, bandwidth limitations, or scalability problems, DDM algorithms work by avoiding or minimizing communication of the raw data. In short, DDM offers the technology to analyze data by optimally utilizing distributed computing, storage, and human resources.
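The case where raw data stay put and only summaries travel can be illustrated with frequent-itemset counting over horizontally partitioned transactions. In this sketch (function names and data are hypothetical), each site counts its own k-itemsets and ships only those counts; a coordinator merges them and applies a global support threshold. It is a simplification in the spirit of count-distribution approaches to parallel association rule mining: it counts every k-itemset rather than Apriori-style candidates, so it shows the communication pattern, not a scalable algorithm.

```python
from collections import Counter
from itertools import combinations


def local_support_counts(transactions, k):
    """Count every k-itemset occurring in one site's transactions (runs locally)."""
    counts = Counter()
    for items in transactions:
        # Sort so the same itemset yields the same tuple at every site.
        for itemset in combinations(sorted(set(items)), k):
            counts[itemset] += 1
    return counts


def global_frequent_itemsets(site_counts, total_transactions, min_support):
    """Merge per-site counts and keep itemsets meeting the global support threshold."""
    merged = Counter()
    for counts in site_counts:
        merged.update(counts)  # Counter.update adds counts rather than replacing them
    threshold = min_support * total_transactions
    return {itemset: c for itemset, c in merged.items() if c >= threshold}


if __name__ == "__main__":
    # Hypothetical toy data: two sites, each holding its own transactions.
    sites = [
        [["bread", "milk"], ["bread", "butter"], ["milk", "butter", "bread"]],
        [["milk", "butter"], ["bread", "milk"]],
    ]
    # Each site sends only its counts and transaction total, never raw transactions.
    site_counts = [local_support_counts(t, k=2) for t in sites]
    total = sum(len(t) for t in sites)
    print(global_frequent_itemsets(site_counts, total, min_support=0.5))
```

For this toy input the coordinator reports `{('bread', 'milk'): 3}`. A practical system would also bound the number of itemsets counted, since shipping counts for every k-itemset can itself become expensive.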
One may classify the DDM literature in various ways. This DDMWiki groups the literature by application environment, using the following broad categories:
- Peer-to-Peer Data Mining
- Privacy Preserving Data Mining
- Distributed Data Stream Mining
- Data Mining in Mobile and Embedded Devices
- Distributed Data Mining in Sensor Networks
- Mining on the Grid
- Parallel Data Mining
An overview of different distributed data mining algorithms can be found elsewhere [1, 2, 3].
References
- [1] C. Giannella, H. Kargupta, and H. Dutta. (2004). Algorithms for Distributed Association Rule Mining and Clustering: Past and Future Directions. Handbook of Data Mining. Editor: Sanjay Ranka.
- [2] H. Kargupta and K. Sivakumar. (2004). Existential Pleasures of Distributed Data Mining. Data Mining: Next Generation Challenges and Future Directions. Editors: H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha. AAAI/MIT Press.
- [3] B. Park and H. Kargupta. (2002). Distributed Data Mining: Algorithms, Systems, and Applications. Data Mining Handbook. Editor: Nong Ye.
- [4] H. Kargupta and P. Chan. (2000). Advances in Distributed and Parallel Knowledge Discovery, pp. xv-xxvi. AAAI/MIT Press.
- [5] H. Kargupta. (2006). Data Mining in Distributed, Ubiquitous Environment: Past, Present, and Future. European Union meeting on the KDUbiq initiative, Germany, January 2006. Slides.