Software
From DDMWiki
DDM Related Software
- Distributed Data Mining Toolkit (DDMT) is designed to allow easy development and execution of event-driven distributed algorithms on a single PC or small network of PCs. You can check out the Alpha Version here.
- Distributed Data Mining Simulator is a tool that simulates a network for performing distributed data mining experiments in a simple and flexible way.
- BRITE is a tool for generation of realistic internet topologies.
- peerCounter is a project that aims to address the problem of estimating the size of large and dynamic peer-to-peer networks. It consists of an API and an application, both implemented in Java.
- Grid Weka is a modification to the well-known toolkit for machine learning and data mining called Weka. This modification enables Weka to utilise resources of several computers when performing a number of functions.
- JNGI is a framework that users can use to submit jobs. These jobs are split and distributed among several peers. The use of JXTA peer groups helps us to localize communication, which in turn improves scaling. Also, by providing redundancy within peer groups, we ensure that failures do not affect job completion.
- Jxta-grid is an experimental project to find commonality between the peer-to-peer and parallel computing fields.
- Ns is a discrete event simulator targeted at networking research. Ns provides substantial support for simulation of TCP, routing, and multicast protocols over wired and wireless (local and satellite) networks.
- Scalable Simulation Framework is a public-domain standard for discrete-event simulation of large, complex systems in Java and C++.
- OMNeT++ is a public-source, component-based, modular and open-architecture simulation environment with strong GUI support and an embeddable simulation kernel. Its primary application area is the simulation of communication networks, but because of its generic and flexible architecture it has also been used successfully in other areas, such as the simulation of IT systems, queueing networks, hardware architectures and business processes.
- Fourier Representation of Decision Trees is software that computes the Fourier spectrum of a decision tree and vice versa.
- Weka4WS is a framework developed at the University of Calabria that extends the Weka toolkit for supporting distributed data mining on the Grid. Weka4WS has been implemented using the Web Services Resource Framework (WSRF) to achieve interoperability with standard Grid environments such as Globus Toolkit 4.
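Several of the tools listed above (Ns, Scalable Simulation Framework, OMNeT++) are discrete-event simulators, and DDMT targets event-driven distributed algorithms. The sketch below is only a rough illustration of that general idea: a clock plus a time-ordered event queue, used to pass a token around a ring of peers. It does not use or imitate the API of any listed tool; the names `Simulator`, `simulate_ring`, `num_peers` and `link_delay` are hypothetical and chosen for this example.

```python
import heapq


class Simulator:
    """Minimal discrete-event simulator: a clock plus a time-ordered event queue."""

    def __init__(self):
        self.now = 0.0
        self._queue = []  # heap of (time, sequence, callback, args)
        self._seq = 0     # tie-breaker: same-time events run in scheduling order

    def schedule(self, delay, callback, *args):
        """Schedule callback(*args) to fire `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback, args))
        self._seq += 1

    def run(self):
        """Process events in time order until the queue is empty."""
        while self._queue:
            time, _, callback, args = heapq.heappop(self._queue)
            self.now = time  # jump the clock straight to the next event
            callback(*args)


def simulate_ring(num_peers=4, link_delay=1.5):
    """Pass a token once around a ring of peers; return (time, peer) deliveries."""
    sim = Simulator()
    log = []

    def deliver(peer, hops):
        log.append((sim.now, peer))
        if hops < num_peers:
            # Forward the token to the next peer after one link delay.
            sim.schedule(link_delay, deliver, (peer + 1) % num_peers, hops + 1)

    sim.schedule(0.0, deliver, 0, 1)
    sim.run()
    return log


if __name__ == "__main__":
    for time, peer in simulate_ring():
        print(f"t={time}: token at peer {peer}")
```

The simulated clock advances only when an event fires, so the run time depends on the number of events rather than on the simulated duration. This is the property that lets simulators of this kind scale to large networks.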
General Data Mining Software
- Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
- SVM Light is an implementation of Vapnik's Support Vector Machine for pattern recognition, regression, and learning a ranking function. The algorithm has scalable memory requirements and can handle problems with many thousands of support vectors efficiently.
- YALE is an environment for machine learning experiments and data mining. Experiments can be made up of a large number of arbitrarily nestable operators, and their setup is described by XML files, which can easily be created with a graphical user interface.
- R is a statistical environment and programming language that is well suited to machine learning and data mining.
- Orange is an open source Python toolkit for data mining and machine learning. It includes a range of preprocessing, modelling and data exploration techniques. It is based on C++ components that are accessed either directly (not very common), through Python scripts (easier and better), or through GUI objects called Orange Widgets.
- IlliMine is a partially open-source data mining project written in C++.
- NU-MineBench is a data mining benchmark suite containing a mix of several representative data mining applications from different application domains. This benchmark is intended for use in computer architecture research, systems research, performance evaluation, and high-performance computing. Currently, the benchmark has applications with algorithms based on clustering, association rules, classification, Bayesian networks, pattern recognition, support vector machines and several other well-known data mining methodologies. These applications are used in diverse fields such as bioinformatics, network intrusion detection, customer relationship management, and marketing.