1/11/13

The Phases of the KDD Process



A KDD (knowledge discovery in databases) process consists of several tasks. The actual mining, that is, the application of a data mining algorithm to a dataset, is only one of these steps. Following the CRISP-DM (Cross Industry Standard Process for Data Mining) model [9,31], we distinguish the following tasks:

1. Business Understanding
The very first step of a KDD project should be a close look from the business point of view. The goal of this phase is to gain a deeper understanding of the project objectives and the surrounding circumstances strictly from the business perspective. Finally, the insights from this initial phase are turned into a data mining problem definition.

2. Data Understanding
Based on the results from the business point of view, the second step is to get familiar with the available data. The goal is to understand the attributes and their values and to uncover any semantics hidden in the data. At this stage one should also figure out what exactly the available data offers, that is, whether it has the potential to answer our mining questions, and, if possible, select promising subsets of the data.

3. Data Preparation
The next step is to construct the dataset on which the mining algorithm is to be run. This phase covers both syntactic aspects (format transformations for the employed mining algorithm) and semantic aspects like table, record and attribute selection. Last but not least, this phase also includes deriving new attributes that make explicit higher-level information only implicitly contained in the raw data (e.g. deriving “day of the week” from “date”).

4. Modeling (or Mining)
In the modeling phase the actual data mining takes place. Based on the identified business goals and the assessment of the available data, an appropriate mining algorithm is chosen and run on the prepared data.

5. Evaluation
Evaluating the results of the mining run mainly covers three aspects. First, it is necessary to check whether everything went right from the technical point of view: Was the mining algorithm able to read and interpret the prepared dataset correctly? Was all the designated information actually given to the algorithm? Second, one needs to investigate whether the mining results are sound from the mining method's point of view. Some methods directly support this decision by computing certain significance measures, whereas others leave this aspect completely to the analyst's experience. Third, a key objective of the evaluation phase is to determine whether all important business issues have been considered adequately.

6. Deployment
After mining the data and assessing the results, one needs to transfer the results back into the business environment. This can be rather straightforward, like preparing a report that is understandable by business people (who typically are not data mining experts). At the other extreme, it can be quite complex, like implementing a repeatable data mining process across the enterprise.
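
The derived-attribute example from the Data Preparation phase (“day of the week” from “date”) can be sketched in a few lines of shell. This is only an illustration, not part of CRISP-DM: the function names `day_of_week` and `add_weekday_column` and the two sample records are made up, and the sketch assumes GNU `date` (as shipped with Ubuntu) and ISO-formatted dates.

```shell
#!/bin/sh
# Derive a "day of the week" attribute from a "date" attribute.
# Assumptions: GNU date (for the -d option) and dates in YYYY-MM-DD form.
day_of_week() {
    # LC_ALL=C forces English day names regardless of the system locale.
    LC_ALL=C date -d "$1" +%A
}

# Append the derived attribute as a third column to "date,amount" records
# read from standard input (a hypothetical, made-up record layout).
add_weekday_column() {
    while IFS=, read -r record_date amount; do
        printf '%s,%s,%s\n' "$record_date" "$amount" "$(day_of_week "$record_date")"
    done
}

# Two made-up records, just to show the transformation.
printf '2013-01-07,12.50\n2013-01-11,8.00\n' | add_weekday_column
```

The mining algorithm would then see the weekday as an ordinary attribute, information that was only implicit in the raw date.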

1/10/13

UML Tools for Ubuntu


Unified Modeling Language (UML) is a modelling language typically used in the design phase of object-oriented software development to express ideas graphically and to visualize the system. UML, standardized by the OMG (Object Management Group), has become the standard approach to software modelling. UML diagrams make the coding part easier in complex systems, so the time you spend drawing will save you a lot of time later, during the development/implementation phase. UML diagrams represent both static and dynamic views of a system, so with UML tools you can draw class diagrams, activity diagrams, sequence diagrams, use case diagrams, etc.
Various tools are available for drawing UML diagrams, such as IBM Rational Rose, but this post covers some open source modelling tools that you can use on Linux-based operating systems, e.g. Ubuntu 11.04 (Natty Narwhal), 11.10 (Oneiric Ocelot, the code name of the upcoming version), older versions such as 10.10 and 10.04 LTS, Linux Mint, Fedora, openSUSE, etc.

UML Modelling Tools for Ubuntu (11.04/11.10/10.04/10.10..) or other Linux OS

Here are some of the most popular free and open source UML modelling tools that you can use with Ubuntu or other Linux distros. Some of them are cross-platform, i.e. also available for Windows and Mac OS. Try a few and choose the one that suits you best.

Umbrello UML Modeller

Umbrello is a KDE-based application for creating UML diagrams in a standard format. It supports code generation for many languages (advanced code generation is available for Ruby) and has all the basic features you expect from a UML tool.
Download the Umbrello UML tool or, better, install it from the command line (on Ubuntu or another Debian-based system):
sudo apt-get install umbrello

Gaphor

Gaphor is a simple, cross-platform UML tool. Before installing it, be sure to check its dependencies, such as Python (version 2.5 or later).
Download Gaphor

ArgoUML

ArgoUML is one of the best open source software modelling tools, with a lot of advanced features. It is a Java-based (and therefore cross-platform) application.
The supported diagrams are:
  • class diagrams
  • statechart diagrams
  • use case diagrams
  • activity diagrams
  • sequence diagrams
  • deployment diagrams
Download ArgoUML (the latest version, 0.32.2, was released in April 2011)

Dia

Dia is a GTK-based drawing program that can be used for drawing UML diagrams, flowcharts, network diagrams, etc.
Download Dia

1/8/13

How to Turn Your Ubuntu Laptop into a Wireless Access Point

If you have a single wired Internet connection – say, in a hotel room – you can create an ad-hoc wireless network with Ubuntu and share the Internet connection among multiple devices. Ubuntu includes an easy, graphical setup tool.
Unfortunately, there are some limitations. Some devices may not support ad-hoc wireless networks, and Ubuntu can only create wireless hotspots with weak WEP encryption, not strong WPA encryption.

Setup

To get started, click the gear icon on the panel and select System Settings.

Select the Network control panel in Ubuntu’s System Settings window. You can also set up a wireless hotspot by clicking the network menu and selecting Edit Network Connections, but that setup process is more complicated.

If you want to share an Internet connection wirelessly, you’ll have to connect to it with a wired connection. You can’t share a Wi-Fi network – when you create a Wi-Fi hotspot, you’ll be disconnected from your current wireless network.

To create a hotspot, select the Wireless network option and click the Use as Hotspot button at the bottom of the window.

You’ll be disconnected from your existing network. You can disable the hotspot later by clicking the Stop Hotspot button in this window or by selecting another wireless network from the network menu on Ubuntu’s panel.

After you click Create Hotspot, you’ll see a notification pop up that indicates your laptop’s wireless radio is now being used as an ad-hoc access point. You should be able to connect from other devices using the default network name (“ubuntu”) and the security key displayed in the Network window. You can also click the Options button to customize your wireless hotspot.

From the wireless tab, you can set a custom name for your wireless network using the SSID field. You can also modify other wireless settings from here. The Connect Automatically check box should allow you to use the hotspot as your default wireless network – when you start your computer, Ubuntu will create the hotspot instead of connecting to an existing wireless network.

From the Wireless Security tab, you can change your security key and method. Unfortunately, WPA encryption does not appear to be an option here, so you’ll have to stick with the weaker WEP encryption.

The “Shared to other computers” option on the IPv4 Settings tab tells Ubuntu to share your Internet connection with other computers connected to the hotspot.

Even if you don’t have a wireless Internet connection available to share, you can network computers together and communicate between them – for example, to share files.

1/7/13

Install Java JDK 7 in Ubuntu 12.04



This package provides Oracle Java JDK 7 (which includes the JRE and the Java browser plugin). However, you can’t install only the Oracle JRE: the PPA only provides the full Oracle JDK7 package.
1. To add our PPA and install the latest Oracle Java (JDK) 7 in Ubuntu, use the commands below:
sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update && sudo apt-get install oracle-jdk7-installer
2. After installation, check which Java alternative is enabled (if another version is selected, sudo update-alternatives --config java lets you switch):
update-alternatives --display java

3. Check whether Ubuntu uses Java JDK 7:
java -version

If all went right the answer should be:
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)

4. Check which compiler is used:
javac -version

The expected output:
javac 1.7.0_07

5. Adding JAVA_HOME to the environment
Open a Terminal (Applications → Accessories → Terminal), then open /etc/environment in an editor:
sudo gedit /etc/environment
Append to the end of the file:
JAVA_HOME=/usr/lib/jvm/java-7-oracle
6. Removing Oracle JDK 7
If you don’t want to use Oracle Java (JDK) 7 anymore and want to go back to OpenJDK, all you have to do is remove the Oracle JDK7 Installer and the previous Java (OpenJDK, etc.) version will be used:
sudo apt-get remove oracle-jdk7-installer
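
As an alternative to editing /etc/environment by hand in step 5, the append can be scripted so that running it twice does not add a duplicate line. The following is a minimal sketch: `append_once` is a hypothetical helper name (not part of Ubuntu or the PPA), and the demonstration writes to a scratch file rather than to /etc/environment.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# append_once is a hypothetical helper, named here only for illustration.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match whole lines only, -F: treat the pattern as a fixed string
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstration on a scratch file, so nothing under /etc is touched here.
demo_file=$(mktemp)
append_once "$demo_file" 'JAVA_HOME=/usr/lib/jvm/java-7-oracle'
append_once "$demo_file" 'JAVA_HOME=/usr/lib/jvm/java-7-oracle'   # second call is a no-op
cat "$demo_file"
```

Applying the same helper to /etc/environment itself would require root privileges, and the new JAVA_HOME value only takes effect after you log in again.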