Data mining exercise Clustering Lab 3 - PowerPoint PPT Presentation

About This Presentation

Description:

Goal: Add new attributes 'Weekday' and 'Hour'. Discretization ... Derive new attributes. Merge tables. Perform discretization (binning). Clustering modeling with ...

Slides: 29
Provided by: LAP7

Transcript and Presenter's Notes



1
Data mining exercise: Clustering, Lab 3
  • Winnie Lam
  • Email: cswinnie@comp.polyu.edu.hk
  • Website: http://www.comp.polyu.edu.hk/cswinnie/
  • The Hong Kong Polytechnic University
  • Department of Computing

2
REVIEW
  • Classification modeling
  • WEKA (ID3)
  • Clementine (C5.0)
  • Questions?

3
OVERVIEW
  • KDD Process

[Figure: KDD process. Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Evaluation → Knowledge]
4
Simplified process
  • Data Understanding: define the target, discover useful data
  • Data Preparation: obtain clean, useful data
  • Data Mining: discover patterns
  • Evaluation: apply the knowledge
5
Download files
  • WEKA: http://prdownloads.sourceforge.net/weka/weka-3-4-8a.exe
  • Data file MyData_lab3.mdb: http://www.comp.polyu.edu.hk/cswinnie/data/MyData_lab3.mdb
  • Data file lab3.csv: http://www.comp.polyu.edu.hk/cswinnie/data/lab3.csv

6
Classification
With predefined classes!
7
Clustering
No classes are defined in advance!
[Figure: example objects labelled STAR, CROSS and TRIANGLE]
8
Clustering with Clementine

9
Modeling Tools: Clustering
  • K-means. An approach to clustering that defines k
    clusters and iteratively assigns each record to the
    cluster with the nearest mean, recomputing the
    cluster means after each pass, until a stable
    solution is found.
  • TwoStep. A clustering method that involves
    preclustering the records into a large number of
    subclusters and then applying a hierarchical
    clustering technique to those subclusters to
    define the final clusters.
  • Kohonen Networks. A type of neural network used
    for clustering, also known as a self-organizing
    map (SOM).
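The K-means loop described above (assign to nearest mean, recompute means, stop when nothing changes) can be sketched in plain Python. This is a minimal illustration, not Clementine's implementation: starting from the first k records, the iteration cap, and the toy points are all assumptions made for the example.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain K-means on a list of numeric tuples; returns (labels, centroids)."""
    # Assumption: start from the first k records as centroids (tools differ here).
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster whose mean is nearest.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # stable solution: no record changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical 2-D records, clearly split into two groups.
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Because the result depends on the starting centroids, real tools usually pick them randomly and expose a seed, as the WEKA slides later show.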

10
Data Understanding
Data file: MyData_lab3.mdb
Step 1: Create a data source (ODBC) in Control Panel
11
Data Understanding
Step 2: Import data into Clementine
  • Add a Source node (Database) to Clementine
  • Choose the data source
  • Select the tables (lab3, Link, Shop_Info), one at a time

12
13
Data Preparation

14
Goal: Merge tables lab3 and Shop_Info
Add node: Merge (in the Record Ops palette)
Tables and fields:
  • link: TID, SHOP_CD
  • lab3: TID, dt, gp1, gp2, ref_no, cl, prod_cd
  • Shop_Info: dist_cd, shop_cd, staffs, manager, Area
Answer by yourself: what is/are the key(s) for
merging?
15
  • Step 1: Merge tables lab3 and link

Add node: Merge (in the Record Ops palette)
Field from .Link
16
Step 2: Merge the result of Step 1 with table Shop_Info
Add node: Merge (in the Record Ops palette)
Fields from .Shop_Info
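Each Merge step above is a join on a shared key. The sketch below assumes, from the field lists, that TID links lab3 to link and that SHOP_CD in link corresponds to shop_cd in Shop_Info. The rows are hypothetical and `inner_join` is an illustrative helper, not a Clementine function; it keeps only records that match in both tables.

```python
def inner_join(left, right, left_key, right_key):
    """Keep only rows whose key values match; returns merged dicts."""
    # Index the right-hand table by its key for fast lookup.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[left_key], []):
            merged.append({**row, **match})  # right-hand values win on name clashes
    return merged


# Hypothetical rows: field names follow the slides, values are invented.
lab3 = [{"TID": 1, "prod_cd": "P01"}, {"TID": 2, "prod_cd": "P02"}]
link = [{"TID": 1, "SHOP_CD": "S1"}, {"TID": 2, "SHOP_CD": "S2"}]
shop_info = [
    {"shop_cd": "S1", "staffs": 12, "Area": "A"},
    {"shop_cd": "S2", "staffs": 5, "Area": "B"},
]

step1 = inner_join(lab3, link, "TID", "TID")                # Step 1: lab3 + link
step2 = inner_join(step1, shop_info, "SHOP_CD", "shop_cd")  # Step 2: + Shop_Info
print(step2)
```

Chaining the two joins mirrors the two Merge nodes: the output of the first becomes the left input of the second.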
17
Goal: Add new attributes Weekday and Hour
Useful node: Derive (in the Field Ops palette)
Newly derived
Result
Weekday: datetime_day_name(datetime_weekday(dt))
Hour: datetime_hour(datetime_time(dt))
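For comparison outside Clementine, the same two derived fields can be computed with Python's standard `datetime` module. This is a sketch only: it assumes `dt` is available as a text timestamp in the format shown, and `derive_weekday_hour` is a hypothetical helper name.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def derive_weekday_hour(dt_text, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (Weekday, Hour) for one timestamp string.

    Assumption: dt is stored as text in `fmt`; adjust to the real data.
    """
    dt = datetime.strptime(dt_text, fmt)
    # datetime.weekday() counts 0 = Monday; other tools may number days differently.
    return WEEKDAYS[dt.weekday()], dt.hour


print(derive_weekday_hour("2006-03-15 14:25:00"))
```

Using an explicit list of day names avoids depending on the system locale, which `strftime("%A")` would.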
18
Discretization
Goal: Divide the Hour field into 3 fixed-width intervals
Steps
  • Add a Binning node and specify the number of bins
  • Add a Type node to update the type information for
    the newly derived field
  • Add a Reclassify node to rename the bins to Morning,
    Afternoon and Evening
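Fixed-width (equal-width) binning splits the observed range [min, max] into k intervals of width (max - min) / k. The Python sketch below shows the idea, followed by a dictionary lookup standing in for the Reclassify step. The hour values are hypothetical, bins are assumed to be numbered 1..k from lowest to highest, and Clementine's own boundary handling may differ slightly.

```python
def fixed_width_bins(values, k):
    """Assign each value a bin number 1..k using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        if width == 0:
            bins.append(1)  # all values identical: everything in one bin
        else:
            # The maximum would fall into bin k+1, so clamp it into bin k.
            bins.append(min(int((v - lo) / width) + 1, k))
    return bins


hours = [8, 9, 12, 13, 17, 20, 22]  # hypothetical Hour values
bin_ids = fixed_width_bins(hours, 3)
# Reclassify: rename bin numbers to readable labels.
names = {1: "Morning", 2: "Afternoon", 3: "Evening"}
print([names[b] for b in bin_ids])
```

The same function covers the Staff field on the next slide by passing k = 5.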

19
Data Transformation
Goal: Divide the Staff field into 5 fixed-width intervals
Result
20
Data Mining - Clustering
Goal: Divide the customers into 3 clusters
Add node: K-means (in the Modeling palette)
21
Data Mining - Clustering
Find the unsuitable attributes that cannot
characterize the clusters
Result
Note: You may adjust the value of k (the number of
clusters).
22
Clustering with WEKA

23
Import data: lab3.csv
Goal: Remove useless attributes
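Outside WEKA, removing attributes from lab3.csv is simply a column filter. The sketch below uses Python's `csv` module. Deciding which attributes are useless is the point of the exercise, so the sample header, rows and `drop` set here are hypothetical, as is the helper name `drop_columns`.

```python
import csv
import io


def drop_columns(csv_text, drop):
    """Return CSV text with the named columns removed (header row required)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [name for name in reader.fieldnames if name not in drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped keys in each row dict.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical excerpt; the real lab3.csv columns may differ.
sample = "TID,ref_no,Weekday,Hour\n1,R001,Monday,9\n2,R002,Friday,18\n"
print(drop_columns(sample, drop={"TID", "ref_no"}))
```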
24
Remaining attributes
25
Clustering - SimpleKMeans
Choose Clusterer (weka > clusterers > SimpleKMeans)
numClusters -- set number of clusters

seed -- random number seed
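The seed matters because K-means implementations such as SimpleKMeans typically start from randomly chosen records as initial centres. Different seeds can therefore give different clusterings, while the same seed reproduces a run. The Python sketch below illustrates only that initialisation idea; it is not WEKA's code, and `initial_centres` is a hypothetical helper.

```python
import random


def initial_centres(records, num_clusters, seed):
    """Pick num_clusters distinct records as starting centres, reproducibly."""
    rng = random.Random(seed)  # private generator: same seed, same picks
    return rng.sample(records, num_clusters)


records = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
a = initial_centres(records, num_clusters=2, seed=10)
b = initial_centres(records, num_clusters=2, seed=10)
print(a == b)  # True: the same seed reproduces the same starting centres
```

Trying a few seeds and comparing the resulting clusters is a simple way to check whether a clustering is stable.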
26
Result
27
Comparison
Clementine
Weka
28
SUMMARY
  • Today, you've learnt to:
  • Derive new attributes
  • Merge tables
  • Perform discretization (binning)
  • Build clustering models with Clementine and WEKA