Statistical Society of Canada - PowerPoint PPT Presentation

Title:

Statistical Society of Canada

Description:

Structuring Interactive Cluster Analysis. Wayne Oldford, University of Waterloo

Slides: 41
Provided by: WayneO6

Transcript and Presenter's Notes


1
Structuring Interactive Cluster Analysis
  • Wayne Oldford
  • University of Waterloo

2
Overview
Content by example
Argument
  • ill-defined problem
  • high-interaction desirable
  • explore partitions
  • recast algorithms
  • problems
  • resources
  • interactive clustering
  • partition moves
  • implications

3
Problem
geometric/visual structure

4
Problem
context matters

5
Problem
structure in context
segmentation in MRI
6
Problem
context specific structure
7
Problem
some specific some not
8
Problem
some specific some not
9
Problem
  • Find groups in data
  • Similar objects are together
  • Groups are separated
  • Problem is ill defined
  • What do you mean similar?
  • E.g. what is contiguous structure?
  • When are groups separate?
  • Can we believe it?

10
Computational resources
1. Processing
2. Memory
3. Display
11
Computational resources
1. Processing
  • computationally intensive
  • problem constrained
  • optimality sought
2. Memory
3. Display
12
Computational resources
1. Processing
2. Memory
  • GBs, TBs, disk
  • GBs ram to processor
  • more data
3. Display
13
Computational resources
1. Processing
2. Memory
3. Display
  • high resolution, large
  • graphics processors, digital video
  • more data, more visual detail

14
Computational resources
1. Processing
2. Memory
3. Display
Balance and integrate
15
High interaction
  • multiple displays
  • integrate computational resources
  • software design?

16
Example: image analysis
17
Example: context and function plots
18
Example: mutual support and shapes
19
Example: exploratory data analysis
20
Interactive clustering
  • visual grouping
  • location, motion, shape, texture, ...
  • linking across displays
  • manual
  • selection
  • cases, variates, groups, ...
  • colouring
  • focus
  • immediate and incremental
  • context can be used to form groups
  • multiple partitions

21
Automated clustering: typical software
  • resources dedicated to numerical computation
  • teletype interaction
  • runs to completion
  • graphical output
  • don't always work so well (no universal solution)
  • confirm via exploratory data analysis

Must be integrated with interactive methods
22
Example: K-means clustering
23
Example: VERI (Visual Empirical Regions of Influence)

join points if no third point falls in this region
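
The joining rule on this slide can be sketched in Python. The shape of the VERI region is not given in this transcript, so the sketch swaps in a different, simpler region as an explicit assumption: the disk whose diameter is the segment between the two points (the Gabriel graph rule). The function names are hypothetical and only the "no third point in the region" logic is meant to carry over.

```python
from itertools import combinations

def in_diameter_disk(p, q, r):
    """True if r lies strictly inside the disk whose diameter is the segment pq.

    Stand-in region (an assumption): r is inside exactly when the angle
    p-r-q is obtuse, i.e. (p - r) . (q - r) < 0.
    """
    return (p[0] - r[0]) * (q[0] - r[0]) + (p[1] - r[1]) * (q[1] - r[1]) < 0

def join_edges(points, in_region=in_diameter_disk):
    """Join points i and j if no third point falls in their region."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        blocked = any(
            in_region(points[i], points[j], points[k])
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges

# Point 2 sits between points 0 and 1, so the direct edge (0, 1) is blocked.
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
edges = join_edges(pts)
```

Passing a different `in_region` predicate is how another region, such as the VERI template itself, would be plugged in.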
24
Example: VERI
25
Integrating automatic methods
  • Move about the space of partitions
  • Pa --> Pb --> Pc --> ...

Which operators f, with f(Pa) --> Pb, are of interest?
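
One way to picture "moving about the space of partitions" is to treat a partition as a tuple of group labels and a move as a function from partition to partition. This is a minimal sketch under those assumptions; `canonical`, `path`, and the two toy moves are hypothetical names, not part of the talk or of Quail.

```python
from typing import Callable, List, Sequence, Tuple

Partition = Tuple[int, ...]             # group label of each case
Move = Callable[[Partition], Partition]  # an operator f with f(Pa) -> Pb

def canonical(labels: Sequence[int]) -> Partition:
    """Relabel groups 0, 1, 2, ... in order of first appearance."""
    seen = {}
    return tuple(seen.setdefault(g, len(seen)) for g in labels)

def path(start: Sequence[int], moves: Sequence[Move]) -> List[Partition]:
    """Apply the moves in turn and record Pa -> Pb -> Pc -> ..."""
    out = [canonical(start)]
    for f in moves:
        out.append(canonical(f(out[-1])))
    return out

# Two toy moves, for illustration only.
def join_first_two(p: Partition) -> Partition:
    """Merge group 1 into group 0."""
    return tuple(0 if g == 1 else g for g in p)

def split_off_last(p: Partition) -> Partition:
    """Put the last case into a new group of its own."""
    return p[:-1] + (max(p) + 1,)

steps = path((5, 5, 7, 7), [join_first_two, split_off_last])
```

Keeping every visited partition in a list is what makes it possible to step back, compare, or replay a sequence of moves.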
26
Refine
Reduce
27
Reassign
28
Refinement sequence
  • 1 -> 2 -> 3 -> 4 -> 5
29
Reassign, reduce sequence
  • 5 -> 5
30
Reassign, reduce sequence
  • 5 -> 5 -> 4 -> 3 -> 2
31
Moves
examples
  • refine (Pold) --> Pnew
    break minimal spanning tree
  • reduce (Pold) --> Pnew
    join near centres
  • reassign (Pold) --> Pnew
    k-means maximize F
  • partition (graphic) --> Pnew
    colours from point cloud
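
The first three moves can be sketched in plain Python on 2-D points with integer group labels. This is an illustration under stated assumptions, not the talk's implementation: `refine` cuts the single longest minimal-spanning-tree edge, `reduce` joins the two groups with the nearest centres, and `reassign` performs one nearest-centre step in the style of k-means (the criterion F from the slide is not modelled). All function names are hypothetical.

```python
import math

def centres(points, labels):
    """Mean position of each group, keyed by group label."""
    sums = {}
    for (x, y), g in zip(points, labels):
        sx, sy, n = sums.get(g, (0.0, 0.0, 0))
        sums[g] = (sx + x, sy + y, n + 1)
    return {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}

def spanning_tree(points, idx):
    """Prim's algorithm: minimal spanning tree edges (length, i, j) over idx."""
    in_tree, edges = {idx[0]}, []
    while len(in_tree) < len(idx):
        edge = min((math.dist(points[i], points[j]), i, j)
                   for i in in_tree for j in idx if j not in in_tree)
        in_tree.add(edge[2])
        edges.append(edge)
    return edges

def refine(points, labels):
    """Split one group by cutting the longest spanning-tree edge in any group."""
    cut, tree = None, None
    for g in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == g]
        if len(idx) > 1:
            edges = spanning_tree(points, idx)
            if cut is None or max(edges) > cut:
                cut, tree = max(edges), edges
    if cut is None:                      # every group is a singleton
        return list(labels)
    # Collect the points still connected to one end of the cut edge.
    side, stack = {cut[2]}, [cut[2]]
    while stack:
        k = stack.pop()
        for e in tree:
            if e != cut and k in e[1:]:
                other = e[2] if e[1] == k else e[1]
                if other not in side:
                    side.add(other)
                    stack.append(other)
    new = max(labels) + 1
    return [new if i in side else l for i, l in enumerate(labels)]

def reduce(points, labels):
    """Join the two groups whose centres are nearest."""
    c = centres(points, labels)
    if len(c) < 2:
        return list(labels)
    _, a, b = min((math.dist(c[a], c[b]), a, b) for a in c for b in c if a < b)
    return [a if l == b else l for l in labels]

def reassign(points, labels):
    """Move each point to the group with the nearest current centre."""
    c = centres(points, labels)
    return [min(c, key=lambda g: math.dist(p, c[g])) for p in points]

pts = [(0, 0), (0, 1), (5, 0), (5, 1), (12, 0)]
p2 = refine(pts, [0, 0, 0, 0, 0])   # 1 group  -> 2 groups
p3 = refine(pts, p2)                # 2 groups -> 3 groups
back = reduce(pts, p3)              # 3 groups -> 2 groups
```

Each function takes the old partition and returns a new one, so the moves can be chained in any order to produce refinement or reduction sequences.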
32
Challenges
  • varying focus
  • subsets (selected manually and at random)
  • merging new data into partition
  • exploring multiple partitions
  • interactive display and comparison
  • resolving many to one
  • interface design
  • control panels, options
  • interaction

33
Interface
34
Interface - reduce
35
Interface - refine
36
Interface - reassign
37
Interaction
38
Interaction - refine 2
39
Interaction - refine 3
40
Interaction - save partition movie
41
Interaction - refine 4
42
Interaction - refine 5
43
Interaction - refine 5 dendrogram
44
Interaction - reassign
45
Interaction - cluster plot movie
46
Creation
  • partition (Data ...) --> Pnew
  • manually from colours
  • k-means, random start, mst, veri, etc.
  • from existing classifier
  • partition-path (Data ...) --> P1, P2, ..., Pn
  • partition-path (Pold ...) --> Pold, P1, P2, ..., Pn
  • e.g. nested sequence from hierarchical clustering
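
As an illustration of a partition-path, the sketch below builds a nested sequence by agglomerative clustering. Single linkage and the function name `partition_path` are assumptions made for the example; the slide only says that hierarchical clustering yields such a sequence.

```python
import math

def partition_path(points):
    """Nested sequence of partitions, from singletons down to one group.

    Each partition is a list of groups; each group is a list of case indices.
    Assumption: the closest pair of groups under single linkage is merged.
    """
    def link(a, b):
        # Single-linkage distance: smallest distance between members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    groups = [[i] for i in range(len(points))]
    path = [[list(g) for g in groups]]
    while len(groups) > 1:
        _, a, b = min((link(groups[a], groups[b]), a, b)
                      for a in range(len(groups))
                      for b in range(a + 1, len(groups)))
        merged = sorted(groups[a] + groups[b])
        groups = [g for k, g in enumerate(groups) if k not in (a, b)] + [merged]
        path.append([list(g) for g in groups])
    return path

path = partition_path([(0.0,), (1.0,), (5.0,), (7.0,)])
```

Every partition in the returned list is a coarsening of the one before it, which is the property a dendrogram display relies on.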

47
Composition
  • resolve (P1, ..., Pm) --> Pnew
  • combine different partitions of the same data
  • merge (Data, Pold) --> Pnew
  • classify additional points
  • merge (Pa, Pb) --> Pnew
  • combine non-overlapping partitions
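
The three composition operators might look like the following in Python. The rules chosen here are assumptions for illustration: `resolve` takes the common refinement (two cases stay together only if every partition puts them together), `merge_data` classifies new points by nearest group centre, and `merge_partitions` tags labels so groups from the two partitions cannot collide. Other resolution and classification rules are equally possible, and the names are hypothetical.

```python
import math

def resolve(*partitions):
    """Common refinement of several label lists over the same cases."""
    seen = {}
    return [seen.setdefault(key, len(seen)) for key in zip(*partitions)]

def merge_data(new_points, points, labels):
    """Classify additional points by the nearest existing group centre."""
    sums = {}
    for p, g in zip(points, labels):
        total, n = sums.get(g, ([0.0] * len(p), 0))
        sums[g] = ([t + x for t, x in zip(total, p)], n + 1)
    centre = {g: [t / n for t in total] for g, (total, n) in sums.items()}
    added = [min(centre, key=lambda g: math.dist(q, centre[g]))
             for q in new_points]
    return list(labels) + added

def merge_partitions(pa, pb):
    """Combine partitions of disjoint case sets (dicts: case id -> label)."""
    if set(pa) & set(pb):
        raise ValueError("partitions overlap; use resolve instead")
    out = {case: ("a", g) for case, g in pa.items()}
    out.update({case: ("b", g) for case, g in pb.items()})
    return out

resolved = resolve([0, 0, 1, 1], [0, 1, 1, 1])
extended = merge_data([(0.2, 0.0), (9.0, 0.0)],
                      [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)], [0, 0, 1])
combined = merge_partitions({"x": 0, "y": 1}, {"z": 0})
```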

48
Other operators
  • dissimilarity (Pi, Pj) --> di,j
  • display (P1, ..., Pm)
  • dendrogram if P1 < ... < Pm
  • mds plot of all clusters in P1, ..., Pm
  • mds plot of all partitions P1, ..., Pm
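
The slide does not say which dissimilarity between partitions is intended. As one common choice, assumed here, the sketch below uses the fraction of case pairs on which two partitions disagree (one minus the Rand index), and adds a nesting check of the kind a dendrogram display would need. Function names are hypothetical.

```python
from itertools import combinations

def dissimilarity(pi, pj):
    """Fraction of case pairs grouped together in one partition but not the other."""
    pairs = list(combinations(range(len(pi)), 2))
    disagree = sum((pi[a] == pi[b]) != (pj[a] == pj[b]) for a, b in pairs)
    return disagree / len(pairs)

def refines(fine, coarse):
    """True if every group of `fine` lies inside a single group of `coarse`."""
    parent = {}
    return all(parent.setdefault(f, c) == c for f, c in zip(fine, coarse))

def dissimilarity_matrix(partitions):
    """Pairwise dissimilarities, e.g. as input to an MDS plot of partitions."""
    return [[dissimilarity(p, q) for q in partitions] for p in partitions]

d = dissimilarity([0, 0, 1, 1], [0, 0, 0, 1])
```

Because the measure only asks whether pairs of cases are together, it is unaffected by how the groups happen to be labelled.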

49
Implications
  • Algorithms (re)cast in terms of moves
  • refine, reduce
  • reassign
  • partition, partition-path
  • easily understandable (e.g. geometric structures)
  • specify required data structures
  • e.g. ms tree, triangulation, var-cov matrix, ...

50
New problems
  • interface design
  • multiple partitions
  • comparison and/or resolution
  • multiple display
  • inference

51
Summary
  • Cluster analysis is naturally exploratory and
    needs integration with modern interactive data
    analysis.
  • Enlarging the problem to partitions
  • simplifies and gives structure
  • encourages exploratory approach
  • integrates naturally
  • introduces new possibilities (analysis and
    research)

52
Acknowledgements
  • Catherine Hurley, Erin McLeish, Rayan Yahfoufi,
    Natasha Wiebe
  • U(W) students in statistical computing
  • Quail: Quantitative Analysis in Lisp
  • http://www.stats.uwaterloo.ca/Quail