Information Visualization Design for Multidimensional Data: Integrating the Rank-by-Feature Framework with Hierarchical Clustering - PowerPoint PPT Presentation

Provided by: jinw
Learn more at: https://www.cs.umd.edu


1
Information Visualization Design for
Multidimensional Data: Integrating the
Rank-by-Feature Framework with Hierarchical
Clustering
  • Dissertation Defense
  • Human-Computer Interaction Lab
  • Dept. of Computer Science
  • Jinwook Seo

2
Outline
  • Research Problems
  • Clustering Result Visualization in HCE
  • GRID Principles
  • Rank-by-Feature Framework
  • Evaluation
  • Case studies
  • User survey via emails
  • Contributions and Future Work

3
Exploration of Multidimensional Data
  • To understand the story that the data tells
  • To find features in the data set
  • To generate hypotheses
  • Lost in multidimensional space
  • Tools and techniques are available in many areas
  • A strategy and an interface are needed to organize
    them and guide discovery

4
Constrained by Conventions
User/Researcher
Conventional Tools
Statistical Methods
Data Mining Algorithms
Multidimensional Data
5
Boosting Information Bandwidth
User/Researcher
Information Visualization Interfaces
Statistical Methods
Data Mining Algorithms
Multidimensional Data
6
Contributions
  • Graphics, Ranking, and Interaction for Discovery
    (GRID) principles
  • Rank-by-Feature Framework
  • The design and implementation of the Hierarchical
    Clustering Explorer (HCE)
  • Validation through case studies and user surveys

7
Hierarchical Clustering Explorer: Understanding
Clusters Through Interactive Exploration
  • Overview of the entire clustering results →
    compressed overview
  • The right number of clusters → minimum
    similarity bar
  • Overall pattern of each cluster (aggregation)
    → detail cutoff bar
  • Compare two results → brushing and linking
    using pair-tree
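The minimum similarity bar can be pictured as a threshold on the dendrogram: only merges at or above the chosen similarity are kept, and each remaining subtree is one cluster. The Python sketch below illustrates that idea under stated assumptions; it is not HCE's code. It assumes average linkage with Pearson correlation as the similarity measure, and `clusters_above` is a hypothetical helper name. Because average-linkage merge similarities never increase as merging proceeds, stopping once the best pair falls below the threshold gives the same clusters as cutting the full dendrogram at that level.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def clusters_above(rows, min_similarity):
    """Hypothetical helper: average-linkage clustering of `rows`, keeping
    only merges whose similarity is >= min_similarity (the bar position)."""
    # Pairwise similarities between original rows, keyed by (i, j) with i < j.
    sim = {(i, j): pearson(rows[i], rows[j])
           for i, j in combinations(range(len(rows)), 2)}
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Average linkage: mean similarity over all cross-cluster pairs.
        total = sum(sim[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Most similar pair of current clusters (a < b by construction).
        best, a, b = max((linkage(clusters[a], clusters[b]), a, b)
                         for a, b in combinations(range(len(clusters)), 2))
        if best < min_similarity:
            break  # every remaining merge would sit below the bar
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters
```

With four short profiles where rows 0/1 rise together and rows 2/3 fall together, a bar at 0.9 yields two clusters, while raising it to 0.999 leaves every row on its own.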

8
HCE History
  • Document-View Architecture
  • 72,274 lines of C++ code, 76 C++ classes
  • About 2,500 downloads since April 2002
  • Commercial license to a biotech company
    (www.vialactia.com)
  • Freely downloadable at www.cs.umd.edu/hcil/hce

9
Goal: Find Interesting Features in
Multidimensional Data
  • Finding clusters, outliers, correlations, and gaps
    is difficult in multidimensional data
  • Cognitive difficulties in > 3D
  • Therefore, utilize low-dimensional projections
  • Perceptual efficiency in 1D and 2D
  • Orderly process to guide discovery

10
Do you see anything interesting?
11
Do you see any interesting feature?
12
Correlation. What else?
13
Outliers
(figure with labeled outliers: He, Rn)
14
GRID Principles
  • Graphics, Ranking, and Interaction for Discovery
    in Multidimensional Data
  • study 1D
  • study 2D
  • then find features
  • ranking guides insight
  • statistics confirm

15
(No Transcript)
16
Rank-by-Feature Framework
  • Based on the GRID principles
  • 1D → 2D
  • 1D: Histogram, Boxplot
  • 2D: Scatterplot
  • Ranking Criteria
  • statistical methods
  • data mining algorithms
  • Graphical Overview
  • Rapid Interactive Browsing
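As one example of a 1D ranking criterion drawn from statistical methods, columns could be ranked by how many potential outliers their boxplots flag. The sketch below assumes Tukey's 1.5 × IQR fences and the quartiles returned by Python's `statistics.quantiles` (default exclusive method); the function names are hypothetical, and the exact criteria and quartile rule used in RFF may differ.

```python
import statistics


def potential_outliers(values):
    """Count values outside Tukey's fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # statistics.quantiles (Python 3.8+) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < low or v > high)


def rank_columns(columns):
    """Hypothetical 1D ranking: column names ordered by outlier count, highest first."""
    scores = {name: potential_outliers(vals) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A column such as `[1, 2, 3, 4, 5, 6, 7, 100]` flags one potential outlier and is ranked ahead of `[1, 2, 3, 4, 5, 6, 7, 8]`, which flags none.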

17
Demo
A Ranking Example
3,138 U.S. counties with 17 attributes
Uniformness (entropy): 6.7, 6.1, 4.5, 1.5
Pearson correlation: 0.996, 0.31, 0.01, -0.69
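The two ranking criteria in this example can be approximated in a few lines. The sketch assumes uniformness is the Shannon entropy (in bits) of an equal-width histogram with a fixed, arbitrarily chosen number of bins, so a perfectly uniform column over 16 bins scores log2(16) = 4; the entropy values in the example depend on the binning HCE actually used, which is not stated here. Pairs are ranked by signed Pearson r in descending order, matching the order of the correlation values listed above. All function names are hypothetical.

```python
from itertools import combinations
from math import log2, sqrt


def entropy(values, bins=16):
    """Shannon entropy (bits) of an equal-width histogram; higher = more uniform."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column: avoid dividing by zero
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts if c)


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def rank_1d(columns, bins=16):
    """(entropy, name) pairs, most uniform column first."""
    return sorted(((entropy(v, bins), name) for name, v in columns.items()), reverse=True)


def rank_2d(columns):
    """(r, x, y) triples over all column pairs, by signed r, highest first."""
    pairs = [(pearson(columns[x], columns[y]), x, y) for x, y in combinations(columns, 2)]
    return sorted(pairs, reverse=True)
```

Ranking by signed r puts strong positive correlations at the top and strong negative ones at the bottom; ranking by |r| instead would group both kinds of strong relationship together.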
18
Categorical Variables in RFF
  • New ranking criteria
  • Chi-square, ANOVA
  • Significance and Strength
  • How strong is a relationship?
  • How significant is a relationship?
  • Partitioning and Comparison
  • partition by a column (categorical variable)
  • partition by a row (class info for columns)
  • compare clustering results for partitions
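For two categorical variables, the significance/strength split maps onto two standard quantities: Pearson's chi-square statistic (whose p-value gives significance) and the contingency coefficient C = sqrt(χ² / (χ² + n)) as a strength measure. The sketch below computes both from paired observations with only the standard library; it is an illustration, not RFF's implementation, and the function names are hypothetical. The p-value is left out because it needs the chi-square distribution; `scipy.stats.chi2_contingency` returns the statistic and p-value for a contingency table, applying Yates' continuity correction to 2 × 2 tables by default.

```python
from collections import Counter
from math import sqrt


def chi_square(xs, ys):
    """Pearson's chi-square statistic for two paired categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed cell counts
    row_totals, col_totals = Counter(xs), Counter(ys)
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n  # count expected under independence
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return chi2


def contingency_coefficient(xs, ys):
    """Strength of association, C = sqrt(chi2 / (chi2 + n)); 0 means independent."""
    chi2 = chi_square(xs, ys)
    return sqrt(chi2 / (chi2 + len(xs)))
```

C never reaches 1: for a 2 × 2 table its maximum is sqrt(1/2) ≈ 0.707, which is what a perfectly associated pair such as `["M", "M", "F", "F"]` vs. `["yes", "yes", "no", "no"]` produces.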

19
Two ranking panels: (1) color = contingency coefficient C, size = chi-square p-value; (2) color = quadracity, size = least-square error
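The quadracity and least-square-error criteria can be sketched as regression fits on a scatterplot's points. This sketch assumes quadracity is the coefficient of the x² term in a least-squares quadratic fit y = a + bx + cx², and that least-square error is the sum of squared residuals of a simple linear fit; both are assumptions about RFF's definitions, and the helper names are hypothetical. The 3 × 3 normal equations are solved by Gaussian elimination with partial pivoting to keep the example dependency-free.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented 3x4 matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def quadratic_fit(xs, ys):
    """Least-squares coefficients [a, b, c] of y = a + b*x + c*x**2."""
    s = lambda p: sum(x ** p for x in xs)
    t = lambda p: sum(x ** p * y for x, y in zip(xs, ys))
    m = [[s(0), s(1), s(2)], [s(1), s(2), s(3)], [s(2), s(3), s(4)]]
    return solve3(m, [t(0), t(1), t(2)])


def quadracity(xs, ys):
    """Assumed definition: the x**2 coefficient of the quadratic fit."""
    return quadratic_fit(xs, ys)[2]


def linear_sse(xs, ys):
    """Sum of squared residuals of the simple linear least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
```

For y = x² on x = 0..4, the quadratic coefficient comes out as 1 and the straight-line fit leaves a squared error of 14; for a straight line such as y = 2x + 1, the quadratic coefficient is 0.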
21
Partitioning and Comparison
            s1       s2       s3       s4       s5       s6       s7
FieldType   integer  integer  real     integer  integer  integer  categorical
i1                                                                M
i2                                                                M
i3                                                                M
...
in-1                                                              F
in                                                                F
  • Compare two column-clustering results

22
Partitioning and Comparison
            s1       s2       s3       s4       s5       s6
CID         1        1        1        2        2        2
FieldType   integer  integer  real     integer  integer  integer
i1
i2
i3
...
in-1
in
  • Compare two row-clustering results
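The two partitioning modes on slides 21 and 22 amount to splitting the data table along one axis before clustering each part. A minimal sketch, assuming rows are Python lists with a categorical value in a known column (like s7) and that per-column class IDs arrive as a separate list (like the CID row); the function names are hypothetical, and the clustering and comparison of the resulting partitions are not shown.

```python
from collections import defaultdict


def partition_rows(rows, key_index):
    """Partition by a column: group rows by a categorical value (e.g. s7 = M/F),
    dropping the key column from each row."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key_index]].append(row[:key_index] + row[key_index + 1:])
    return dict(parts)


def partition_columns(rows, class_ids):
    """Partition by a row: group columns by their class ID (the CID row),
    returning each class's sub-table restricted to its own columns."""
    groups = defaultdict(list)
    for j, cid in enumerate(class_ids):
        groups[cid].append(j)
    return {cid: [[row[j] for j in cols] for row in rows] for cid, cols in groups.items()}
```

Each resulting sub-table would then be clustered on its own (columns for a row partition, rows for a column partition) so the results for the partitions can be compared side by side.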

23
Qualitative Evaluation
  • Case studies
  • Individual 30-minute weekly meetings for 6 weeks
  • observe how participants use HCE
  • improve HCE according to their requirements
  • 1 molecular biologist (Acute lung injuries in
    mice)
  • 1 biostatistician (FAMuSS Study data)
  • 1 meteorologist (Aerosol measurement)

24
Lessons Learned
  • Rank-by-Feature Framework
  • Enables systematic/orderly exploration
  • Prevents users from missing important features
  • Helps confirm known features
  • Helps identify unknown features
  • Reveals outliers as signal/noise
  • More work needed
  • Transformation of variables
  • More ranking criteria
  • More interactions

25
User Survey via Emails
  • 1500 user survey emails
  • 13 questions on HCE and RFF
  • 60 successfully sent out
  • 85 users replied
  • 60 users answered a majority of questions
  • 25 just curious users

26
Which features have you used?
Do you think HCE improved the way you analyze
your data set?
27
Future Work
  • Integrating RFF with Other Tools
  • More ranking criteria
  • GRID principles available in other tools
  • Scaling-up
  • Selection/Filtering to handle a large number of
    dimensions
  • Interaction in RFF
  • Further Evaluation

28
(No Transcript)
30
Contributions
  • Graphics, Ranking, and Interaction for Discovery
    (GRID) principles
  • Rank-by-Feature Framework
  • The design and implementation of the Hierarchical
    Clustering Explorer (HCE)
  • Validation through case studies and user surveys

31
Thank you!