Title: Information Visualization Design for Multidimensional Data: Integrating the Rank-by-Feature Framework with Hierarchical Clustering
1. Information Visualization Design for Multidimensional Data: Integrating the Rank-by-Feature Framework with Hierarchical Clustering
- Dissertation Defense
- Human-Computer Interaction Lab
- Dept. of Computer Science
- Jinwook Seo
2. Outline
- Research Problems
- Clustering Result Visualization in HCE
- GRID Principles
- Rank-by-Feature Framework
- Evaluation
- Case studies
- User survey via emails
- Contributions and Future work
3. Exploration of Multidimensional Data
- To understand the story that the data tells
- To find features in the data set
- To generate hypotheses
- Lost in multidimensional space
- Tools and techniques are available in many areas
- Strategy and interface to organize them to guide discovery
4. Constrained by Conventions
User/Researcher
Conventional Tools
Statistical Methods
Data Mining Algorithms
Multidimensional Data
5. Boosting Information Bandwidth
User/Researcher
Information Visualization Interfaces
Statistical Methods
Data Mining Algorithms
Multidimensional Data
6. Contributions
- Graphics, Ranking, and Interaction for Discovery (GRID) principles
- Rank-by-Feature Framework
- The design and implementation of the Hierarchical Clustering Explorer (HCE)
- Validation through case studies and user surveys
7. Hierarchical Clustering Explorer: Understanding Clusters Through Interactive Exploration
- Overview of the entire clustering results → compressed overview
- The right number of clusters → minimum similarity bar
- Overall pattern of each cluster (aggregation) → detail cutoff bar
- Compare two results → brushing and linking using pair-tree
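One way to read the minimum similarity bar is as a threshold cut through the dendrogram: every subtree whose merge similarity reaches the bar becomes one cluster. The following is a minimal Python sketch under that assumption. The `Node` class, the `cut` function, and the sample similarity values are hypothetical illustrations, not HCE's actual code, and the sketch assumes similarities never increase from a child to its parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Dendrogram node (hypothetical structure). A leaf carries a label;
    an internal node carries two children and their merge similarity."""
    similarity: float = 1.0
    label: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaves(node: Node) -> List[str]:
    """Collect the leaf labels under a node, left to right."""
    if node.label is not None:
        return [node.label]
    return leaves(node.left) + leaves(node.right)


def cut(node: Node, min_similarity: float) -> List[List[str]]:
    """Split the tree wherever the merge similarity falls below the bar."""
    if node.label is not None or node.similarity >= min_similarity:
        return [leaves(node)]
    return cut(node.left, min_similarity) + cut(node.right, min_similarity)


# Sample tree with made-up similarities: ((a, b), (c, (d, e)))
root = Node(
    0.2,
    left=Node(0.9, left=Node(label="a"), right=Node(label="b")),
    right=Node(
        0.5,
        left=Node(label="c"),
        right=Node(0.8, left=Node(label="d"), right=Node(label="e")),
    ),
)

for bar in (0.1, 0.4, 0.6):
    print(bar, cut(root, bar))
```

Raising the bar yields more, tighter clusters; lowering it merges them back, which is the interaction the slide describes for choosing the number of clusters.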
8. HCE History
- Document-View Architecture
- 72,274 lines of C++ code, 76 C++ classes
- About 2,500 downloads since April 2002
- Commercial license to a biotech company (www.vialactia.com)
- Freely downloadable at www.cs.umd.edu/hcil/hce
9. Goal: Find Interesting Features in Multidimensional Data
- Finding clusters, outliers, correlations, and gaps is difficult in multidimensional data
- Cognitive difficulties in >3D
- Therefore utilize low-dimensional projections
- Perceptual efficiency in 1D and 2D
- Orderly process to guide discovery
10. Do you see anything interesting?
11. Do you see any interesting feature?
12. Correlation: What else?
13. Outliers
He
Rn
14. GRID Principles
- Graphics, Ranking, and Interaction for Discovery in Multidimensional Data
- study 1D
- study 2D
- then find features
- ranking guides insight
- statistics confirm
16. Rank-by-Feature Framework
- Based on the GRID principles
- 1D → 2D
- 1D: histogram and boxplot
- 2D: scatterplot
- Ranking Criteria
- statistical methods
- data mining algorithms
- Graphical Overview
- Rapid Interactive Browsing
17. Demo
A Ranking Example
3138 U.S. counties with 17 attributes
Uniformness (entropy) (6.7, 6.1, 4.5, 1.5)
Pearson correlation (0.996, 0.31, 0.01, -0.69)
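The ranking idea in this example can be sketched in a few lines of Python: score every 1D projection by histogram entropy, score every 2D projection by Pearson correlation, then sort. This is an illustrative sketch only. The function names, the equal-width binning with 8 bins, and the toy table are assumptions, not the HCE implementation or the county data set.

```python
import math
from itertools import combinations


def entropy(values, bins=8):
    """Shannon entropy (bits) of an equal-width histogram.
    Higher values mean a more uniform distribution."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_1d(table):
    """Rank columns by uniformness (entropy), most uniform first."""
    scores = [(name, entropy(col)) for name, col in table.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def rank_2d(table):
    """Rank column pairs by Pearson correlation, highest first."""
    scores = [((a, b), pearson(table[a], table[b]))
              for a, b in combinations(table, 2)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy table (made-up values): column name -> list of values
table = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 4, 6, 8, 10, 12, 14, 16],
    "c": [1, 1, 1, 1, 1, 1, 1, 9],
    "d": [8, 7, 6, 5, 4, 3, 2, 1],
}

for name, score in rank_1d(table):
    print(f"{name}: entropy {score:.2f}")
for pair, r in rank_2d(table):
    print(f"{pair}: r = {r:.2f}")
```

The sorted lists play the role of the ordered projections the user browses: the top and bottom entries are the first candidates to inspect.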
18. Categorical Variables in RFF
- New ranking criteria
- Chi-square, ANOVA
- Significance and Strength
- How strong is a relationship?
- How significant is a relationship?
- Partitioning and Comparison
- partition by a column (categorical variable)
- partition by a row (class info for columns)
- compare clustering results for partitions
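The distinction between strength and significance can be illustrated with the chi-square statistic and Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)). The sketch below is a plain-Python illustration with hypothetical function names and made-up data. It computes the statistic and the strength measure only; the significance side (a p-value from the chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom) is left out.

```python
import math
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for two categorical columns."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return chi2


def contingency_coefficient(xs, ys):
    """Pearson's contingency coefficient C, a measure of strength
    (0 means no association; larger means stronger)."""
    chi2 = chi_square(xs, ys)
    return math.sqrt(chi2 / (chi2 + len(xs)))


# Made-up example: a two-level variable against a two-level grouping
sex = ["M", "M", "F", "F"]
associated = ["hi", "hi", "lo", "lo"]   # fully determined by sex
unrelated = ["hi", "lo", "hi", "lo"]    # independent of sex

print(chi_square(sex, associated), contingency_coefficient(sex, associated))
print(chi_square(sex, unrelated), contingency_coefficient(sex, unrelated))
```

A ranking over pairs of categorical variables could sort by either quantity, which is why the two questions on the slide are kept separate.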
19. Color: contingency coefficient C; size: chi-square p-value
Color: quadracity; size: least-square error
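The slide only names the quadracity criterion. A hedged sketch, assuming quadracity is taken as the coefficient of the x² term in a least-squares quadratic fit y = a·x² + b·x + c and that the least-square error is the sum of squared residuals of that fit, could look like the following. The function names and sample points are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_fit(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations,
    solved with Cramer's rule. Returns (a, b, c, sum_of_squared_errors).
    Assumes at least three distinct x values."""
    n = float(len(xs))
    s1 = float(sum(xs))
    s2 = float(sum(x ** 2 for x in xs))
    s3 = float(sum(x ** 3 for x in xs))
    s4 = float(sum(x ** 4 for x in xs))
    t0 = float(sum(ys))
    t1 = float(sum(x * y for x, y in zip(xs, ys)))
    t2 = float(sum(x * x * y for x, y in zip(xs, ys)))
    m = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(m)
    coeffs = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = rhs[i]  # replace column k with the right-hand side
        coeffs.append(det3(mk) / d)
    a, b, c = coeffs
    sse = sum((a * x * x + b * x + c - y) ** 2 for x, y in zip(xs, ys))
    return a, b, c, sse


# Made-up points lying exactly on y = 2x^2 + 3x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 15, 28, 45]
a, b, c, sse = quadratic_fit(xs, ys)
print(a, b, c, sse)
```

Under this reading, a scatterplot would be colored by `a` and sized by `sse`, so a strongly curved, well-fitting relationship stands out in the overview.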
21. Partitioning and Comparison
           s1       s2       s3    s4       s5       s6       s7
FieldType  integer  integer  real  integer  integer  integer  categorical
i1                                                            M
i2                                                            M
i3                                                            M
in-1                                                          F
in                                                            F
- Compare two column-clustering results
22. Partitioning and Comparison
           s1       s2       s3    s4       s5       s6
CID        1        1        1     2        2        2
FieldType  integer  integer  real  integer  integer  integer
i1
i2
i3
in-1
in
- Compare two row-clustering results
23. Qualitative Evaluation
- Case studies
- 30-minute weekly meeting for 6 weeks individually
- observe how participants use HCE
- improve HCE according to their requirements
- 1 molecular biologist (Acute lung injuries in mice)
- 1 biostatistician (FAMuSS Study data)
- 1 meteorologist (Aerosol measurement)
24. Lessons Learned
- Rank-by-Feature Framework
- Enables systematic/orderly exploration
- Prevents users from missing important features
- Helps confirm known features
- Helps identify unknown features
- Reveals outliers as signal/noise
- More work needed
- Transformation of variables
- More ranking criteria
- More interactions
25. User Survey via Emails
- 1500 user survey emails
- 13 questions on HCE and RFF
- 60 successfully sent out
- 85 users replied
- 60 users answered a majority of questions
- 25 just curious users
26. Which features have you used?
Do you think HCE improved the way you analyze your data set?
27. Future Work
- Integrating RFF with Other Tools
- More ranking criteria
- GRID principles available in other tools
- Scaling-up
- Selection/Filtering to handle large number of dimensions
- Interaction in RFF
- Further Evaluation
30. Contributions
- Graphics, Ranking, and Interaction for Discovery (GRID) principles
- Rank-by-Feature Framework
- The design and implementation of the Hierarchical Clustering Explorer (HCE)
- Validation through case studies and user surveys
31. Thank you!