1
Multivariate Data Sets
  • CS 7450 - Information Visualization
  • Jan. 15, 2002
  • John Stasko

2
Data Sets
  • Data comes in many different forms
  • Typically, not in the way you want it
  • How is it stored (in the raw)?

3
Example
  • Cars
  • make
  • model
  • year
  • miles per gallon
  • cost
  • number of cylinders
  • weights
  • ...

4
Example
  • Web pages

5
Data Tables
  • Often, we take raw data and transform it into a
    form that is more workable
  • Main idea
  • Individual items are called cases
  • Cases have variables (attributes)

6
Data Table Format
  • Columns are cases; rows are variables (the dimensions)

             Case1     Case2     Case3     ...
Variable1    Value11   Value21   Value31
Variable2    Value12   Value22   Value32
Variable3    Value13   Value23   Value33
...

  • Think of it as a function: f(case1) = <Val11, Val12, ...>

7
Example
People in class

        Mary    Jim     Sally   Mitch   ...
SSN     145     294     563     823
Age     23      17      47      29
Hair    brown   black   blonde  red
GPA     2.9     3.7     3.4     2.1
...
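
The "People in class" table and the function view f(case) = <Val1, Val2, ...> can be sketched in a few lines of Python. This is only an illustration: the names TABLE, VARIABLES, f, and column are made up for the example, and the dictionary layout is one of many reasonable choices.

```python
# Hypothetical layout: one record per case, one field per variable.
VARIABLES = ("SSN", "Age", "Hair", "GPA")

TABLE = {
    "Mary":  (145, 23, "brown", 2.9),
    "Jim":   (294, 17, "black", 3.7),
    "Sally": (563, 47, "blonde", 3.4),
    "Mitch": (823, 29, "red", 2.1),
}

def f(case):
    """Return the tuple of values for one case: f(case) = <Val1, Val2, ...>."""
    return TABLE[case]

def column(variable):
    """Return one variable's values across all cases, in table order."""
    i = VARIABLES.index(variable)
    return [values[i] for values in TABLE.values()]

print(f("Jim"))        # (294, 17, 'black', 3.7)
print(column("GPA"))   # [2.9, 3.7, 3.4, 2.1]
```

Looking up a case gives all of its dimensions at once, while column() pulls out a single variable, which is the form most of the later displays work from.
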
8
Example
Baseball statistics
9
Variable Types
  • Three main types of variables
  • N - Nominal (equal or not equal to other values)
  • Example: gender
  • O - Ordinal (obeys < relation, ordered set)
  • Example: fr, so, jr, sr
  • Q - Quantitative (can do math on them)
  • Example: age
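
A rough way to see the difference between the three types is to ask which operations make sense on each. The sketch below assumes hair color as the nominal variable and reuses the ages from the class example; YEAR_ORDER and year_rank are hypothetical helper names.

```python
from statistics import mean

# Nominal: only equality is meaningful.
hair = ["brown", "black", "blonde", "red"]
print(hair[0] == hair[1])  # False

# Ordinal: values have an order, but differences are not meaningful.
YEAR_ORDER = ["fr", "so", "jr", "sr"]  # order taken from the slide's example

def year_rank(year):
    """Position of a class year in the ordered set."""
    return YEAR_ORDER.index(year)

print(sorted(["sr", "fr", "jr"], key=year_rank))  # ['fr', 'jr', 'sr']

# Quantitative: arithmetic is meaningful.
ages = [23, 17, 47, 29]
print(mean(ages))
```

Sorting hair colors or averaging class years would run without error in Python, but the result would not mean anything, which is why the variable type matters when choosing a visual encoding.
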

10
Metadata
  • Descriptive information about the data
  • Might be something as simple as the type of a
    variable, or could be more complex
  • For times when the table itself just isn't enough
  • Example: if variable1 is l, then variable3 can
    only be 3, 7 or 16

11
How Many Variables?
  • Data sets of dimensions 1,2,3 are common
  • Number of variables per class
  • 1 - Univariate data
  • 2 - Bivariate data
  • 3 - Trivariate data
  • >3 - Hypervariate data

12
Univariate Data
  • Representations

[Figure: a display labeled "Bill" with axis ticks 7, 5, 3, 1; a Tukey box plot on a 0 to 20 axis marking low, high, mean, and the middle 50%]
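
The numbers a box plot draws for a single variable can be computed with the standard library. This is a minimal sketch: box_plot_summary and the sample values are hypothetical, low and high are taken to be the minimum and maximum (an assumption; some box plot variants use whiskers and outliers instead), and the "inclusive" quartile method is one of several conventions.

```python
from statistics import mean, quantiles

def box_plot_summary(values):
    """Return the numbers a simple box plot draws for one variable."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "low": min(values),
        "q1": q1,          # one edge of the box
        "median": median,
        "q3": q3,          # other edge; q1..q3 spans the middle 50%
        "high": max(values),
        "mean": mean(values),
    }

# Hypothetical sample values on a 0-20 scale.
sample = [2, 4, 5, 7, 8, 10, 11, 13, 18]
print(box_plot_summary(sample))
```
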
13
Bivariate Data
  • Representations

Scatter plot is common (axes: price, mileage)
14
Trivariate Data
  • Representations

3D scatter plot is possible (axes: price, horsepower, mileage)
15
Hypervariate Data
  • A number of well-known visualization techniques
    exist for data sets of 1-3 dimensions
  • line graphs, bar graphs, scatter plots OK
  • We see a 3-D world (4-D with time)
  • What about data sets with more than 3 variables?
  • Often the interesting ones

16
Multiple Views
Give each variable its own display

     A  B  C  D  E
1    4  1  8  3  5
2    6  3  4  2  1
3    5  7  2  4  3
4    2  6  3  1  5

[Figure: one display per variable A through E, each showing cases 1 through 4]
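
Giving each variable its own display amounts to splitting the table by column. The sketch below reads the slide's numbers as a table of 4 cases by 5 variables (an assumption about the flattened layout) and draws a crude text bar chart per variable; per_variable_views and text_bars are hypothetical names.

```python
# Numbers from the slide, read as rows = cases 1-4, columns = variables A-E.
VARIABLES = ["A", "B", "C", "D", "E"]
ROWS = [
    [4, 1, 8, 3, 5],
    [6, 3, 4, 2, 1],
    [5, 7, 2, 4, 3],
    [2, 6, 3, 1, 5],
]

def per_variable_views(variables, rows):
    """Split the table into one series per variable (one display each)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(variables)}

def text_bars(series):
    """Render one variable's display as rows of '#' marks, one row per case."""
    return [f"{case}: " + "#" * value for case, value in enumerate(series, start=1)]

views = per_variable_views(VARIABLES, ROWS)
for name, series in views.items():
    print(f"Variable {name}")
    print("\n".join(text_bars(series)))
```

Each display is easy to read on its own, but relating a case's value in one display to its value in another is left to the viewer.
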
17
Scatterplot Matrix
Represent each possible pair of variables in its own 2-D scatterplot
Useful for what? Misses what?
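
Enumerating the panels of a scatterplot matrix is a matter of listing every pair of variables. The following sketch uses hypothetical car data with variable names borrowed from the earlier slides; scatterplot_pairs and pair_points are illustrative helpers, not part of any plotting library.

```python
from itertools import combinations

def scatterplot_pairs(variables):
    """List every unordered pair of variables; each pair gets one 2-D plot."""
    return list(combinations(variables, 2))

def pair_points(table, x_var, y_var):
    """Return the (x, y) points for one panel. table maps variable -> values."""
    return list(zip(table[x_var], table[y_var]))

# Hypothetical car data.
cars = {
    "price": [18000, 24000, 31000],
    "mileage": [32, 27, 21],
    "horsepower": [110, 150, 220],
}

pairs = scatterplot_pairs(cars)
print(len(pairs))                              # 3 panels for 3 variables
print(pair_points(cars, "price", "mileage"))
```

With n variables there are n(n-1)/2 distinct panels, so the number of plots grows quadratically as variables are added.
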
18
Chernoff Faces
Encode different variables' values in characteristics of a human face
Cute applets:
http://www.cs.uchicago.edu/wiseman/chernoff/
http://hesketh.com/schampeo/projects/Faces/chernoff.html
19
Star Plots
Space out the n variables at equal angles around a circle
Each spoke encodes a variable's value
[Figure: star plot with five spokes labeled Var 1 through Var 5; spoke length shows the value]
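
The geometry of a star plot can be sketched directly: spoke i sits at angle 2*pi*i/n and its length is the variable's value. In this sketch, star_plot_vertices is a hypothetical helper, and dividing each value by a per-variable maximum is an assumed scaling choice.

```python
import math

def star_plot_vertices(values, max_values):
    """Return (x, y) spoke endpoints for one case.

    Spoke i points at angle 2*pi*i/n; its length is values[i] / max_values[i],
    so every spoke is scaled to the range 0..1.
    """
    n = len(values)
    points = []
    for i, (value, max_value) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * i / n
        length = value / max_value
        points.append((length * math.cos(angle), length * math.sin(angle)))
    return points

# Hypothetical case with five variables, each measured on a 0-10 scale.
vertices = star_plot_vertices([10, 5, 8, 2, 6], [10] * 5)
for x, y in vertices:
    print(f"{x:.2f}, {y:.2f}")
```

Connecting the endpoints in order gives the star-shaped outline for one case; drawing one star per case allows shapes to be compared side by side.
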
20
Star Plot examples
http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html
21
Star Coordinates
E. Kandogan, "Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions," InfoVis 2000 Late-Breaking Hot Topics, Oct. 2000
Demo
22
Intermission
  • Missing students
  • Learn names
  • Computer accounts

23
Parallel Coordinates
Encode variables along a horizontal row
Vertical line specifies values
[Figure: five parallel vertical axes labeled V1 through V5]
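
In a parallel coordinates display, each case becomes a polyline that crosses axis i at the height of its value for variable i. The sketch below assumes min-max scaling so every axis has the same height; normalize_columns, polylines, and the data are hypothetical.

```python
def normalize_columns(rows):
    """Rescale each variable (column) to 0..1 so all axes share one height."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.5
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]

def polylines(rows):
    """Map each case to a polyline: vertex i sits on axis i at the scaled value."""
    return [list(enumerate(row)) for row in normalize_columns(rows)]

# Hypothetical data: three cases, five variables (V1..V5).
data = [
    [1, 20, 300, 4, 50],
    [2, 10, 100, 8, 70],
    [3, 30, 200, 6, 60],
]
for line in polylines(data):
    print(line)
```

Each vertex is an (axis index, height) pair; a drawing layer would place axis i at horizontal position i and connect consecutive vertices.
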
24
Parallel Coords Example
[Figure panels: Basic, Grayscale, Color]
25
Application
  • System that uses parallel coordinates for
    information analysis and discovery
  • Interactive tool
  • Can focus on certain data items
  • Color

Taken from A. Inselberg, "Multidimensional Detective," InfoVis '97, 1997.
26
The Problem
  • VLSI chip manufacture
  • Want high quality chips (high speed) and a high
    yield batch (% of useful chips)
  • Able to track defects
  • Hypothesis: No defects gives desired chip types
  • 473 batches of data

27
The Data
  • 16 variables
  • X1 - yield
  • X2 - quality
  • X3-X12 - defects (inverted)
  • X13-X16 - physical parameters

28
Parallel Coordinate Display
[Figure: parallel coordinate display with axes grouped as yield, quality, defects, parameters]
Yikes! But not that bad
Distributions: x1 - normal, x2 - bipolar
29
Top Yield and Quality
[Figure: batches with top yield and quality; annotations: split, defects]
Have some defects
30
Minimal Defects
Not the highest yields and quality
31
Best Yields
Appears that some defects are necessary to produce the best chips
Non-intuitive!
32
Another Problem
  • Data concerning economic output of a country
    (fishing, mining, etc.)
  • Eight variables
  • Fit a model to the data set
  • Model describes possible economic outputs

33
Parallel Coordinates
[Figure: parallel coordinates display showing the model boundaries; pick a value on one axis]
34
Xmdv
Toolsuite created by Matthew Ward of WPI
Includes parallel coordinate views
Demo
35
Dimensional Anchors
Attempt to unify many different multi-var vis techniques
Uses 9 DA parameters
P. Hoffman, G. Grinstein, D. Pinkney, "Dimensional Anchors: A Graphic Primitive for Multidimensional Multivariate Information Visualizations," Workshop on New Paradigms in Info Vis, Nov. 1999.
One example display
36
Another Technique
  • Database of data items, each of n dimensions
  • Issue a query that specifies a target value of
    the dimensions
  • Often get back no exact matches
  • Want to find near matches

Taken from D. Keim, H-P Kriegel, "VisDB: Database Exploration Using Multid Vis," IEEE CG&A, 1994.
37
Relevance Factor
  • How close an item is to the query
  • Data items have some value that can be
    numerically quantified
  • Each dimension is some distance away
    from query item
  • Sum these up for total distance
  • Relevance is inverse of distance

38
Example
  • 5 dimensions, integers 0 to 255
  • Query: 6, 210, 73, 45, 92
  • Data item: 8, 200, 73, 50, 91
  • Distance: 2 + 10 + 0 + 5 + 1 = 18
  • Relevance: 1275 - 18 = 1257 (1275 = 5 x 255, the largest possible distance)
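
The arithmetic in this example can be checked with a short sketch. It assumes the distance is the sum of per-dimension absolute differences and that relevance is the largest possible distance (5 x 255 = 1275) minus the actual distance, as the example suggests; the function names are hypothetical.

```python
def distance(query, item):
    """Sum of per-dimension absolute differences."""
    return sum(abs(q - v) for q, v in zip(query, item))

def relevance(query, item, max_value=255):
    """Largest possible distance minus the actual distance."""
    max_distance = max_value * len(query)
    return max_distance - distance(query, item)

query = [6, 210, 73, 45, 92]
item = [8, 200, 73, 50, 91]
print(distance(query, item))   # 18
print(relevance(query, item))  # 1257
```

An exact match would score the full 1275, and the score falls by one for every unit of difference in any dimension.
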

39
Issues
  • What if dimensions are real numbers or text
    strings?
  • What if they're the same type, but of different
    orders of magnitude?
  • Have to define some kind of distance, then a
    weight function to multiply by
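
One possible answer for numeric dimensions of different magnitudes is to scale each difference by the dimension's range before weighting. The sketch below is only one such choice, not the method from the VisDB paper; weighted_distance, the ranges, and the weights are all hypothetical, and text strings would need their own distance definition.

```python
def weighted_distance(query, item, ranges, weights):
    """Scale each dimension's difference by its range, then weight and sum.

    ranges[i] is (low, high) for dimension i; weights[i] is its importance.
    Both are hypothetical inputs chosen by the analyst.
    """
    total = 0.0
    for q, v, (low, high), w in zip(query, item, ranges, weights):
        total += w * abs(q - v) / (high - low)
    return total

# Hypothetical example: age in years and price in dollars.
query = [30, 20000]
item = [40, 25000]
ranges = [(0, 100), (0, 50000)]
print(weighted_distance(query, item, ranges, weights=[1.0, 1.0]))
print(weighted_distance(query, item, ranges, weights=[1.0, 3.0]))
```

Without the scaling step, the 5000-dollar price gap would swamp the 10-year age gap even though both are 10% of their ranges.
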

40
Technique
  • Calculate relevance of all data points
  • Sort items based on relevance
  • Use spiral technique to order the values
  • Color items based on relevance

41
Relevance Colors
[Figure: color scale running from low to high relevance]
Empirically established
42
Spiral Method
Highest relevance value in center, decreasing values grow outward
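
A spiral placement like this can be sketched by sorting items by relevance and walking grid cells outward from the center. The rectangular spiral used here is an assumption about the geometry; spiral_cells and place_by_relevance are hypothetical names, and the scores are made up.

```python
def spiral_cells(count):
    """Return `count` grid cells (x, y) spiraling outward from (0, 0)."""
    x = y = 0
    dx, dy = 1, 0             # start by moving along +x
    steps_in_leg = 1          # length of the current straight leg
    cells = []
    while len(cells) < count:
        for _ in range(2):    # two consecutive legs share each length
            for _ in range(steps_in_leg):
                if len(cells) == count:
                    return cells
                cells.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # turn 90 degrees
        steps_in_leg += 1
    return cells

def place_by_relevance(relevances):
    """Map item index -> cell, with the most relevant item at the center."""
    order = sorted(range(len(relevances)), key=lambda i: relevances[i], reverse=True)
    return dict(zip(order, spiral_cells(len(order))))

# Hypothetical relevance scores for five items.
layout = place_by_relevance([1200, 1257, 900, 1100, 1250])
print(layout)
```

Because position depends only on rank, the most relevant items cluster in the middle of the window and less relevant ones end up toward the edges.
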
43
Display Methodology
Example: five-dimensional data
[Figure: six windows, one for total relevance and one each for Dim 1 through Dim 5, with a spiral in each window]
Same item appears in same place in each window
Items ordered by total relevance
44
Figure from Paper
45
Example Display
46
Alternative
  • Grouping arrangement
  • Doesn't use multiple windows
  • Create all relevance dimensional depictions for
    an item and group them
  • Spiral out the different data items' depictions

47
Grouping Arrangement
48
Example Display
8 dimensions, 1000 items
[Figure panels: Grouping, Multi-window]
49
Sources Used
  • CMS book
  • Referenced articles
  • Marti Hearst SIMS 247 lectures
  • C. H. Yu, Visualization Techniques of Different Dimensions,
    http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html
50
Upcoming
  • Cognitive Tasks and Issues
  • Multivariate vis tools