Title: Multivariate Data


1
Multivariate Data Representations
  • CS 7450 - Information Visualization
  • Jan. 20, 2005
  • John Stasko

2
Agenda
  • Data forms and representations
  • Basic representation techniques
  • Multivariate (>3) techniques

3
Data Sets
  • Data comes in many different forms
  • Typically, not in the way you want it
  • How is it stored (in the raw)?

4
Example
  • Cars
  • make
  • model
  • year
  • miles per gallon
  • cost
  • number of cylinders
  • weight
  • ...

5
Example
  • Web pages

6
Data Tables
  • Often, we take raw data and transform it into a
    form that is more workable
  • Main idea
  • Individual items are called cases
  • Cases have variables (attributes)

7
Data Table Format
             Case1     Case2     Case3     ...
Variable1    Value11   Value21   Value31
Variable2    Value12   Value22   Value32
Variable3    Value13   Value23   Value33
...
(The variables are the dimensions.)
Think of it as a function: f(case1) = <Val11, Val12, ...>
8
Example
          Mary     Jim      Sally    Mitch    ...
SSN       145      294      563      823
Age       23       17       47       29
Hair      brown    black    blonde   red
GPA       2.9      3.7      3.4      2.1
...
People in class
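The "table as a function" idea from the previous slide can be sketched in a few lines of Python using the class example above. This is a minimal illustration, assuming an in-memory dictionary; the names `TABLE`, `VARIABLES`, `f`, and `value` are hypothetical, not part of any tool discussed in the lecture.

```python
from typing import Dict, Tuple, Union

Value = Union[str, int, float]

# Hypothetical in-memory version of the "People in class" table:
# each case (a person) maps to its tuple of variable values.
VARIABLES = ("SSN", "Age", "Hair", "GPA")

TABLE: Dict[str, Tuple[Value, ...]] = {
    "Mary": (145, 23, "brown", 2.9),
    "Jim": (294, 17, "black", 3.7),
    "Sally": (563, 47, "blonde", 3.4),
    "Mitch": (823, 29, "red", 2.1),
}


def f(case: str) -> Tuple[Value, ...]:
    """Return the tuple <Val1, Val2, ...> of variable values for one case."""
    return TABLE[case]


def value(case: str, variable: str) -> Value:
    """Look up a single cell: one case, one variable."""
    return f(case)[VARIABLES.index(variable)]
```

For example, `f("Jim")` yields `(294, 17, "black", 3.7)`, and `value("Sally", "Hair")` yields `"blonde"`. Each case contributes one tuple whose length equals the number of variables (dimensions).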
9
Example
Baseball statistics
10
Variable Types
  • Three main types of variables
  • N - Nominal (equal or not equal to other values)
  • Example: gender
  • O - Ordinal (obeys < relation, ordered set)
  • Example: fr, so, jr, sr
  • Q - Quantitative (can do math on them)
  • Example: age
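One way to make the N/O/Q distinction concrete is to record which operations are meaningful for each type. The sketch below is illustrative: the operation sets in `ALLOWED_OPS` are an assumption chosen to mirror the slide's wording (equality for nominal, `<` for ordinal, arithmetic for quantitative), and all names are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set


class VarType(Enum):
    NOMINAL = "N"
    ORDINAL = "O"
    QUANTITATIVE = "Q"


# Illustrative operation sets: each type keeps the operations of the
# previous type and adds its own.
ALLOWED_OPS: Dict[VarType, Set[str]] = {
    VarType.NOMINAL: {"==", "!="},
    VarType.ORDINAL: {"==", "!=", "<"},
    VarType.QUANTITATIVE: {"==", "!=", "<", "+", "-"},
}

CLASS_YEARS: List[str] = ["fr", "so", "jr", "sr"]  # an ordered set


def supports(vtype: VarType, op: str) -> bool:
    """True if the operation is meaningful for this variable type."""
    return op in ALLOWED_OPS[vtype]


def ordinal_less(a: str, b: str, order: List[str]) -> bool:
    """Ordinal values obey < through their position in the ordered set."""
    return order.index(a) < order.index(b)
```

So `supports(VarType.NOMINAL, "<")` is `False` (gender has no order), while `ordinal_less("so", "sr", CLASS_YEARS)` is `True`. Subtracting two class years, however, is not supported, because that needs quantitative data such as age.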

11
Metadata
  • Descriptive information about the data
  • Might be something as simple as the type of a
    variable, or could be more complex
  • For times when the table itself just isn't enough
  • Example: if variable1 is l, then variable3 can
    only be 3, 7 or 16
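Metadata like the conditional rule above can be stored next to the table and checked against each case. This is a minimal sketch, assuming metadata is kept as a plain dictionary; the structure of `METADATA`, the type codes assigned to each variable, and the `violations` helper are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical metadata: declared variable types (descriptive only, not used
# by the check below) plus one conditional constraint mirroring the slide.
METADATA: Dict[str, Any] = {
    "types": {"variable1": "N", "variable3": "Q"},
    "constraints": [
        # (if this variable, has this value, then this variable, must be in)
        ("variable1", "l", "variable3", {3, 7, 16}),
    ],
}


def violations(row: Dict[str, Any], metadata: Dict[str, Any]) -> List[str]:
    """Return a description of every conditional constraint the row breaks."""
    problems = []
    for if_var, if_val, then_var, allowed in metadata["constraints"]:
        if row.get(if_var) == if_val and row.get(then_var) not in allowed:
            problems.append(
                f"{then_var}={row.get(then_var)!r} not allowed "
                f"when {if_var}={if_val!r}"
            )
    return problems
```

A row with `variable1 = "l"` and `variable3 = 7` passes, while `variable3 = 5` is flagged. The table alone cannot express this rule, which is why it lives in metadata.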

12
How Many Variables?
  • Data sets of dimensions 1, 2, 3 are common
  • Number of variables per class
  • 1 - Univariate data
  • 2 - Bivariate data
  • 3 - Trivariate data
  • >3 - Hypervariate data

13
Representation
  • What's a common way of visually representing
    multivariate data sets?
  • Graphs!

14
Good Example
www.nationmaster.com
15
Basic Symbolic Displays
  • Graphs
  • Charts
  • Maps
  • Diagrams

From S. Kosslyn, "Understanding charts and
graphs," Applied Cognitive Psychology, 1989.
16
1. Graph
Showing the relationships between
variables' values in a data table
17
Properties
  • Graph
  • Visual display that illustrates one or more
    relationships among entities
  • Shorthand way to present information
  • Allows a trend, pattern or comparison to be
    easily comprehended

18
Issues
  • Critical to remain task-centric
  • Why do you need a graph?
  • What questions are being answered?
  • What data is needed to answer those questions?
  • Who is the audience?

[Figure: example graph with axes labeled money and time]
19
Graph Components
  • Framework
  • Measurement types, scale
  • Content
  • Marks, lines, points
  • Labels
  • Title, axes, ticks

20
Other Symbolic Displays
  • Chart
  • Map
  • Diagram

Aside
21
2. Chart
  • Structure is important, relates entities to each
    other
  • Primarily uses lines, enclosure, position to
    link entities

Examples: flowchart, family tree, org chart, ...
22
3. Map
  • Representation of spatial relations
  • Locations identified by labels

23
Choropleth Map
Areas are filled and colored differently
to indicate some attribute of that region
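Before a choropleth map can be colored, each region's value has to be assigned to a color class. The sketch below uses equal-interval classing (the value range is split into bins of equal width), which is only one common choice; quantile classing is another. Region names and color labels are hypothetical.

```python
from typing import Dict, List


def equal_interval_classes(values: Dict[str, float],
                           colors: List[str]) -> Dict[str, str]:
    """Assign each region a color by splitting the value range into equal bins."""
    lo, hi = min(values.values()), max(values.values())
    k = len(colors)
    width = (hi - lo) / k or 1.0  # avoid dividing by zero when all values match
    result = {}
    for region, v in values.items():
        idx = min(int((v - lo) / width), k - 1)  # the maximum goes in the top bin
        result[region] = colors[idx]
    return result
```

With values `{"A": 0, "B": 5, "C": 10}` and colors `["light", "medium", "dark"]`, regions A, B, and C receive light, medium, and dark respectively. The choice of classing scheme can change the visual impression of the map considerably.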
24
Cartography
  • Cartographers and map-makers have a wealth of
    knowledge about the design and creation of visual
    information artifacts
  • Labeling, color, layout,
  • Information visualization researchers should
    learn from this older, existing area

25
4. Diagram
  • Schematic picture of object or entity
  • Parts are symbolic

Examples: figures, steps in a manual,
illustrations, ...
26
Details
  • What are the constituent pieces of these four
    symbolic displays?
  • What are the building blocks?

27
Visual Structures
  • Composed of
  • Spatial substrate
  • Marks
  • Graphical properties of marks

28
Space
  • Visually dominant
  • Often put axes on space to assist
  • Use techniques of composition, alignment,
    folding, recursion, and overloading to (1)
    increase use of space and (2) encode data

29
Marks
  • Things that occur in space
  • Points
  • Lines
  • Areas
  • Volumes

30
Graphical Properties
  • Size, shape, color, orientation...

                        Spatial properties    Object properties
Expressing extent       Position, Size        Grayscale
Differentiating marks   Orientation           Color, Shape, Texture
31
Intermission
  • Getting slides
  • Getting papers
  • Photos

32
Back to Data
  • What were the different types of data sets?
  • Number of variables per class
  • 1 - Univariate data
  • 2 - Bivariate data
  • 3 - Trivariate data
  • >3 - Hypervariate data

33
Univariate Data
  • Representations

[Figure: univariate representations, including a Tukey box plot marking the low and high values, the mean, and a box spanning the middle 50% of values]
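The numbers behind a box plot are easy to compute with Python's standard `statistics` module. This is a sketch under two assumptions: low/high are simply the minimum and maximum, and quartiles use `statistics.quantiles` with its default "exclusive" method (other quartile conventions shift the box edges slightly). Tukey's classic box plot marks the median; the sketch reports it alongside the mean.

```python
import statistics
from typing import Dict, Sequence


def box_plot_summary(data: Sequence[float]) -> Dict[str, float]:
    """Numbers needed to draw a basic box plot of one variable."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "low": min(data),
        "q1": q1,          # lower edge of the box
        "median": statistics.median(data),
        "q3": q3,          # upper edge: the box spans the middle 50%
        "high": max(data),
        "mean": statistics.mean(data),
    }
```

For the values 1 through 9, the box runs from 2.5 to 7.5, with median and mean both at 5, and low/high at 1 and 9.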
34
What goes where
  • In univariate representations, we often think of
    the data case as being shown along one dimension,
    and the value in another

Line graph: Y-axis is a quantitative variable; see changes over consecutive values
Bar graph: Y-axis is a quantitative variable; compare relative point values
35
Alternative View
  • We may think of a graph as representing independent
    (data case) and dependent (value) variables
  • Guideline
  • Independent vs. dependent variables
  • Put independent on x-axis
  • See resultant dependent variables along y-axis

36
Bivariate Data
  • Representations

Scatter plot is common
Two variables; want to see their relationship. Is there
a linear, curved, or random pattern?
Each mark is now a data case
[Figure: scatter plot of price vs. mileage]
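A numeric companion to the scatter plot is the Pearson correlation coefficient, which summarizes how close the points lie to a straight line. The sketch below computes it directly; the mileage and price values in the example are hypothetical. It only measures *linear* association: a strong curved pattern can still give r near 0, which is one reason to look at the plot itself.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: near +1 or -1 for a strong linear pattern."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


mileage = [10, 20, 30, 40]   # hypothetical cases
price = [40, 30, 20, 10]
r_linear = pearson_r(mileage, price)                   # a perfect downward line
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # a parabola
```

Here `r_linear` is -1 (a perfect downward line), while the parabola gives `r_curved` of 0 even though the points follow an exact curve.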
37
Trivariate Data
  • Representations

3D scatter plot is possible
[Figure: 3D scatter plot with axes price, horsepower, and mileage]
38
Alternative Representation
Still use 2D, but have a mark property represent
the third variable
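Encoding a third variable as a mark property usually means mapping its values linearly onto a range of the property, here mark size. This is a minimal sketch, assuming cases arrive as `(x, y, z)` tuples (for instance mileage, price, horsepower) and a size range of 2 to 12; both the range and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def scale(v: float, lo: float, hi: float, out_lo: float, out_hi: float) -> float:
    """Linearly map v from [lo, hi] onto [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2
    return out_lo + (v - lo) * (out_hi - out_lo) / (hi - lo)


def marks(cases: Sequence[Tuple[float, float, float]],
          min_size: float = 2.0,
          max_size: float = 12.0) -> List[Tuple[float, float, float]]:
    """Turn (x, y, z) cases into (x, y, size) marks; z drives mark size."""
    zs = [z for _, _, z in cases]
    lo, hi = min(zs), max(zs)
    return [(x, y, scale(z, lo, hi, min_size, max_size)) for x, y, z in cases]
```

This treats size as a linear quantity. If the size is used as a circle's radius, large values will look disproportionately big, because area grows with the square of the radius.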
39
Alternative Representation
Represent each variable in its own explicit way
40
Hypervariate Data
  • Ahhh, the tough one
  • A number of well-known visualization techniques
    exist for data sets of 1-3 dimensions
  • line graphs, bar graphs, scatter plots OK
  • We see a 3-D world (4-D with time)
  • What about data sets with more than 3 variables?
  • Often the interesting, challenging ones

41
Multiple Views
Give each variable its own display
     A  B  C  D  E
1    4  1  8  3  5
2    6  3  4  2  1
3    5  7  2  4  3
4    2  6  3  1  5
[Figure: one small chart per variable A-E, each showing cases 1-4]
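Giving each variable its own display amounts to splitting the table by column and drawing one small chart per column. The sketch below does this with text bar charts for the A-E values on this slide. The text rendering is just a stand-in for real charts, and the names `TABLE` and `views` are hypothetical.

```python
from typing import Dict, List

# The A-E table from the slide: one row per case (1-4), one column per variable.
TABLE: Dict[int, Dict[str, int]] = {
    1: {"A": 4, "B": 1, "C": 8, "D": 3, "E": 5},
    2: {"A": 6, "B": 3, "C": 4, "D": 2, "E": 1},
    3: {"A": 5, "B": 7, "C": 2, "D": 4, "E": 3},
    4: {"A": 2, "B": 6, "C": 3, "D": 1, "E": 5},
}


def views(table: Dict[int, Dict[str, int]]) -> Dict[str, List[str]]:
    """Build one text bar chart per variable, with one bar per case."""
    variables = next(iter(table.values())).keys()
    return {
        var: [f"{case} {'#' * row[var]}" for case, row in table.items()]
        for var in variables
    }


if __name__ == "__main__":
    for var, bars in views(TABLE).items():
        print(var)
        print("\n".join(bars))
```

Because each view shows one variable in isolation, comparing cases across variables means the reader has to look back and forth between the separate charts.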
42
Scatterplot Matrix
Represent each possible pair of variables in
their own 2-D scatterplot. Useful for
what? Misses what?
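The number of plots in a scatterplot matrix follows directly from counting variable pairs. This short sketch uses `itertools.combinations` to list the distinct pairs. Variable names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def scatterplot_pairs(variables: Sequence[str]) -> List[Tuple[str, str]]:
    """Each unordered pair of variables gets its own 2-D scatter plot."""
    return list(combinations(variables, 2))


pairs = scatterplot_pairs(["A", "B", "C", "D", "E"])
# n variables give n * (n - 1) / 2 distinct plots; a full matrix usually
# shows each pair twice (above and below the diagonal).
```

Five variables give 10 distinct plots, and ten variables give 45, so the matrix grows quadratically. Each individual plot still shows only two variables at a time.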
43
Chernoff Faces
Encode different variables' values in
characteristics of a human face
http://www.cs.uchicago.edu/wiseman/chernoff/
http://hesketh.com/schampeo/projects/Faces/chernoff.html
Cute applets
44
Star Plots
Space out the n variables at equal angles around
a circle. Each spoke encodes a variable's value.
[Figure: star plot with spokes Var 1 to Var 5; spoke length shows the value]
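The star plot geometry can be sketched directly. Spoke i sits at angle 2πi/n, its length encodes the value, and connecting the spoke endpoints gives the star outline. This sketch assumes non-negative values scaled by each variable's maximum; that normalization and the function name are assumptions, not taken from the slide.

```python
import math
from typing import List, Sequence, Tuple


def star_plot_vertices(values: Sequence[float],
                       maxima: Sequence[float]) -> List[Tuple[float, float]]:
    """Spoke endpoints for one case; spoke i sits at angle 2*pi*i/n."""
    n = len(values)
    points = []
    for i, (v, vmax) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * i / n
        r = v / vmax  # spoke length as a fraction of that variable's maximum
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points
```

With four variables all at their maximum, the endpoints fall at (1, 0), (0, 1), (-1, 0), and (0, -1), up to floating-point rounding. Drawing one such star per case lets cases be compared by shape.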
45
Star Plot examples
http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html
46
Star Coordinates
E. Kandogan, "Star Coordinates: A
Multi-dimensional Visualization Technique with
Uniform Treatment of Dimensions," InfoVis
2000 Late-Breaking Hot Topics, Oct. 2000
Demo
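In Star Coordinates, axes radiate from a common origin, and each data case is placed at the vector sum of its normalized values along those axes. The sketch below shows that basic mapping, assuming equally spaced unit axes and per-dimension min-max normalization. The interactive axis scaling and rotation offered by the actual tool are not modeled, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def star_coordinates(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project each row to 2-D as the sum of its min-max normalized values
    times one unit axis vector per dimension."""
    dims = len(rows[0])
    axes = [(math.cos(2 * math.pi * j / dims), math.sin(2 * math.pi * j / dims))
            for j in range(dims)]
    lows = [min(r[j] for r in rows) for j in range(dims)]
    highs = [max(r[j] for r in rows) for j in range(dims)]
    points = []
    for row in rows:
        x = y = 0.0
        for j, v in enumerate(row):
            span = highs[j] - lows[j]
            t = (v - lows[j]) / span if span else 0.0
            x += t * axes[j][0]
            y += t * axes[j][1]
        points.append((x, y))
    return points


pts = star_coordinates([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1]])
```

Note that the first and last cases both land at the origin. The projection to 2-D can map different cases to the same point, which is why interactive axis adjustment helps.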
47
Parallel Coordinates
  • What are they?
  • Explain

48
Parallel Coordinates
Encode variables along a horizontal row. A vertical
line for each variable specifies its values.
[Figure: parallel vertical axes V1 V2 V3 V4 V5]
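In a parallel coordinates display, each data case becomes a polyline that crosses every vertical axis at that case's value. The sketch computes those polyline vertices, assuming axis j is drawn at x = j and each axis is scaled to its own min-max range. The function name and the `height` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def polylines(rows: Sequence[Sequence[float]],
              height: float = 1.0) -> List[List[Tuple[int, float]]]:
    """Map each case to a polyline: vertex j sits on vertical axis j (x = j)
    at the case's value, rescaled to that axis's own min-max range."""
    dims = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(dims)]
    highs = [max(r[j] for r in rows) for j in range(dims)]
    lines = []
    for row in rows:
        line = []
        for j, v in enumerate(row):
            span = highs[j] - lows[j]
            # constant axes are drawn at mid-height
            y = (v - lows[j]) / span * height if span else height / 2
            line.append((j, y))
        lines.append(line)
    return lines
```

Because every axis is scaled independently, variables with very different units (yield percentages, defect counts, physical parameters) can share one display.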
49
Parallel Coords Example
[Figure panels: Basic, Grayscale, Color]
50
Application
  • System that uses parallel coordinates for
    information analysis and discovery
  • Interactive tool
  • Can focus on certain data items
  • Color

Taken from A. Inselberg, "Multidimensional
Detective," InfoVis 97, 1997.
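A common way to focus on certain data items in a parallel coordinates tool is brushing. The user marks a value range on one or more axes, and the cases inside every range are highlighted, for example in color. This is a minimal sketch of the selection step only, not Inselberg's system. The variable names and values in the example are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]


def brush(rows: Sequence[Dict[str, float]], ranges: Dict[str, Range]) -> List[int]:
    """Return indices of cases whose values fall inside every brushed range."""
    selected = []
    for i, row in enumerate(rows):
        if all(lo <= row[var] <= hi for var, (lo, hi) in ranges.items()):
            selected.append(i)
    return selected


batches = [  # hypothetical normalized values
    {"yield": 0.90, "quality": 0.80},
    {"yield": 0.50, "quality": 0.90},
    {"yield": 0.95, "quality": 0.95},
]
top = brush(batches, {"yield": (0.85, 1.0), "quality": (0.75, 1.0)})
```

Here `top` selects batches 0 and 2. Inspecting where the selected polylines cross the remaining axes is how patterns like "the best batches still have some defects" become visible.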
51
Discuss
  • What was their domain?
  • What was their problem?
  • What were their data sets?

52
The Problem
  • VLSI chip manufacture
  • Want high quality chips (high speed) and a high
    yield batch (% of useful chips)
  • Able to track defects
  • Hypothesis: no defects gives desired chip types
  • 473 batches of data

53
The Data
  • 16 variables
  • X1 - yield
  • X2 - quality
  • X3-X12 - defects (inverted)
  • X13-X16 - physical parameters

54
Parallel Coordinate Display
[Figure: parallel coordinate display; axes grouped as yield, quality, defects, and parameters]
Yikes! But not that bad
Distributions: X1 - normal, X2 - bipolar
55
Top Yield and Quality
[Figure: batches with top yield and quality selected; the selection splits on the defect axes]
Have some defects
56
Minimal Defects
Not the highest yields and quality
57
Best Yields
It appears that some defects are necessary to
produce the best chips. Non-intuitive!
58
Xmdv
Tool suite created by Matthew Ward of
WPI. Includes parallel coordinate views.
Demo
59
Parallel Coordinate Tree
Demo
D. Brodbeck and L. Girardin, "Visualization of
Large-Scale Customer Satisfaction Surveys Using a
Parallel Coordinate Tree", InfoVis 03.
60
Parallel Coordinates
  • Technique
  • Strengths?
  • Weaknesses?

61
Sliding Rods
T. Lanning, K. Wittenburg, et al.,
"Multidimensional Information Visualization
through Sliding Rods", Proceedings of AVI 2000
62
Administrivia
  • Computer accounts
  • HW 1 in today
  • HW 3 due Tuesday

63
Upcoming
  • Multivariate vis tools
  • Reading
  • Eick paper
  • Visual perception
  • Tufte (please be reading)

64
Sources Used
  • CMS book
  • Referenced articles
  • Marti Hearst, SIMS 247 lectures
  • Kosslyn 89 article
  • A. Marcus, Graphic Design for Electronic Documents and User Interfaces
  • M. Monmonier, How to Lie with Maps
  • W. Cleveland, The Elements of Graphing Data
  • C. H. Yu, Visualization Techniques of Different Dimensions
  • http://seamonkey.ed.asu.edu/behrens/asu/reports/compre/comp1.html
  • http://www.csc.ncsu.edu/faculty/healey/PP/PP.html