More on Data Presentation CS 239 Experimental Methodologies for System Software Peter Reiher May 24, 2007

Provided by: PeterR92
Learn more at: https://lasr.cs.ucla.edu
Transcript and Presenter's Notes

1
More on Data Presentation
CS 239 Experimental Methodologies for System Software
Peter Reiher
May 24, 2007

2
Outline
  • Common graphics mistakes and games
  • Special purpose graphs

3
Common Mistakes in Graphics
  • Excess information
  • Multiple scales
  • Using symbols in place of text
  • Poor scales
  • Using lines incorrectly

4
Excess Information
  • Sneaky trick to meet length limits
  • Rules of thumb
  • At most 6 curves on line chart
  • At most 10 bars on bar chart
  • At most 8 slices on pie chart
  • But note that Tufte hates pie charts
  • Extract essence, don't cram things in
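
The rules of thumb above can be turned into a quick check before a chart goes into a paper. This is only a sketch: the limits are the ones listed on the slide, while the names `MAX_ITEMS` and `too_much_information` are hypothetical.

```python
# Rule-of-thumb limits taken from the slide; the names are illustrative only.
MAX_ITEMS = {"line": 6, "bar": 10, "pie": 8}


def too_much_information(chart_type: str, item_count: int) -> bool:
    """Return True if the chart exceeds the slide's rule-of-thumb limit."""
    return item_count > MAX_ITEMS[chart_type]


print(too_much_information("line", 9))  # 9 curves on one line chart
print(too_much_information("bar", 10))  # exactly 10 bars is still within the limit
```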

5
Way Too Much Information
6
What's Important About That Chart?
  • Times for cp and rcp rise with number of replicas
  • Most other benchmarks are near constant
  • Exactly constant for rm

7
The Right Amount of Information
8
Multiple Scales
  • Another way to meet length limits
  • Basically, two graphs overlaid on each other
  • Confuses reader (which line goes with which
    scale?)
  • Misstates relationships
  • Implies equality of magnitude that doesn't exist

9
Some Especially Bad Multiple Scales
10
Using Symbols in Place of Text
  • Graphics should be self-explanatory
  • Remember that the graphs often draw the reader in
  • So use explanatory text, not symbols
  • This means no Greek letters!
  • Unless your conference is in Athens...

11
It's All Greek To Me...
12
Explanation is Easy
13
Poor Scales
  • Plotting programs love non-zero origins
  • But people are used to zero
  • Fiddle with axis ranges (and logarithms) to get
    your message across
  • But don't lie or cheat
  • Sometimes trimming off high ends makes things
    clearer
  • Brings out low-end detail

14
Nonzero Origins (Chosen by Microsoft)
15
Proper Origins
16
A Poor Axis Range
17
A Logarithmic Range
  • Shows all data on chart
  • Minimizes differences of non-outliers
18
A Truncated Range
  • Clarifies non-outlier distinctions
  • Makes understanding outliers harder
19
Using Lines Incorrectly
  • Don't connect points unless interpolation is
    meaningful
  • Don't smooth lines that are based on samples
  • Exception: fitted non-linear curves

20
Incorrect Line Usage
21
Pictorial Games
  • Usually intentional attempts to use graphics to
    deceive
  • Non-zero origins and broken scales
  • Double-whammy graphs
  • Omitting variation indices
  • Scaling by height, not area

22
Non-Zero Origins and Broken Scales
  • People expect (0,0) origins
  • Subconsciously
  • So non-zero origins are a great way to lie
  • Common in popular press
  • Also very common to cheat by omitting part of
    scale

23
Non-Zero Origins
24
The Three-Quarters Rule
  • Highest point should be 3/4 of scale or more
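
As a rough sketch, the three-quarters rule can be checked numerically. This assumes "scale" means the span of the vertical axis from its minimum to its maximum; the function names are hypothetical.

```python
def satisfies_three_quarters(values, axis_max, axis_min=0.0):
    """True if the highest point reaches at least 3/4 of the axis span."""
    highest = max(values)
    return (highest - axis_min) >= 0.75 * (axis_max - axis_min)


def largest_allowed_axis_max(values, axis_min=0.0):
    """Largest axis maximum that still keeps the highest point at 3/4 of the span."""
    return axis_min + (max(values) - axis_min) / 0.75


data = [10, 30]
print(satisfies_three_quarters(data, axis_max=40))   # highest point sits at 3/4 of the axis
print(satisfies_three_quarters(data, axis_max=100))  # axis is far too tall for the data
print(largest_allowed_axis_max(data))
```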

25
Double-Whammy Graphs
  • Put two related measures on same graph
  • One is (almost) function of other
  • Hits reader twice with same information
  • And thus overstates impact

26
Omitting Variation Description
  • Statistical data is inherently fuzzy
  • But means/medians/modes appear precise
  • Giving index of variation can make it clear
    there's no real difference
  • So liars and fools leave them out
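
One common index of variation is a confidence interval around the mean. The sketch below uses a normal approximation from the standard library (a t-distribution would give wider intervals for samples this small), and the measurements are made-up illustrative numbers, not data from the lecture. Overlapping intervals suggest the difference between the means may not be real, though a proper test is needed to be sure.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    m = mean(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / sqrt(len(samples))
    return m - half_width, m + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical response times for two systems (illustrative values only).
system_a = [10.1, 11.9, 9.8, 12.3, 10.6, 11.2]
system_b = [11.0, 12.8, 10.2, 13.1, 11.5, 10.9]

ci_a = confidence_interval(system_a)
ci_b = confidence_interval(system_b)
print(mean(system_a), ci_a)
print(mean(system_b), ci_b)
print("intervals overlap:", intervals_overlap(ci_a, ci_b))
```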

27
Graph Without Confidence Intervals
28
Graph With Confidence Intervals
29
Another Graph With Different Confidence Intervals
30
Scaling by Height Instead of Area
  • Clip art is popular with illustrators

[Figure: Women in the Workforce]
31
The Trouble with Height Scaling
  • Previous graph had heights of 2:1
  • But people perceive areas, not heights
  • So areas should be what's proportional to data
  • Tufte defines a lie factor: size of effect in
    graphic divided by size of effect in data
  • Lie factor of 1.0 is the truth
  • Anything far from 1.0 is that degree of a lie
  • Not limited to area scaling
  • But especially insidious there (quadratic effect)
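
A small sketch of the lie factor calculation, using Tufte's convention that the size of an effect is the relative change from the first value to the second. The function names are hypothetical. When a picture is scaled in both height and width, a doubling in the data becomes a quadrupling in area.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Data doubles (1 -> 2). A picture with doubled height and width has 4 times the area.
area_scaled_by_height = lie_factor(1.0, 4.0, 1.0, 2.0)
# If the area itself doubles, the graphic matches the data.
area_scaled_by_area = lie_factor(1.0, 2.0, 1.0, 2.0)

print(area_scaled_by_height)
print(area_scaled_by_area)
```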

32
Scaling by Area
  • Here's the same graph with 2:1 area
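
To make area proportional to the data, each linear dimension of the picture is scaled by the square root of the data ratio. A minimal sketch, with a hypothetical function name:

```python
from math import sqrt


def linear_scale_for_area_ratio(data_ratio):
    """Factor to apply to both height and width so that area scales by data_ratio."""
    return sqrt(data_ratio)


scale = linear_scale_for_area_ratio(2.0)  # data ratio of 2 to 1
print(scale)           # factor applied to height and to width
print(scale * scale)   # resulting area ratio, approximately 2
```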

[Figure: Women in the Workforce]
33
Poor Histogram Cell Size
  • Picking bucket size is always a problem
  • Prefer 5 or more observations per bucket
  • Choice of bucket size can affect results
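
One possible way to apply the "5 or more observations per bucket" preference is to start with many equal-width buckets and reduce the count until every bucket is full enough. This is a sketch of that idea, not a method from the lecture; the function names and the default of 20 starting buckets are assumptions.

```python
def bucket_counts(data, n_buckets):
    """Count observations in n_buckets equal-width buckets spanning the data."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_buckets or 1.0  # avoid zero width when all values are equal
    counts = [0] * n_buckets
    for x in data:
        # The maximum value would land one past the end, so clamp it into the last bucket.
        index = min(int((x - lo) / width), n_buckets - 1)
        counts[index] += 1
    return counts


def most_buckets_with_min_count(data, max_buckets=20, min_count=5):
    """Largest bucket count (up to max_buckets) where every bucket has min_count items."""
    for n in range(max_buckets, 0, -1):
        if min(bucket_counts(data, n)) >= min_count:
            return n
    return 1  # too few observations to satisfy the rule; fall back to one bucket


uniform = list(range(100))
print(most_buckets_with_min_count(uniform))
print(bucket_counts(uniform, 4))
```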

34
Principles of Graphics Integrity (Tufte)
  • Proportional representation of numbers
  • Clear, detailed, thorough labeling
  • Show data variation, not design variation
  • Use deflated money units
  • Don't have more dimensions than data has
  • Don't quote data out of context

35
Proportional Representation of Numbers
  • Maintain a lie factor of 1.0
  • Use areas, not heights, with clip art
  • Avoiding decorative graphs will do wonders
  • This isn't too hard for most engineers

36
Clear, Detailed, Thorough Labeling
  • Goal is to defeat distortion and ambiguity
  • Write explanations on graphic itself
  • Label important events in the data

37
Show Data Variation, Not Design Variation
  • Use one design for the entire graphic
  • In papers, try to use one design for all graphs
  • Again, artistic license is the big culprit

38
Use Deflated Money Units
  • Often necessary to show money over time
  • Even in computer science
  • E.g., price/performance over time
  • Or expected future cost of a disk
  • Nominal dollars are meaningless
  • Derate by some standard inflation measure
  • That's what the WWW is for!
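
Derating by an inflation measure is a single multiplication once a price index is available. In the sketch below, both the disk prices and the index values are invented for illustration; a real analysis would take the index from a published inflation series.

```python
def deflate(nominal, index_then, index_base):
    """Convert a nominal amount into base-year money using a price index."""
    return nominal * index_base / index_then


# Hypothetical price index and nominal disk prices (illustrative values only).
price_index = {2000: 100.0, 2005: 125.0}
nominal_price = {2000: 200.0, 2005: 150.0}

base_year = 2005
real_price = {
    year: deflate(price, price_index[year], price_index[base_year])
    for year, price in nominal_price.items()
}
print(real_price)  # both prices expressed in base-year money
```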

39
Might Need to Deflate Other Units
  • Depending on what you're doing, might need to
    deflate other units
  • E.g., transactions per second
  • Don't deflate if the point of the comparison is
    the change in that rate over time
  • Must deflate if you're comparing other
    differences, like parallel vs. sequential

40
Don't Have More Dimensions Than Data Has
  • This gets back to the Lie Factor
  • 1-D data (e.g., money) should occupy one
    dimension on the graph, not two or three
  • Clip art is prohibited by this rule
  • But if you have to, use an area measure

[Figure: example graphic with values labeled 1.00 and 2.00]
41
Don't Quote Data Out of Context
  • Tufte's example

42
The Same Data in Context
43
Special-Purpose Charts
  • Tukey's box plot
  • Histograms
  • Scatter plots
  • Gantt charts
  • Kiviat graphs

44
Tukey's Box Plot
  • Shows range, median, quartiles all in one
  • Tufte can't resist improvements: [figure] or [figure] or even [figure]
  • Not entirely clear to me if these really are
    better
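
The five values a box plot displays can be computed with the standard library. This sketch assumes Python 3.8 or later for `statistics.quantiles`; the "inclusive" quartile method is one of several conventions, and `five_number_summary` is a hypothetical name.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```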

[Figure: box plot labeled minimum, quartile, median, quartile, maximum]
45
Histograms
  • Tufte suggests various improvements

[Figure: histogram with no y axis, no grid, and internal marker lines on bars]
46
Scatter Plots
  • Useful in statistical analysis
  • Also excellent for huge quantities of data
  • Can show patterns otherwise invisible

47
Better Scatter Plots
  • Again, Tufte suggests improvements
  • But it can be a pain with automated tools
  • Better data-to-ink ratio
  • Can use modified Tukey box plot for axes

48
Gantt Charts
  • Shows relative duration of Boolean conditions
  • Arranged to make lines continuous
  • Each level after first follows FTTF pattern

49
Kiviat Graphs
  • Also called star charts or radar plots
  • Useful for looking at balance between HB
    (higher-is-better) and LB (lower-is-better)
    metrics

50
A Couple of Examples
  • A bad graph
  • A good graph

51
A Very Bad Graph
52
A Good Graph: Sunspots
53
A Superb Graph: DEC Traces