1
Evaluation
  • CS 7450 - Information Visualization
  • November 9, 2000

2
Area Focus
  • Most of the research in InfoVis that we've
    learned about this semester has been the
    introduction of a new visualization technique or
    tool
  • Fisheyes, cone trees, hyperbolic displays,
    tilebars, themescapes, sunburst, jazz, ...
  • "Isn't my new visualization cool?"

3
Evaluation
  • How does one judge the quality of work?
  • Different measures
  • Impact on community as a whole, influential ideas
  • Assistance to people in the tasks they care about

4
Strong View
  • Unless a new technique or tool helps people in
    some kind of problem or task, it doesn't have any
    value

5
Broaden Thinking
  • Sometimes the chain of influence can be long and
    drawn out
  • System X influences System Y influences System Z
    which is incorporated into a practical tool that
    is of true value to people
  • This is what research is all about (typically)

6
Evaluation in HCI
  • Takes many different forms
  • Qualitative, quantitative, objective, subjective,
    controlled experiments, interpretive
    observations, ...
  • Which ones are best for evaluating InfoVis
    systems?

7
Controlled Experiments
  • Good for measuring performance or comparing
    multiple techniques
  • What do we measure?
  • Performance, time, errors, ... (a simple timing
    sketch follows below)
  • Strengths and weaknesses of this approach?
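A minimal sketch, assuming Python and its standard library, of how task
time and errors might be logged in a controlled experiment of this kind;
the task list, answer key, and get_response callback are hypothetical
placeholders, not taken from any study discussed here.

  import time

  # Hypothetical tasks and expected answers for one session.
  TASKS = ["Find the Lawnmower category", "Find the Photography category"]
  ANSWERS = ["lawnmower", "photography"]

  def run_session(participant_id, get_response):
      """Time each task and record whether the response was correct."""
      log = []
      for i, task in enumerate(TASKS):
          start = time.perf_counter()
          response = get_response(task)              # supplied by the UI under test
          elapsed = time.perf_counter() - start
          log.append({"participant": participant_id,
                      "task": i,
                      "time_s": round(elapsed, 2),
                      "error": response.strip().lower() != ANSWERS[i]})
      return log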

8
Subjective Assessments
  • Find out people's subjective views on tools
  • Was it enjoyable, confusing, fun, difficult, ...?
  • This kind of personal judgment strongly influences
    use and adoption, sometimes even overcoming
    performance deficits

9
Qualitative, Observational Studies
  • Watch systems being used (you can learn a lot)
  • Is it being used in the way you expected?
  • Ecological validity
  • Can suggest new designs and improvements

10
Running Studies
  • Beyond our scope here
  • Take CS 6750 (HCI) and you'll learn more about this

11
Confounds
  • Very difficult in InfoVis to compare apples to
    apples
  • UI can influence utility of visualization
    technique
  • Different tools were built to address different
    user tasks

12
Example
  • Let's design an experiment to compare the utility
    of Kohonen maps, VIBE, and Themescapes in finding
    documents of interest...

13
Examples
  • Let's look at a couple of example studies that
    attempt to evaluate different InfoVis systems
  • Both are taken from a journal special issue whose
    focus is "Empirical Studies of Information
    Visualizations"
  • International Journal of Human-Computer Studies,
    Nov. 2000, Vol. 53, No. 5

14
InfoVis for Web Content
  • Study compared three techniques for finding and
    accessing information within typical web
    information hierarchies
  • Windows Explorer style tool
  • Snap/Yahoo style category breakdown
  • 3D hyperbolic tree with 2D list view (XML3D)

Risden, Czerwinski, Munzner and Cook 00
15
XML3D
16
Snap
17
Folding Tree
18
Information Space
  • Took the 12,000-node Snap hierarchy and ported it
    to the 2D tree and XML3D tools
  • Fast T1 connection

19
Hypothesis
  • Since XML3D encodes more information, it should
    provide better performance
  • But maybe 3D will throw people off

20
Methodology
  • 16 participants
  • Tasks broken down by:
  • Old category vs. new category
  • One parent vs. multiple parents
  • Participants used XML3D and one of the other
    tools per session (order varied; see the
    counterbalancing sketch below)
  • Time to complete each task was measured, along
    with a judgment of the quality of the task
    response
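A minimal sketch, assuming Python, of one way to assign participants to
counterbalanced orders as described above; the rotation scheme and
participant IDs are illustrative, not the paper's exact assignment
procedure.

  from itertools import cycle

  # XML3D appears in every session; the comparison tool and the order vary.
  OTHER_TOOLS = ["Explorer-style tree", "Snap-style categories"]
  ORDERS = ([("XML3D", t) for t in OTHER_TOOLS] +
            [(t, "XML3D") for t in OTHER_TOOLS])       # four order conditions

  def assign_conditions(n_participants=16):
      """Rotate participants through the four order conditions."""
      rotation = cycle(ORDERS)
      return {pid: next(rotation) for pid in range(1, n_participants + 1)}

  print(assign_conditions())   # participant id -> (first tool, second tool)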

21
Example Tasks
  • Old category, one parent: find the Lawnmower
    category
  • Old category, multiple parents: find the
    Photography category, then learn what different
    paths can take someone there
  • New category, one parent: create a new Elementary
    Schools category and position it appropriately
  • New category, multiple parents: create a new
    category, position it, and determine one other
    path to take people there

22
Results
  • General
  • An ANOVA was used (see the sketch below)
  • No difference between the two 2D tools, so their
    data were combined
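A minimal sketch, assuming Python with SciPy, of the kind of one-way ANOVA
that could compare task times across the tools; the times below are
made-up illustrative values, not data from the study.

  from scipy.stats import f_oneway

  # Hypothetical per-participant task times (seconds) for each tool.
  xml3d_times    = [41, 38, 45, 50, 36, 44]
  explorer_times = [55, 60, 52, 58, 49, 63]
  snap_times     = [57, 54, 61, 50, 59, 56]

  f_stat, p_value = f_oneway(xml3d_times, explorer_times, snap_times)
  print(f"F = {f_stat:.2f}, p = {p_value:.4f}")   # small p suggests a tool effect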

23
Results
  • Speed
  • Participants completed tasks faster with the
    XML3D tool
  • Participants were also faster on tasks involving
    an existing category; the difference was larger
    when a single parent was involved

24
Results
  • Consistency
  • No significant difference across all conditions
  • -> Quality of placements, etc., was pretty much
    the same throughout

25
Results
  • Feature Usage
  • What aspect of the XML3D tool was important?
  • Analyzed people's use of the parts of the tool
  • 2D list elements - 43.9% of the time
  • 3D graph - 32.5% of the time

26
Results
  • Subjective ratings
  • The conventional 2D tools received a slightly
    higher satisfaction rating (4.85 vs. 4.5 on a
    1-to-7 scale)
  • The difference was not significant (a quick test
    of this kind is sketched below)
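A minimal sketch, assuming Python with SciPy, of how a ratings difference
like this might be tested; the rating lists are invented for illustration,
and a t-test is only one reasonable choice for Likert-style data (a
Mann-Whitney U test is a common alternative).

  from scipy.stats import ttest_ind

  # Hypothetical 1-to-7 satisfaction ratings for the 2D tools and XML3D.
  ratings_2d    = [5, 6, 4, 5, 6, 5, 4, 5]
  ratings_xml3d = [4, 5, 5, 4, 6, 4, 5, 4]

  t_stat, p_value = ttest_ind(ratings_2d, ratings_xml3d)
  print(f"t = {t_stat:.2f}, p = {p_value:.3f}")   # large p -> not significant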

27
Discussion
  • XML3D provides more focus+context than the other
    two tools, which may aid performance
  • It appeared that the integration of the 3D graph
    with the 2D list view was important
  • Maybe new visualization techniques like this work
    best when coupled with more traditional displays

28
Space-Filling Hierarchy Views
  • Compare Treemap and Sunburst with users
    performing typical file/directory-related tasks
  • Evaluate task performance on both correctness and
    time

Stasko, Catrambone, Guzdial and McDonald 00
29
Tools Compared
Treemap
Sunburst
30
Hierarchies Used
  • Four in total
  • Used sample files and directories from our own
    systems (better than random)

[Figures: Small Hierarchy (500 files), versions A and B; Large Hierarchy (3000 files), versions A and B]
31
Methodology
  • 60 participants
  • Each participant worked with either the small or
    the large hierarchies in a session
  • Training at the start to learn each tool
  • Order varied across participants

Order conditions: SB A then TM B; TM A then SB B; SB B then TM A; TM B then SB A
32 participants on the small hierarchies, 28 on the large hierarchies
32
Tasks
  • Identification (naming or pointing out) of a
    file based on size, specifically, the
    largest and second largest files (Questions 1-2)
  • Identification of a directory based on size,
    specifically, the largest (Q3)
  • Location (pointing out) of a file, given the
    entire path and name (Q4-7)
  • Location of a file, given only the file name
    (Q8-9)
  • Identification of the deepest subdirectory
    (Q10)
  • Identification of a directory containing files
    of a particular type (Q11)
  • Identification of a file based on type and size,
    specifically, the largest file of a
    particular type (Q12)
  • Comparison of two files by size (Q13)
  • Location of two duplicated directory structures
    (Q14)
  • Comparison of two directories by size (Q15)
  • Comparison of two directories by number of files
    contained (Q16)

33
Hypothesis
  • Treemap will be better for comparing file sizes
  • Uses more of the area
  • Sunburst would be better for searching files and
    understanding the structure
  • More explicit depiction of structure
  • Sunburst would be preferred overall

34
Try It Out
  • Conduct a couple example sessions

35
Small Hierarchy
Correct task completions (out of 16 possible)
36
Large Hierarchy
Correct task completions (out of 16 possible)
37
Performance Results
  • Ordering effect for Treemap on large hierarchies
  • Participants did better after seeing SB first
  • Performance was relatively mixed; trends favored
    Sunburst, but the results were not clear-cut
  • Oodles of data!

38
Subjective Preferences
  • Subjective preference: SB (51), TM (9), unsure
    (1) (a quick significance check is sketched below)
  • People felt that TM was better for size tasks
    (not borne out by the data)
  • People felt that SB was better for determining
    which directories were inside others
  • They identified it as being better for conveying
    structure
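A minimal sketch, assuming Python with SciPy, of how one might check
whether the preference split above differs from chance; it uses the counts
from the slide and excludes the unsure participant, an analysis choice made
here for illustration rather than one reported in the paper.

  from scipy.stats import binomtest

  # 51 preferred Sunburst, 9 preferred Treemap (unsure participant excluded).
  result = binomtest(51, n=51 + 9, p=0.5)    # null hypothesis: no preference
  print(f"p = {result.pvalue:.2e}")          # tiny p -> split unlikely by chance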

39
Strategies
  • How a person searched for files etc. mattered
  • Jump out to total view, start looking
  • Go level by level

40
Summary
  • Why do evaluation of InfoVis systems?
  • We need to be sure that new techniques are really
    better than old ones
  • We need to know the strengths and weaknesses of
    each tool, and when to use which tool

41
Challenges
  • There are no standard benchmark tests or
    methodologies to help guide researchers
  • Moreover, there's simply no one correct way to
    evaluate
  • Defining the tasks is crucial
  • Would be nice to have a good task taxonomy
  • Data sets used might influence results
  • What about individual differences?
  • Can you measure abilities (cognitive, visual,
    etc.) of participants?

42
References
  • All papers referred to in the slides
  • Martin and Mirchandani Fall '99 slides