SBD: Usability Evaluation
Chris North, CS 3724: HCI

Transcript and Presenter's Notes
1
SBD: Usability Evaluation
  • Chris North
  • CS 3724: HCI

2
[Scenario-Based Design framework diagram:
  ANALYZE (analysis of stakeholders, field studies; claims about current practice) -> Problem scenarios
  DESIGN (metaphors, information technology, HCI theory, guidelines) -> Activity, Information, and Interaction scenarios
  PROTOTYPE & EVALUATE (formative evaluation, summative evaluation) -> Usability specifications
  Iterative analysis of usability claims and re-design connects the phases]
3
Evaluation
  • Formative vs. Summative
  • Analytic vs. Empirical

4
Usability Engineering
[Diagram: the usability engineering cycle (Reqs Analysis, Design, Develop, Evaluate) over many iterations]
5
Usability Engineering
[Same cycle, annotated with where formative evaluation and summative evaluation occur]
6
Usability Evaluation
  • Analytic Methods
    • Usability inspection, expert review
    • Heuristic evaluation (Nielsen's 10 heuristics)
    • Cognitive walkthrough
    • GOMS analysis
  • Empirical Methods
    • Usability Testing
      • Field or lab
      • Observation, problem identification
    • Controlled Experiment
      • Formal, controlled scientific experiment
      • Comparisons, statistical analysis

7
User Interface Metrics
  • Ease of learning
  • Ease of use
  • User satisfaction

8
User Interface Metrics
  • Ease of learning
  • learning time,
  • Ease of use
  • performance time, error rates
  • User satisfaction
  • surveys
  • Not user friendly

9
Usability Testing
10
Usability Testing
  • Formative: helps guide design
  • Early in design process
    • once the architecture is finalized, it's too late!
  • Small # of users
  • Usability problems, incidents
  • Qualitative feedback from users
  • Quantitative usability specification

11
Usability Specification Table
Metrics:

Benchmark task                         | Worst case | Planned target | Best case (expert) | Observed
Find the most expensive house for sale | 1 min.     | 10 sec.        | 3 sec.             | ??? sec.
  • e.g. frequent tasks should be fast
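As a rough illustration, a minimal Python sketch (not part of the course materials) that checks an observed benchmark-task time against the spec values above; the 14-second observation and the verdict wording are made up.

```python
# Minimal sketch (hypothetical observation): compare an observed benchmark-task
# time against the worst-case and planned-target levels from the usability spec.

spec = {
    "Find the most expensive house for sale": {
        "worst_case_s": 60,        # 1 min.
        "planned_target_s": 10,    # 10 sec.
        "best_case_s": 3,          # expert, 3 sec.
    },
}

observed_s = {"Find the most expensive house for sale": 14}  # measured in the test (made up)

for task, levels in spec.items():
    t = observed_s[task]
    if t <= levels["planned_target_s"]:
        verdict = "meets planned target"
    elif t <= levels["worst_case_s"]:
        verdict = "within worst case; keep refining"
    else:
        verdict = "fails spec, must fix"
    print(f"{task}: observed {t}s -> {verdict}")
```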

12
Usability Test Setup
  • Set of benchmark tasks
    • Derived from scenarios (Reqs analysis phase)
    • Derived from claims analysis (Design phase)
    • Easy to hard, specific to open-ended
    • Coverage of different UI features
    • E.g. "Find the 5 most expensive houses for sale"
    • Different types: learnability vs. performance
  • Consent forms
    • Not needed unless recording the user's face/voice (new rule)
  • Experimenters
    • Facilitator: instructs the user
    • Observers: take notes, collect data, videotape the screen
    • Executor: runs the prototype for faked parts
  • Users
    • Solicit from target user community (Reqs analysis)
    • 3-5 users; quality, not quantity

13
Usability Test Procedure
  • Goal: mimic real life
    • Do not cheat by helping users complete tasks
  • Initial instructions
    • "We are evaluating the system, not you."
  • Repeat:
    • Give the user the next benchmark task
    • Ask the user to think aloud
    • Observe; note mistakes and problems
    • Avoid interfering; hint only if completely stuck
  • Interview
    • Verbal feedback
    • Questionnaire
  • 1 hour / user

14
Usability Lab
  • E.g. McBryde 102

15
Data
  • Note taking
    • E.g. "@ user keeps clicking on the wrong button"
  • Verbal protocol (think aloud)
    • E.g. user thinks that button does something else
  • Rough quantitative measures
    • HCI metrics, e.g. task completion time
  • Interview feedback and surveys
  • Videotape of the screen & mouse
  • Eye tracking, biometrics?

16
Analyze
  • Initial reaction
    • "stupid user!", "that's developer X's fault!", "this sucks"
  • Mature reaction
    • "how can we redesign the UI to solve that usability problem?"
    • the data is always right
  • Identify usability problems
    • Learning issues, e.g. can't figure out or didn't notice a feature
    • Performance issues, e.g. arduous, tiring to solve tasks
    • Subjective issues, e.g. annoying, ugly
  • Problem severity: critical vs. minor

17
Cost-Importance Analysis
  • Importance: 1-5 (task effect, frequency)
    • 5 = critical, major impact on user, frequent occurrence
    • 3 = user can complete the task, but with difficulty
    • 1 = minor problem, small speed bump, infrequent
  • Ratio = importance / cost
    • Sort by this, highest to lowest (see the sketch after the table below)
  • 3 categories: must fix, next version, ignored

Problem | Importance | Solutions | Cost | Ratio I/C
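A minimal Python sketch of this bookkeeping. The problems, costs, and the bucket thresholds on the I/C ratio are hypothetical assumptions, not values from the lecture.

```python
# Minimal sketch (hypothetical data): rank usability problems by
# importance / cost and bucket them into must-fix / next-version / ignored.

problems = [
    # (problem, importance 1-5, estimated fix cost in person-days)
    ("Users can't find the zoom control", 5, 1.0),
    ("Tooltip text is ambiguous",         3, 0.5),
    ("Rarely used dialog is cluttered",   1, 3.0),
]

ranked = sorted(problems, key=lambda p: p[1] / p[2], reverse=True)

for problem, importance, cost in ranked:
    ratio = importance / cost
    if ratio >= 3:          # thresholds are assumptions for illustration
        bucket = "must fix"
    elif ratio >= 1:
        bucket = "next version"
    else:
        bucket = "ignored"
    print(f"{problem}: I={importance}, C={cost}, I/C={ratio:.1f} -> {bucket}")
```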

18
Refine UI
  • Solve problems in order of importance/cost
  • Simple solutions vs. major redesigns
  • Iterate
  • Test, refine, test, refine, test, refine, ...
  • Until?

19
Refine UI
  • Solve problems in order of importance/cost
  • Simple solutions vs. major redesigns
  • Iterate
  • Test, refine, test, refine, test, refine, ...
  • Until? Until the UI meets the usability specification

20
Examples
  • Learnability problem
    • Problem: user didn't know he could zoom in to see more
    • Potential solutions
      • Better labeling: better zoom button icon, tooltip
      • Clearer affordance: add a zoom bar slider (like Google Maps)
      • NOT more help documentation! You can do better.
  • Performance problem
    • Problem: user took too long to repeatedly zoom in
    • Potential solutions
      • Faster affordance: add a real-time zoom bar
      • Shortcuts: icons for each zoom level (state, city, street)

21
Project (step 6): Usability Test
  • Usability Evaluation
    • > 3 users; not (tainted) HCI students
    • 10 benchmark tasks
    • Simple data collection (biometrics optional!)
    • Exploit this opportunity to improve your design
  • Report
    • Procedure (users, tasks, specs, data collection)
    • Usability problems identified, specs not met
    • Design modifications

22
Controlled Experiments
23
Usability Test vs. Controlled Experiment
  • Usability test
    • Formative: helps guide design
    • Single UI, early in the design process
    • Few users
    • Usability problems, incidents
    • Qualitative feedback from users
    • Engineering oriented
  • Controlled experiment
    • Summative: measures the final result
    • Compare multiple UIs
    • Many users, strict protocol
    • Independent & dependent variables
    • Quantitative results, statistical significance
    • Science oriented
24
What is Science?

25
What is Science?
[Diagram relating Phenomenon, Measurement, Modeling, Science, and Engineering]
26
Scientific Method?

27
Scientific Method
  • Form Hypothesis
  • Collect data
  • Analyze
  • Accept/reject hypothesis
  • How to prove a hypothesis in science?

28
Scientific Method
  • Form Hypothesis
  • Collect data
  • Analyze
  • Accept/reject hypothesis
  • How to prove a hypothesis in science?
    • Easier to disprove things, by counterexample
    • Null hypothesis = the opposite of the hypothesis
    • Disprove the null hypothesis
    • Hence, the hypothesis is proved

29
Example
  • Typical question
  • Which visualization is better for which user
    tasks?
  • Spotfire vs. TableLens

30
Cause and Effect
  • Goal: determine cause and effect
    • Cause: visualization tool (Spotfire vs. TableLens)
    • Effect: user performance time on task T
  • Procedure
    • Vary the cause
    • Measure the effect
  • Problem: random variation
    • Cause = vis tool OR random variation?

[Diagram: real world -> collected data; random variation -> uncertain conclusions]
31
Stats to the Rescue
  • Goal
    • Measured effect is unlikely to result from random variation
  • Hypothesis
    • Cause = visualization tool (e.g. Spotfire ≠ TableLens)
  • Null hypothesis
    • Visualization tool has no effect (e.g. Spotfire = TableLens)
    • Hence, cause = random variation
  • Stats
    • If the null hypothesis were true, the measured effect would occur with probability < 5% (e.g. measured effect >> random variation)
  • Hence
    • Null hypothesis unlikely to be true
    • Hence, hypothesis likely to be true (see the sketch below)
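To make "could random variation alone produce this difference?" concrete, here is a minimal Python sketch of a permutation-style simulation on made-up timing data. The course itself uses t-tests and ANOVA (later slides); this is only an illustration of the null-hypothesis logic.

```python
# Minimal sketch (made-up times): estimate how often a difference as large as
# the measured one would arise from random variation alone, i.e. under the
# null hypothesis that the visualization tool has no effect.
import random

spotfire  = [37, 41, 35, 44, 39, 36, 42, 38]   # task times (s), hypothetical
tablelens = [30, 28, 33, 29, 35, 27, 31, 32]

observed = abs(sum(spotfire) / len(spotfire) - sum(tablelens) / len(tablelens))

pooled = spotfire + tablelens
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)                      # pretend the tool labels are arbitrary
    a, b = pooled[:len(spotfire)], pooled[len(spotfire):]
    if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
        count += 1

p = count / trials
print(f"p ~= {p:.4f}")  # small p => difference unlikely to be random variation
```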

32
Variables
  • Independent Variables (what you vary) and treatments (the variable values)
    • Visualization tool: Spotfire, TableLens, Excel
    • Task type: find, count, pattern, compare
    • Data size (# of items): 100, 1000, 1,000,000
  • Dependent Variables (what you measure)
    • User performance time
    • Errors
    • Subjective satisfaction (survey)
    • HCI metrics

33
Example: 2 x 3 design
  • n users per cell

[2 x 3 design grid: rows = Ind Var 1 (Vis. Tool: Spotfire, TableLens); columns = Ind Var 2 (Task Type: Task1, Task2, Task3); cells = measured user performance times (dep var)]
34
Groups
  • Between-subjects variable
    • 1 group of users for each variable treatment
    • Group 1: 20 users, Spotfire
    • Group 2: 20 users, TableLens
    • Total: 40 users, 20 per cell
  • Within-subjects (repeated) variable
    • All users perform all treatments
    • Counterbalancing the order effect (see the sketch below)
    • Group 1: 20 users, Spotfire then TableLens
    • Group 2: 20 users, TableLens then Spotfire
    • Total: 40 users, 40 per cell
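A minimal Python sketch of the two assignment schemes above. The group sizes match the slide, but the user IDs and the assignment code itself are just an illustration.

```python
# Minimal sketch (hypothetical assignment): between-subjects groups, and a
# counterbalanced within-subjects ordering so order effects cancel out.

users = [f"user{i:02d}" for i in range(1, 41)]   # 40 hypothetical participants

# Between-subjects: one tool per group, 20 users per cell.
between = {"Spotfire": users[:20], "TableLens": users[20:]}

# Within-subjects: every user sees both tools; alternate the order
# (Group 1: Spotfire then TableLens, Group 2: the reverse).
orders = [("Spotfire", "TableLens"), ("TableLens", "Spotfire")]
within = {user: orders[i % 2] for i, user in enumerate(users)}

print(len(between["Spotfire"]), len(between["TableLens"]))   # 20 20
print(within["user01"], within["user02"])                    # alternating orders
```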

35
Issues
  • Eliminate or measure extraneous factors
    • Randomized
    • Fairness
    • Identical procedures, ...
    • Bias
  • User privacy, data security
    • IRB (Institutional Review Board)

36
Procedure
  • For each user:
    • Sign legal forms
    • Pre-survey: demographics
    • Instructions
      • Do not reveal the true purpose of the experiment
    • Training runs
    • Actual runs
      • Give task
      • Measure performance
    • Post-survey: subjective measures
  • n users

37
Data
  • Measured dependent variables
  • Spreadsheet

User  | Spotfire task 1 | Spotfire task 2 | Spotfire task 3 | TableLens task 1 | TableLens task 2 | TableLens task 3
user1 | 12 s            | 45              | 104             | 13               | 51               | 138
user2 | 16              | 38              | 97              | 10               | 48               | 116
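A minimal Python sketch (in place of the course's Excel spreadsheet) holding the two example rows above and computing per-cell averages of the kind shown on the "Step 2: Stats" slide (that slide's numbers come from all users, not just these two rows).

```python
# Minimal sketch: the two spreadsheet rows above as a nested dict,
# plus per-cell means (tool x task) of the dependent variable.

times = {
    "Spotfire":  {"task1": [12, 16], "task2": [45, 38], "task3": [104, 97]},
    "TableLens": {"task1": [13, 10], "task2": [51, 48], "task3": [138, 116]},
}

for tool, tasks in times.items():
    means = {task: sum(v) / len(v) for task, v in tasks.items()}
    print(tool, means)
# Spotfire  {'task1': 14.0, 'task2': 41.5, 'task3': 100.5}
# TableLens {'task1': 11.5, 'task2': 49.5, 'task3': 127.0}
```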

38
Step 1: Visualize it
  • Dig out interesting facts
  • Qualitative conclusions
  • Guide stats
  • Guide future experiments

39
Step 2: Stats

Average user performance times (dep var): rows = Ind Var 1 (Vis. Tool), columns = Ind Var 2 (Task Type)

          | Task1 | Task2 | Task3
Spotfire  | 37.2  | 54.5  | 103.7
TableLens | 29.8  | 53.2  | 145.4
40
TableLens better than Spotfire?
  • Problem with Averages?

[Bar chart: avg. perf. time (secs), Spotfire vs. TableLens]
41
TableLens better than Spotfire?
  • Problem with averages? They're lossy
    • Compares only 2 numbers
    • What about the 40 data values? ("Show me the data!")

[Bar chart: avg. perf. time (secs), Spotfire vs. TableLens]
42
The real picture
  • Need stats that compare all data
  • What if all users were 1 sec faster on TableLens?
  • What if only 1 user was 20 sec faster on
    TableLens?

[Chart: perf. time (secs), Spotfire vs. TableLens]
43
Statistics
  • t-test
    • Compares 1 dep var on 2 treatments of 1 ind var
  • ANOVA (Analysis of Variance)
    • Compares 1 dep var on n treatments of m ind vars
  • Result
    • p = probability that the difference between treatments is random (i.e., the null hypothesis holds)
    • Statistical significance level
    • Typical cut-off: p < 0.05
    • Hypothesis confidence = 1 - p (see the sketch below)
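The slides run these tests in Excel; as an alternative illustration, here is a minimal Python sketch using SciPy (assumed available) on made-up timing data, with a t-test for two treatments and a one-way ANOVA for three.

```python
# Minimal sketch (made-up times): t-test on 2 treatments of 1 ind var,
# and one-way ANOVA on 3 treatments, for one dependent variable (task time).
from scipy import stats

spotfire  = [37, 41, 35, 44, 39, 36, 42, 38]
tablelens = [30, 28, 33, 29, 35, 27, 31, 32]
excel     = [50, 47, 55, 49, 52, 48, 51, 53]

t, p = stats.ttest_ind(spotfire, tablelens)
print(f"t-test: t={t:.2f}, p={p:.4f}")        # p < 0.05 => statistically significant difference

f, p = stats.f_oneway(spotfire, tablelens, excel)
print(f"ANOVA:  F={f:.2f}, p={p:.4f}")
```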

44
In Excel
45
p < 0.05
  • Woohoo!
    • Found a statistically significant difference
    • Averages determine which one is better
  • Conclusion
    • Cause = visualization tool (e.g. Spotfire ≠ TableLens)
    • Vis tool has an effect on user performance for task T
    • 95% confident that TableLens is better than Spotfire
    • NOT "TableLens beats Spotfire 95% of the time"
    • 5% chance of being wrong!
    • Be careful about generalizing

46
p > 0.05
  • Hence, no difference?
    • Vis tool has no effect on user performance for task T?
    • Spotfire = TableLens?

47
p > 0.05
  • Hence, no difference?
    • Vis tool has no effect on user performance for task T?
    • Spotfire = TableLens?
  • NOT!
    • Did not detect a difference, but they could still be different
    • A potential real effect did not overcome random variation
    • Provides evidence for Spotfire = TableLens, but not proof
    • Boring; basically found nothing
  • How?
    • Not enough users (see the power-analysis sketch below)
    • Need better tasks, data, ...
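One way to address "not enough users" before running the study is a power analysis. A minimal Python sketch, assuming statsmodels is available; the medium effect size (Cohen's d = 0.5) and the 80% power target are assumptions, not numbers from the lecture.

```python
# Minimal sketch: how many users per group does a between-subjects t-test need
# to detect a medium effect (d = 0.5) with 80% power at alpha = 0.05?
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"~{n_per_group:.0f} users per group")   # roughly 64
```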

48
Data Mountain
  • Robertson, Data Mountain (Microsoft)

49
Data Mountain Experiment
  • Data Mountain vs. IE Favorites
  • 32 subjects
  • Organize 100 pages, then retrieve them based on cues
  • Independent variables
    • UI: Data Mountain (old, new), IE
    • Cue: title, summary, thumbnail, all 3
  • Dependent variables
    • User performance time
    • Error rates: wrong pages, failed to find within 2 min
    • Subjective ratings

50
Data Mountain Results
  • Spatial Memory!
  • Limited scalability?