Transcript and Presenter's Notes

1
Evaluation with Users
  • SEng 5115, University of Minnesota
  • John Kruse
  • Spring 2008

2
Evaluation with users: The Gold Standard
  • Big investment, big potential
  • Big investment is debatable
  • Potential isn't
  • Many issues
  • dealing with human subjects
  • which users? which tasks?
  • when in the process?
  • what to measure?
  • how to measure?

3
An early test
4
Why not just design?
  • The Superbook story
  • Landauer, The Trouble with Computers; work at Bellcore
  • Superbook was an online version of a manual
    thousands of pages long
  • First try wasn't better than paper
  • even though designed by knowledgeable experts
  • 2nd was better than paper, 3rd lots better

5
Why not just interview?
  • People don't know what they know
  • People don't remember what they did
  • What people say about their work is different
    from what they actually do
  • Occasional mismatches between what people like
    and what works
  • On the desktop, people like what works
  • But there is no such correlation on the web

6
Why not just inspect?
  • Heuristic evaluation (HE) is good, cheap
  • Tends not to catch the domain-specific stuff
  • Unless you have double experts (domain and
    usability experts)
  • Tends not to have a task focus
  • What is good about a Web site?
  • Users' behavior is chaotic
  • HE and testing complement each other

7
Early in the process
  • Early is important
  • Low investment
  • Less inertia --> time to change
  • Mock-ups and drawings are OK
  • issues in how to handle user choice
  • Partial prototypes when necessary

8
Early, on-the-fly prototyping
  • Paper prototypes: testing & redesign
  • Revise during test session
  • Allow entire team to participate
  • building prototypes
  • watching users

9
Late in the process
  • Can measure productivity, timing, etc
  • May require more elaborate prototype
  • or actual code
  • Post-release usability sessions are useful
  • Observational or designed test
  • Flexibility of the development organization's
    response is limited
  • Cost of fixing errors goes up
  • If late testing is the only testing, is it worth
    it?

10
User testing as team building
  • Prototypes provide a medium for people to work
    together
  • User testing can be fun
  • Even if users are abusing your finest work
  • Make observers stay away (behind glass)
  • Managers have seen teams crystallize around paper
    prototyping & user testing

11
What to measure 1 of 3
  • Feasibility of a product approach
  • Utility & acceptance
  • Microwave Cakes
  • Ease of initial use & learning
  • Intuitiveness
  • Need for manual
  • Icon interpretation
  • Problems, questions, reactions
  • What users are thinking

12
What to measure 2 of 3
  • Ease of remembering
  • Is it retained, or does it conflict with previous
    or interleaved experiences & learning?
  • Efficiency of Use / Productivity
  • Mostly later, for usability measurement
  • Limited applicability early in design
  • Thinking Aloud & related interrupts interfere
    with timing
  • But timing can be done early in some cases
  • Parts of workflow besides software

13
What to measure 3 of 3
  • Affective reactions
  • Do they like it?
  • Which parts do they like?
  • Measuring affect & choice can be tricky
  • Observation during use is the fundamental method
  • Forced choice and ratings can help

14
Affective reactions
  • It may relate to self-evaluation & perceived
    competency more than aesthetics
  • In task-oriented systems
  • On the Web, marketing considerations
  • If there are problems, then probe
  • The person's reactions to similar products
  • The background and experience of the person
  • Their expectations about the technology

15
Home page visual impact
  • 5-second test: show users the home page
  • What does the owner do, what is the company's
    business?
  • What attracted your attention? What did it mean,
    or how did it make you feel?
  • What can you do on this page?

16
Wizard of Oz
  • UI can be simple
  • All you need to do is envision the use
  • Smoke & mirrors are quite adequate

"3 books detected: War & Peace. HOLD for M. Smith."
17
Example: Concept feasibility & productivity
  • Library workstation with RFID
  • Study the utility of the device
  • Study acceptability of the approach
  • Is multiple-books-at-a-time check-in better than
    one book at a time?
  • Wizard of Oz study
  • Visual Basic, monitor, speakers, cardboard box
  • Marked books
  • Experimenter

18
Library Wizard of Oz (cont)
  • Participants thought it was very realistic
  • Multiple was not better
  • Handling exceptions is disruptive
  • Discovered early & cheaply
  • Continues to be a business goal

19
Productivity data
  • Specific measurements
  • Median/mean time for task
  • Comparison of alternatives (see the sketch below)
  • Focus on one type of test at a time
  • Think-aloud can slow down completion times
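A minimal sketch of such a summary, assuming task times in seconds were collected for two alternatives; the timing data below are invented for illustration:

```python
# Minimal sketch: median/mean task-completion times for two alternatives.
# The timing data are hypothetical illustrative values.
from statistics import mean, median

times = {
    "Design A": [148, 162, 201, 175, 190, 156],  # seconds per task
    "Design B": [121, 135, 118, 140, 129, 151],
}

for name, data in times.items():
    print(f"{name}: n={len(data)}  mean={mean(data):.1f}s  median={median(data):.1f}s")
```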

20
Experimental design
  • Most user testing is not rigorous hypothesis-
    testing experimentation
  • has too low an n
  • lacks good control conditions
  • Usually it is formative evaluation
  • Summative evaluation
  • Usually experiments
  • Work out the statistics involved (see the sketch
    below)
  • Statistics cookbooks
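For a summative comparison, a minimal sketch of one such statistic: Welch's t-test on task times from two independent groups. The data are illustrative, and a real study should pick the test from a statistics cookbook to match its design:

```python
# Hedged sketch: compare task times for two designs with Welch's t-test,
# which does not assume equal variances. Data below are made up.
from scipy import stats

design_a = [148, 162, 201, 175, 190, 156]  # seconds per task, design A
design_b = [121, 135, 118, 140, 129, 151]  # seconds per task, design B

t, p = stats.ttest_ind(design_a, design_b, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")
```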

21
Experimental design principles
  • Counterbalancing (see the sketch below)
  • Logically remove the possibility of competing
    explanations
  • If you want to study system X vs. system Y
  • Do not have all your participants do X, then Y
  • Fatigue will decrease performance
  • Practice will increase it
  • You can't really predict which will win
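A minimal sketch of counterbalanced assignment for a two-condition study; the participant IDs are placeholders:

```python
# Minimal sketch: alternate condition orders across participants so that
# practice and fatigue effects are balanced between X and Y.
from itertools import cycle

orders = [("X", "Y"), ("Y", "X")]
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]  # placeholder IDs

for pid, order in zip(participants, cycle(orders)):
    print(f"{pid}: {order[0]} then {order[1]}")
```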

22
Testing usability of icons 1
  • 17 icons for UI, 4 sets created
  • Ease of learning
  • Show icons, ask "what do you think?"
  • Present task/description, have user pick from
    entire set
  • Since icons are not seen in isolation
  • Present all names, all icons, have users match

23
Testing usability of icons 2
  • Efficiency
  • Users who had learned the icons
  • Given a name, then timed on Y/N discrimination
    (see the sketch below)
  • Given a random set, asked to click on the
    specified icon
  • Subjective satisfaction
  • Rate each one from easy to difficult
  • Select the preferred one from 4 alternatives
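A minimal console sketch of a timed Y/N discrimination trial; a real study would render the actual icon images, and the icon names here are invented stand-ins:

```python
# Hedged sketch: time a yes/no judgment of whether a shown icon matches
# a given name. Trial data are invented; a real test would show images.
import time

trials = [("print", "print"), ("save", "open"), ("cut", "cut")]

for name, icon in trials:
    input("Press Enter to start the trial...")
    start = time.perf_counter()
    answer = input(f'Shown icon: "{icon}". Is this the "{name}" icon? (y/n) ')
    elapsed = time.perf_counter() - start
    correct = (answer.strip().lower() == "y") == (name == icon)
    print(f"  {'correct' if correct else 'wrong'}, {elapsed:.2f}s")
```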

24
General User Test guidelines
  • Plan ahead of time
  • what data to record
  • what instructions to deliver
  • what to do if the user falls off the prototype
  • when to provide help, and what help
  • Know your objectives
  • but never lose sight of the user

25
General guidelines
  • Do a pilot study
  • Get professional help for big studies
  • In general, it is better if developers &
    designers aren't present during testing
  • too much bias
  • subtle clues
  • stay behind one-way glass

26
Documents for user testing
  • Observer briefing
  • Welcome for participants
  • Introduction
  • Informed consent
  • Training materials
  • Test task(s)
  • Data collection sheet
  • Data summary sheet
  • Data analysis sheet
  • Pre-test questionnaire
  • Post-test questionnaire

27
Tasks
  • Keep close to the real tasks
  • May need to shorten some for time reasons
  • Task selection heuristics
  • Common tasks
  • Areas of risk
  • Safety, New approaches, Uncertainties
  • Design the tasks
  • Iteratively
  • Get them reviewed

28
Test tasks 1
  • What you want them to accomplish
  • Typically not how
  • Example: Create a form letter using MS Word
    that produces thank-you letters to 3
    recipients, each with an individualized
  • Salutation ("Dear Aunt Abigail")
  • Name of gift (necktie)
  • Attribute of gift (color)

29
Home Health Clinical Test tasks 1
  • Set up the new patient Jinny's visit calendar for
    a visit frequency of 3x per week for 2 weeks
  • Locate today's visit schedule: all patients to
    be seen today by Ruth, a nurse
  • Initiate a visit note for Sam, an existing
    patient, from today's schedule
  • ...and define the discipline, program,
    activities & type of services to be provided

30
User-generated tasks
  • Users behave differently when they care about the
    task
  • On the web, with eCommerce, lots of potential
    tasks
  • Interview them, let them define tasks that they
    can do with a web site.

Spool, User Interface Engineering
31
Observers
  • Better to be there
  • Than hear / read about it afterwards
  • Seeing video clips can be very persuasive
  • Better to be few & unobtrusive
  • No reactions to users' choices
  • No talking
  • Unless behind one-way mirror

A marketing manager at a session, standing right
behind the user, saw a new design (two columns
reversed) and said aloud, "Amy, you've got it all
wrong!" Social situation: use process to tell
this higher-ranking customer that it is not OK
to talk, and why. Amy had good reasons for
showing in this order (counterbalancing), and for
not biasing.
32
Users
  • Real users, as much as possible
  • If real users are scarce, try surrogates
  • If 3M people can't use it, then maybe it's too
    hard
  • Availability of users might influence the testing
    approach
  • Recruiting is a non-trivial effort
  • Money always helps

33
Welcome, Orientation
  • Welcome, description of project
  • Description may be truncated
  • Brief intro to usability testing concept
  • Test is of software under real-world conditions
  • You won't help them
  • Unless necessary
  • Explanations afterwards
  • How long it will take

34
Participant (Human Subjects)
  • Remind them that you are not testing them
  • You are testing your own product
  • But tell them you would rather not help them
  • Informed, voluntary consent
  • Understand that they can quit at any time
  • Explain test in lay terms
  • Privacy & anonymity, use of image/voice

35
Informed Consent
  • This is very important
  • Participant is a volunteer
  • Can leave at any time
  • Is video being collected?
  • What will it be used for?
  • Who will see it?
  • If you want to record, get permission

36
Pre-test questionnaire
  • About the user's background
  • Experience with OS, software, etc.
  • Experience with job
  • How many hours per day do you use a computer?
  • What is your educational level (degrees,
    certificates, fields of study)?

37
Training Materials
  • What do they need to know?
  • What is real-world?
  • What will they get at work for training?
  • Don't just look at best- or worst-case
  • Instruct about software conventions
  • Task orientation
  • Explain things very thoroughly
  • Practice, demonstrate (?)

38
Thinking-Aloud Method 1
  • User asked to think-aloud
  • Running commentary, like at sports event
  • What they are thinking or looking for or trying
    to do
  • What they like, don't like, good, bad; anything
    is OK and helpful
  • They should guess, ask questions
  • You won't answer them

39
Thinking-Aloud Method 2
  • They should not let this interfere with their
    normal process
  • Generally, don't explain decisions or make
    design suggestions until after; then it's
    welcome
  • Have them practice it
  • Or have the first task be reasonably easy

40
How to help users
  • Not too soon
  • Encourage them to try things out
  • Be encouraging in general
  • Tell them they can't make a mistake
  • You will learn from everything they do or say
  • When to help
  • If they get stuck & stay stuck
  • When they look upset

41
How to help users
  • Don't give answers if at all possible
  • Ask the user questions
  • General at first
  • Ones that will get them thinking about their
    conceptual model
  • Then more specific (Leading) questions
  • Give them hints
  • General at first, then more specific
  • Is their conceptual model OK?

42
Making users comfortable
  • Break after every task
  • Recap, offer a drink break
  • Answer users' questions if possible
  • Don't let users start designing
  • Until after they have completed their tasks

43
Pairs of participants: A thinking-aloud variant
  • Thinking Aloud is difficult for people to do
  • Users can work in pairs
  • They talk to each other
  • This is more comfortable for them

44
Test facilitator's role
  • Flight attendant
  • Responsibility for comfort & safety of subjects
  • Prevent distress, embarrassment
  • Scientist
  • During
  • Maintain objectivity
  • Gather data
  • Before & After
  • Plan
  • Reports

45
Data Collection Sheet
  • For quick observation without recording: where in
    the program
  • Success, comments they made
  • Failure, kind of failure

46
Web usability testing - Data
  • Things happen fast
  • Abbreviate, or
  • Prepare checkoff sheets with likely actions
  • Prepare page miniatures, to make notes on

47
Special Materials - Greeking test
  • Or "Mumble" text
  • For layout
  • On a web site
  • Greek all the text, but keep graphics (see the
    sketch below)
  • Evaluate alternative web page designs
  • Does layout communicate function?
  • Or, does it matter where items go?
  • "Ksdiudhk dkji"
  • "Mm Mmmmm mmmm"
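A minimal sketch of greeking text programmatically, assuming the goal is to replace every letter while preserving word lengths, case pattern, punctuation, and layout:

```python
# Hedged sketch: "greek" text by swapping each letter for a random one,
# keeping lengths, capitalization, punctuation, and layout intact.
import random
import re
import string

def greek(text):
    def swap(match):
        ch = random.choice(string.ascii_lowercase)
        return ch.upper() if match.group().isupper() else ch
    return re.sub(r"[A-Za-z]", swap, text)

print(greek("Welcome to our store. Browse the catalog, or check out now."))
```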

48
Videotaping
  • Usually too much work to go back over it
  • Good for driving home your points
  • Developers who weren't there
  • Disagreement on interpretation
  • Management
  • Split screen useful
  • Camera for screen, camera for face

49
Modern Data Capture
  • Morae, from TechSmith
  • Screen, cursor & clicks
  • Face & voice
  • Keystrokes
  • Pages with timestamps
  • Optional concurrent observers & time-stamped
    events
  • Start Ron video with Morae

50
Post-test questionnaire
  • Rating scales are good
  • Then ask them to talk & elaborate
  • How hard was it? (Extremely easy ... Moderately
    difficult ... Extremely difficult)
  • Would you use it? (Never ... sometimes ... always)
  • Did you understand the part where...?
  • Did you like it, find it attractive, etc.?
  • Anything missing from it?

51
Reporting the findings
  • Say something positive
  • Make recommendations to improve things
  • For Summative evaluation
  • Common Industry Format for Usability Test Reports
    Version 1.1, October 28, 1999
  • Produced by the Industry Usability Reporting
    (IUSR) project, www.nist.gov/iusr
  • For Formative evaluation
  • Write for your audience

52
Observing what didnt happen 1
  • Establish expectations for user behavior
  • e.g., this link will be followed for this reason
    by this kind of user
  • Note when it does not happen; explain
  • Look for what didn't take place in debriefing
    afterwards
  • e.g., users in a study didn't look in online
    books' TOCs

53
Observing what didnt happen 2
  • Users at an antique fair did not use the
    "Community" link
  • They used the search facility, found people's web
    sites
  • Said "I didn't know she had a web site -- I know
    her"
  • Look for behavior that doesn't make sense

54
Empirical studies of usability testing
  • Usability test of a web site, by 9 teams
  • All teams given same objectives for the same
    interface
  • Each team then conducted a study using their
    organization's standard procedures and
    techniques.
  • Molich's study

55
Results of study of usability testing
  • More than 300 problems found in total
  • Most were "reasonable and in accordance with
    generally accepted advice on usable design."
  • There wasn't a single problem that every team
    reported.

56
Tasks as the basis of usability testing
  • 9 teams created 51 different tasks for the same
    UI
  • Each task was well designed & valid
  • but little agreement on which tasks were
    critical
  • If each team used the same best practices, then
    they should all have derived the same tasks from
    the test scenario

57
What to do to improve things
  • Task design is important
  • Agree on them
  • Maybe goals are more important
  • Better result reporting is needed
  • The teams' reports differed widely
  • Ranged from 5 pages to 52 pages
  • Iterations are useful
  • With intervening design changes
  • Culture & attitude of continuous testing

58
Empirical Results: How much is enough?
  • For applications, rule of thumb: 8 is plenty
  • 80% of problems with 4-5 users (see the sketch
    below)
  • For some web sites, 8 is not enough
  • Task: purchase a CD online (general)
  • Important new problems with each of 18 users
  • 247 total obstacles-to-purchase, 5 new per user

Spool et al., User Interface Engineering, CHI 2001:
"We conducted usability tests on an e-commerce
web site using a very straightforward task:
buying a CD from an online music store. We chose
users who had a history of purchasing music
online. We asked these users to make a shopping
list of CDs they wanted to buy and gave them
money to spend on these items."
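The "80% with 4-5 users" rule of thumb comes from the standard problem-discovery model, found(n) = 1 - (1 - L)^n, where L is the probability that one user exposes a given problem. A minimal sketch of that curve; L = 0.31 is the commonly quoted estimate, an assumption rather than a universal constant, and the Spool data above implies a much lower L for that site:

```python
# Minimal sketch of the problem-discovery curve 1 - (1 - L)**n.
# L = 0.31 is the often-quoted estimate behind "80% with ~5 users".
L = 0.31

for n in (1, 2, 3, 4, 5, 8, 18):
    found = 1 - (1 - L) ** n
    print(f"{n:2d} users -> {found:5.1%} of problems found")
```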
59
Is 8 really enough?
  • It depends
  • For iterative UCD, 5-6 users OK
  • To find all usability problems requires a large n
  • Sample of tasks matters; need good coverage
  • Can you do repeated trials with same user?
  • Or do they learn all the workarounds?

60
How many users
  • One at a time, at 3M and Microsoft
  • Then made changes
  • Then ran another
  • Achieved good results
  • Some opportunities just make sense
  • We've had 3 observers in the room
  • With no obvious ill effects
  • Except I needed to moderate

61
Other techniques Surveys
  • QUIS: Questionnaire for User Interaction
    Satisfaction, Univ. of Maryland
  • How long have you worked on this system?
  • How many operating systems have you worked with?
  • Overall reaction to the system
  • Terrible ... Wonderful
  • Frustrating ... Satisfying
  • Characters, screen layouts, terminology
  • questions about all aspects of a system

62
Logging to study use
  • User actions & performance (see the sketch below)
  • Page visits
  • High-frequency search terms
  • Search results, success, etc.
  • High-frequency error messages
  • Special Events
  • Back button
  • History / Bookmarks / Favorites
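A minimal sketch of such log mining; the file name, line format, and "/search" endpoint are assumptions for illustration, and a real server log needs a parser matched to its configured format:

```python
# Hedged sketch: count page visits and search terms from a web access log.
# File name, column position, and query parameter are assumed for the example.
from collections import Counter
from urllib.parse import urlparse, parse_qs

pages, terms = Counter(), Counter()

with open("access.log") as log:              # hypothetical log file
    for line in log:
        url = line.split()[6]                # request URL in common log format
        parsed = urlparse(url)
        pages[parsed.path] += 1
        if parsed.path == "/search":         # assumed search endpoint
            for q in parse_qs(parsed.query).get("q", []):
                terms[q.lower()] += 1

print("Top pages:", pages.most_common(5))
print("Top search terms:", terms.most_common(5))
```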

63
Project assignment
  • Take a few good questions
  • Describe how you want the data to look
  • Or at least what the comparisons will be
  • To help you make business decisions
  • Don't talk about how you would process it
  • How would you summarize large numbers of episodes
    of use, with people taking different paths, and
    dropping out at different points? (One possible
    approach is sketched below.)
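One possible approach, sketched under assumptions: represent each episode as the sequence of steps the user reached and count users per step, giving a drop-off funnel. The step names and episode paths below are invented for illustration:

```python
# Hedged sketch: summarize episodes of use as a step funnel showing
# where users drop out. All data here are hypothetical.
from collections import Counter

steps = ["home", "search", "product", "cart", "checkout", "purchase"]
episodes = [
    ["home", "search", "product", "cart", "checkout", "purchase"],
    ["home", "search", "product"],
    ["home", "product", "cart"],
    ["home", "search"],
]

reached = Counter(step for ep in episodes for step in set(ep))
for step in steps:
    n = reached[step]
    print(f"{step:9s} {n}/{len(episodes)} users ({n / len(episodes):.0%})")
```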

64
Framework for Logging Usability Data (FLUD)
  • Design of a File Format for Logging Website
    Interaction
  • National Institute of Standards and Technology
    Special Publication 500-248
  • Web Metrics Testbed
  • tools and techniques that support rapid, remote,
    and automated testing and evaluation of website
    usability: http://zing.ncsl.nist.gov/WebTools/

65
Opentracker.net
  • Website statistics provide insight
  • Make daily decisions based on customer behavior
  • Traffic statistics are a form of direct feedback
  • Generate marketing numbers - not guesswork
  • Learn what customers do & adjust content to meet
    their needs
  • Adjust strategies according to what works
  • Identify non-effective strategies and drop them
  • Profit from informed advertising and content
    management decisions

66
Other data-gathering techniques
  • Online/telephone consultants
  • Online suggestion box
  • Interviews, focus panels
  • Eye movements
  • Example from useit.com
  • Using Eye Tracking to Compare Web Page Designs: A
    Case Study
  • Agnieszka Bojko. Journal of Usability Studies,
    Issue 3, Volume 1, May 2006, pp. 112-120

67
Field Experiments & Observation
  • Productivity experiments on book sorting at a
    library
  • Act as though you had some device
  • Do the device's work ahead of time
  • Ask the user to do the newly-defined (partial) task
  • Observational studies
  • Where does the time go?
  • What's it worth to automate a step?

68
Summary
  • Get real (representative) users
  • Orient them
  • Testing the product, not them
  • They can quit. OK to record?
  • Talking aloud
  • Tell them
  • what you want them to accomplish
  • not how to do it
  • Let them do it
  • Train only as much as is realistic
  • Help only as necessary, asking & hinting at first
  • Note what they do