Learn more at: http://staffweb.itsligo.ie
1
Quantitative Evaluation
John Kelleher, IT Sligo
2
(No Transcript)
3
Definition
  • Methods
    • Performance/Predictive Modeling
      • GOMS/KLM
      • Fitts' Law
    • Controlled Experiments and Statistical Analysis
  • Without measurement, success is undefined
  • Formal Usability Study
    • to compare two designs on measurable aspects
      • time required
      • number of errors
      • effectiveness for achieving very specific tasks

4
GOMS Model
  • Card, Moran and Newell (1983)
  • Model the knowledge and cognitive processes
    involved when users interact with systems.
  • Goals
    • refer to a particular state the user wants to
      achieve
  • Operators
    • refer to the cognitive processes and physical
      actions that need to be performed in order to
      attain those goals
  • Methods
    • are learned procedures for accomplishing the
      goals, consisting of the exact sequence of steps
      required
  • Selection Rules
    • are used to determine which method to select when
      there is more than one available for a given
      stage of a task.

5
GOMS Example: deleting a word in MS Word
  • Goal: delete a word in a sentence
  • Method for accomplishing goal of deleting a word
    using menu option
    • Step 1: Recall that word to be deleted has to be
      highlighted
    • Step 2: Recall that command is "cut"
    • Step 3: Recall that command "cut" is in the
      "edit" menu
    • Step 4: Accomplish goal of selecting and
      executing the "cut" command
    • Step 5: Return with goal accomplished

6
GOMS Example: deleting a word in MS Word
  • Method for accomplishing goal of deleting a word
    using delete key
    • Step 1: Recall where to position cursor in
      relation to word to be deleted
    • Step 2: Recall which key is the delete key
    • Step 3: Press delete key to delete each letter
    • Step 4: Return with goal accomplished

7
GOMS Example: deleting a word in MS Word
  • Operators to use in above methods
    • Click mouse
    • Drag cursor over text
    • Select menu
    • Move cursor to command
    • Press keyboard key
  • Selection Rules to decide which method to use
    • 1: Delete text using mouse and selecting from
      menu if large amount of text is to be deleted
    • 2: Delete text using delete key if small number
      of letters is to be deleted
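
The two selection rules amount to a simple decision on how much text is to be deleted. The sketch below is only an illustration: the function name and the `large_amount` threshold of 20 characters are assumptions, since the slide does not say where "large" begins.

```python
def select_delete_method(chars_to_delete: int, large_amount: int = 20) -> str:
    """Pick a deletion method using the two GOMS selection rules.

    large_amount is a hypothetical threshold; the slides do not define it.
    """
    if chars_to_delete < 0:
        raise ValueError("chars_to_delete must be non-negative")
    # Rule 1: large amount of text -> select with the mouse and use the menu.
    if chars_to_delete >= large_amount:
        return "mouse-and-menu"
    # Rule 2: small number of letters -> press the delete key repeatedly.
    return "delete-key"


print(select_delete_method(5))
print(select_delete_method(200))
```

A real user's threshold would depend on skill and habit, which is why GOMS treats selection rules as part of the user's knowledge rather than as a fixed constant.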

8
Keystroke Level Model
  • Well-known analytic evaluation technique
  • Derived from MHP [1]
  • Provides detailed quantitative (numerical)
    information of user performance
  • Sufficient for predicting speed of interaction
    with a user interface
  • Basic time prediction components empirically
    derived

[1] Model Human Processor by Card, Moran and Newell
(1983)
9
KLM Constants
Operator Name  Description                                        Time (sec)
K              Pressing a single key or button                    0.35 (average)
                 Skilled typist (55 wpm)                          0.22
                 Average typist (40 wpm)                          0.28
                 User unfamiliar with the keyboard                1.20
                 Pressing shift or control key                    0.08
P              Point with a mouse or other device to a target     1.10
               on a display
                 Clicking the mouse or similar device             0.20
H              Homing hands on the keyboard or other device       0.40
D              Draw a line using a mouse                          Variable, depending on the length of line
M              Mentally prepare to do something                   1.35
               (e.g. make a decision)
R(t)           System response time, counted only if it causes    t
               the user to wait when carrying out their task
10
Task in Text Editor
  • Using GOMS
    • Create new file
    • Type in "Hello, World."
    • Save document as "Hello"
    • Print document
    • Exit editor
  • Assume system response is 0, or comparable across
    systems (constant)
  • Average typist (55 wpm) (K = 0.2 sec)
  • Editor is started, hands in lap
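
A KLM estimate for a task like this is the sum of the operator times along the chosen method. The sketch below takes its constants from the KLM Constants slide, with K = 0.2 sec as assumed for this task. The two operator sequences for the "save" step are hypothetical illustrations and are not the sequences from the All Mouse and Shortcuts slides. The letter "B" for a mouse click is also an assumption: the table lists clicking under P.

```python
# Operator times in seconds (KLM Constants slide; K as assumed for this task).
KLM_TIMES = {
    "K": 0.20,  # press a key
    "P": 1.10,  # point with the mouse at a target
    "B": 0.20,  # click the mouse button (assumed letter; listed under P)
    "H": 0.40,  # home hands on keyboard or mouse
    "M": 1.35,  # mentally prepare
}


def klm_estimate(operators: str) -> float:
    """Sum the operator times for a sequence such as 'HMPB'."""
    return sum(KLM_TIMES[op] for op in operators)


# Hypothetical sequences for saving the document as "Hello".
# Mouse: home on mouse, think, point+click a menu, point+click a command,
#        home on keyboard, type 5 letters, press Enter.
save_with_mouse = "H" + "M" + "PB" + "PB" + "H" + "K" * 5 + "K"
# Shortcut: home on keyboard, think, modifier + key (both counted as plain
#           keystrokes here), type 5 letters, press Enter.
save_with_shortcut = "H" + "M" + "KK" + "K" * 5 + "K"

print(f"mouse:    {klm_estimate(save_with_mouse):.2f} s")
print(f"shortcut: {klm_estimate(save_with_shortcut):.2f} s")
```

Under these assumptions the mouse method comes to 5.95 s and the shortcut method to 3.35 s. The absolute numbers matter less than the difference between the alternatives.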

11
All Mouse
12
Shortcuts
13
KLM
  • Applicability
    • User interface with a limited number of features
    • Repetitive task execution
    • Really only useful for comparative study among
      alternatives, albeit sensitive to minor changes
    • Project Ernestine
  • Caveats
    • assumes expert behaviour; no errors tolerated
    • user already knows the sequence of operations
      that he or she is going to perform
    • time estimates are best followed up by empirical
      studies
    • ambiguity regarding the M operator
    • assumes serial processing

14
Fitts' Law
  • Predicts time taken to reach a target using a
    pointing device
  • T = k log2(D/S + 0.5), k ≈ 100 msec, where
    • T = time to move the hand to a target
    • D = distance between hand and target
    • S = size of target
  • Highlights corners of screen as good targets
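
The formula can be tried out directly. This is a minimal sketch assuming k = 100 msec as on the slide; the function name and the example distances and sizes are made up for illustration.

```python
import math


def fitts_time_ms(distance: float, size: float, k: float = 100.0) -> float:
    """Predicted movement time in msec: T = k * log2(D/S + 0.5)."""
    if distance <= 0 or size <= 0:
        raise ValueError("distance and size must be positive")
    return k * math.log2(distance / size + 0.5)


# Same distance, two target sizes (D and S only need to share a unit).
small_target = fitts_time_ms(distance=300, size=10)
large_target = fitts_time_ms(distance=300, size=40)
print(f"small target: {small_target:.0f} ms")
print(f"large target: {large_target:.0f} ms")
```

With these example values the larger target is predicted to take about 300 ms against roughly 493 ms for the smaller one. One common reading of the point about screen corners is that the pointer cannot overshoot them, so their effective size is very large.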

15
Performance measures
  • Time: easy to measure and suitable for
    statistical analysis, e.g. learning time, task
    completion time.
  • Errors: show where problems exist within a
    system and suggest the cause of a difficulty.
  • Patterns of system use: study the patterns of use
    in different sections, e.g. preference for and
    avoidance of sections in a system.
  • Amount of work done in a given time.

16
Other measures
  • Subjective impression measures
    • Attitude measures: use questionnaires or
      interviews
    • Rated aesthetics
    • Rated ease of learning
    • Stated decision to purchase
  • Composite measures
    • Weighted averages of the above
    • E.g. efficiency = throughput / number of errors

17
Controlled experiments
  • Designed to test predictions arising from an
    explicit hypothesis that arises out of an
    underlying theory
  • Allows comparison of systems, fine-tuning of
    details ...
  • Strives for
    • lucid and testable hypothesis
    • quantitative measurement
    • measure of confidence in results obtained
      (statistics)
    • replicability of experiment
    • control of variables and conditions
    • removal of experimenter bias

18
Ben Shneiderman (Univ. of Maryland, US)
  • Experiments have
    • Two Parents
      • a practical problem
      • a theoretical foundation
    • Three Children
      • help in resolving the practical problems
      • refinements to the theory
      • advice to future experimenters who work on the
        same problem

19
Designing Experiments
  • Formulating the hypotheses
  • Developing predictions from the hypotheses
  • Choosing a means to test the predictions
  • Identifying all the variables that might affect
    the results of the experiment
  • Deciding which are the independent variables,
    dependent variables and which variables need to
    be controlled by some means

20
Usability Laboratory
21
Usability Laboratory
22
Designing Experiments (contd.)
  • Designing the experimental task and method
  • Subject selection
  • Deciding the experimental design, data collection
    method and controlling confounding variables
  • Deciding on the appropriate statistical or other
    analysis
  • Carrying out a pilot study

23
The Experimental Method
  • a) Begin with a lucid, testable hypothesis
  • Example 1
    • there is no difference in the number of
      cavities in children and teenagers using Crest
      and No-teeth toothpaste

24
The Experimental Method
  • Example 2
    • there is no difference in user performance
      (time and error rate) when selecting a single
      item from a pop-up or a pull-down menu,
      regardless of the subject's previous expertise in
      using a mouse or using the different menu types

25
The Experimental Method
  • b) Explicitly state the independent variables
    that are to be altered
  • Independent variable
    • the things you manipulate independent of how a
      subject behaves
    • determines a modification to the conditions the
      subjects undergo
    • may arise from subjects being classified into
      different groups
  • In toothpaste experiment
    • toothpaste type: uses Crest or No-teeth
      toothpaste
    • age: <= 11 years or > 11 years
  • In menu experiment
    • menu type: pop-up or pull-down
    • menu length: 3, 6, 9, 12, 15
    • subject type: expert or novice

26
The Experimental Method
  • c) Carefully choose the dependent variables that
    will be measured
  • Dependent variables
  • Measures to demonstrate the effects of the
    independent variables
  • Properties
  • Readily observable
  • Stable and reliable so that they do not vary
    under constant experimental conditions
  • Sensitive to the effects of the independent
    variables
  • Readily related to some scale of measurement

27
Dependent variables
  • Some commonly used dependent variables
  • Number of errors made
  • Time taken to complete a given task
  • Time taken to recover from an error
  • In menu experiment
  • time to select an item
  • selection errors made
  • In toothpaste experiment
  • number of cavities
  • frequency of brushing

28
What is an experiment?
  • Three criteria
  • The experimenter must systematically manipulate
    one or more independent variables in the domain
    under investigation
  • The manipulation must be made under controlled
    conditions, such that all variables which could
    affect the outcome of the experiment are
    controlled
  • see confounding variables, next.
  • The experimenter must measure some un-manipulated
    feature that changes, or is assumed to change, as
    a function of the manipulated independent variable

29
Confounding variables
  • Variables that are not independent variables but
    are nonetheless permitted to vary during the
    experiment
  • The logic of experiments is to hold
    variables-not-of-interest constant among
    conditions, systematically manipulate independent
    variables, and observe the effects of the
    manipulation on the dependent variables.

30
Sources of variation
  • Variations in the task performed
  • The effect of the treatment (i.e. the user
    interface improvements that we made)
  • Individual differences between experimental
    subjects (e.g. IQ)
  • Different stimuli for each task
  • Distractions during the trial (sneezing, dropping
    things)
  • Motivation of the subject
  • Accidental hints or intervention by the
    experimenter
  • Other random factors.

31
Examples of Confounding
  • Order effects
  • Tasks done early in testing are slower and more
    prone to error.
  • Tasks done late in testing may be affected by
    user fatigue.
  • Carry-over effects
  • A difference occurs if one condition follows
    another. E.g. Learning text editor commands.
  • Experience factors
  • People in one condition have more/less relevant
    experience than in others.
  • Experimenter/subject bias
  • The experimenter systematically treats some
    subjects differently from others, or subjects
    have different motivation levels.
  • Other uncontrolled variables
  • Time of day, system load.

32
Confounding Prevention
  • Randomization
    • Negates the order effect.
    • Random assignment to conditions is used to ensure
      that any effect due to unknown differences among
      users or conditions is random.
  • Counterbalancing
    • Addresses order and carry-over effects.
    • Test half of the users in condition 1 first, and
      the other half in condition 2 first. Different
      permutations of condition order can be used.
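
Randomization and counterbalancing can be combined when drawing up a test schedule. The sketch below is one possible way to do it: participants are shuffled, then the permutations of the condition order are dealt out in turn so that each order is used equally often. The function name and the `seed` parameter are assumptions for illustration. The participant names are the ones used in the within-groups example design.

```python
import itertools
import random


def counterbalance(participants, conditions, seed=None):
    """Give each participant a condition order, cycling through all permutations."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # randomization: chance decides who gets which order
    orders = list(itertools.permutations(conditions))
    # Deal the orders out round-robin so each one is used equally often.
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


plan = counterbalance(["Elizabeth", "Michael", "Steven", "Richard"], ["A", "B"], seed=1)
for name, order in plan.items():
    print(name, "->", ", ".join(order))
```

With two conditions, half of the users start with A and half with B. With three or more conditions the number of permutations grows quickly, so the number of participants should be a multiple of the number of orders used.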

33
Allocation of participants
  • Judiciously select and assign subjects to groups
    to control variability
  • a) Between-Groups Experiment
  • Two groups of test users, same tasks for both
    groups.
  • Randomly assign users to two equally-sized
    groups.
  • Group A uses only system A, group B only system
    B.
  • b) Within-Groups Experiment
  • One group of test users
  • Each user performs equivalent tasks on both
    systems.
  • Randomly assign users to two equally-sized pools.
  • Pool A uses system A first, pool B system B
    first.
  • c) Matched-pairs

34
Example Designs
Between Groups
System A   System B
John       Dave
James      May
Mary       Ann
Stuart     Phil

Within Groups
Participant   Sequence
Elizabeth     A,B
Michael       B,A
Steven        A,B
Richard       B,A

Within groups:
  • Is more powerful statistically (can compare the
    same person across different conditions, thus
    isolating effects of individual differences)
  • Requires fewer participants than between-groups
  • Learning effects
  • Fatigue effects
Between groups:
  • Requires more participants
  • No transfer of learning effects
  • Less arduous on participants
  • Large individual variation in user skills
35
Experimental Details
  • Order of tasks
    • choose one simple order (simple -> complex)
    • unless doing within-groups experiment
  • Training
    • depends on how real system will be used
  • What if someone doesn't finish?
    • assign a very large time and a large number of
      errors
  • Pilot study
    • helps you fix problems with the study
    • do 2, first with colleagues, then with real users

36
Sample Size
  • Depends on desired confidence level and
    confidence interval.
  • Confidence level of 95% often used for research,
    80% OK for practical development.
  • Rule of thumb: 16-20 test users.
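
One common way to relate sample size to confidence level and confidence interval is the normal-approximation formula n = (z * sd / margin)^2. The sketch below uses it as an assumption, not as the method the slides prescribe. The standard deviation of 10 and margin of 5 are hypothetical values.

```python
import math
from statistics import NormalDist


def sample_size(confidence: float, std_dev: float, margin: float) -> int:
    """Users needed so the mean's confidence interval is within +/- margin.

    Uses the normal approximation n = (z * sd / margin)^2, rounded up.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * std_dev / margin) ** 2)


# Hypothetical inputs: sd of 10 (e.g. minutes), desired margin of +/- 5.
print(sample_size(0.95, std_dev=10, margin=5))
print(sample_size(0.80, std_dev=10, margin=5))
```

For these inputs the 95% level calls for 16 users and the 80% level for 7. The standard deviation is rarely known in advance, so a pilot study is a typical source for it.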

37
Analysing the numbers
  • Example: trying to get task time <= 30 min.
    • test gives 20, 15, 35, 80, 10, 20
    • mean (average) = 30
    • looks good!
    • wrong answer, not certain of anything
    • always chart results
  • Factors contributing to our uncertainty
    • small number of test users (n = 6)
    • results are very variable (standard deviation
      ≈ 26)
    • std. dev. measures dispersal from the mean
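
The uncertainty can be made concrete by computing a confidence interval for the mean of the six task times listed above. This is a sketch assuming a t-based 95% interval. The critical value 2.571 is the standard two-sided t value for 5 degrees of freedom, hard-coded because the Python standard library has no t distribution.

```python
import math
import statistics

times = [20, 15, 35, 80, 10, 20]  # task times in minutes, from the example

n = len(times)
mean = statistics.mean(times)
sd = statistics.stdev(times)  # sample standard deviation (n - 1 denominator)
t_crit = 2.571                # two-sided 95% t value for n - 1 = 5 d.f.
half_width = t_crit * sd / math.sqrt(n)

print(f"n = {n}, mean = {mean}, sd = {sd:.1f}")
print(f"95% CI for the mean: {mean - half_width:.1f} to {mean + half_width:.1f} min")
```

The interval runs from roughly 3 to 57 minutes, so these six measurements cannot show whether the 30-minute target has been met.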

38
Experimental Evaluation
Advantages
  • Powerful method (depending on the effects
    investigated)
  • Quantitative data for statistical analysis
  • Can compare different groups of users
  • Reliability and validity good
  • Replicable
Disadvantages
  • High resource demands
  • Requires knowledge of experimental method
  • Time spent on experiments can mean evaluation is
    difficult to integrate into design cycle
  • Tasks can be artificial and restricted
  • Cannot always generalise to full system in
    typical working situation
  • Not all human behaviour variables can be
    controlled
  • Little recognition of work, time, motivational
    and social context
  • Subjects' ideas, thoughts, beliefs largely ignored

39
Summary
  • Allows comparison of alternative designs
  • Collects objective, quantitative data
    (bottom-line data)
  • Needs significant number of test users (16-20)
  • Usable only later in development process
  • Requires administrator expertise
  • Cannot provide why-information (process data)
  • Formal studies can reveal detailed information
    but take extensive time/effort
  • Applicability
  • system location dangerous or impractical
  • for constrained single user systems
  • to allow controlled manipulation of use

40
Summary (contd.)
  • Suitable...
  • system location dangerous or impractical
  • for constrained single user systems
  • to allow controlled manipulation of use
  • Advantages and disadvantages
    • sophisticated and expensive equipment
    • uninterrupted environment
    • Hawthorne principle