1
Evaluation of User Interfaces
  • N.B. In these slides, "BGBG" refers to the 2nd
    edition of the book Human-Computer Interaction
    by Baecker, Grudin, Buxton and Greenberg (1995)

2
Formative vs Summative Evaluation
  • Formative evaluation (Évaluation formative)
  • Happens throughout the design process
  • Can evaluate scenarios, sketches, models,
    prototypes
  • Summative evaluation (Évaluation
    sommative/récapitulative)
  • Typically happens at the end
  • Assesses system and interface design
    quality, i.e., how well have we done?

3
Analytic vs Empirical Evaluations (BGBG pp.
228-229)
  • Analytic Evaluations (Évaluations analytiques)
  • Do not involve actual users
  • Focus is on why things happen the way they
    do, and on the components of the system
  • Produce interpretations and suggestions, not
    solid facts
  • Better for formative evaluation than summative
    evaluation
  • Can be used early in the design process, before
    any high-fidelity prototype exists
  • Examples: heuristic evaluation, walkthrough,
    claims analysis
  • Empirical Evaluations (Évaluations empiriques)
  • Involve actual users
  • Focus is on what actually happens in practice
  • Produce factual measurements and observations
  • Good for summative evaluation, but may not
    clearly point to what changes to make
  • Can produce a lot of data that is laborious to
    analyze
  • Examples: experiments, usability testing, field
    studies

4
Empirical Evaluation: Naturalistic Observation vs
True Experiments (Example: Ray and Ravizza, 1985)

  Naturalistic observation        True experiments
  (watching, recording)           (manipulating, measuring)
  Noninterference with phenomena  Manipulation, control
  Observations of patterns        Measurements of observed
    and invariants                  patterns
  High-level, big-picture         Low-level, detailed
    insights                        results
  Qualitative, descriptive        Quantitative
5
Empirical Evaluation User Testing
  • Design and implement scenario or prototype
  • Record user behaviour
  • Typical usage, or critical incidents
  • Keystroke and mouse event recording
  • Thinking aloud protocols
  • Audio or video recording
  • Collect subjective impressions (questionnaire,
    interview)
  • Analyze recordings of user behaviour

6
Typical Steps in User Testing (Gomoll, in Laurel,
85-90)
  • Set up the observation
  • Describe the purpose of the study, and how the
    data collected will be used
  • Tell the user (verbally and on paper) that it's
    OK to quit at any time
  • Ask participants whether they are willing to sign
    a consent form giving their permission to begin
  • Pre-questionnaire (name, age, handedness,
    background, education, experience with computers,
    etc.)
  • Talk about and demonstrate the equipment
  • Explain how to think aloud
  • Explain that you will not provide help
  • Describe the task and introduce the system
  • Ask if there are any questions before you start,
    then begin the observation
  • Post-questionnaire and/or interview to solicit
    opinions, impressions, etc.
  • Conclude the observation and debrief participants
  • Transcribe, tabulate the data and results
  • Analyze, interpret the results

7
User Testing (BGBG, Fig. 2.8, p. 85, adapted from
Nielsen, 1992)
  • Practical study design
  • Reflect on the participants' backgrounds and how
    they might affect the study
  • Be aware of problems that arise when
    experimenters know the users personally
  • Prepare for the study carefully (avoid last
    minute panic)
  • Select the tasks carefully to be representative
    and to fit the allotted time
  • In general, start with an easier (but not
    frivolous) task
  • Write down features of system not being tested as
    well as those that are!
  • Define the start-up state for the study precisely
  • Define precise rules for when and how users can
    be helped during the study
  • Plan timing and cut-off procedure (if subject
    gets stuck) for each part of study
  • Include provisions for data collection (e.g.,
    audio, video, or keystroke capture)
  • Plan data analysis techniques in advance
  • Carry out an initial pilot study to test your
    protocol
  • Written materials
  • Participant release (permission) form
  • Pre-questionnaire covering prior experience etc.
  • Introduction to the study for users, including
    scenario of use, and description of tasks
  • Checklist for experimenters, and paper for
    note-taking
  • Post-questionnaire or survey

8
User Testing (BGBG, Fig. 2.8, p. 85, adapted from
Nielsen, 1992)
  • Carrying out the study
  • Let users know that complete anonymity will be
    preserved
  • Let them know that they may quit at any time
  • Stress that the system is being tested, not the
    participant
  • Note: participant is the more modern term for
    subject
  • Indicate that you are only interested in their
    thoughts relevant to the system
  • Demonstrate the thinking-aloud method by acting
    it out for a simple task, e.g., figuring out how
    to load a stapler
  • Hand out instructions for each part of the study
    individually, not all at once
  • Maintain a relaxed environment free of
    interruptions
  • Occasionally encourage users to talk if they grow
    silent
  • If users ask questions, try to get them to talk
    (e.g., What do you think is going on?), and
    follow predefined rules on when to help or
    interrupt to help
  • Debrief each user after the experiment

9
Thinking Aloud
  • Attempt to elicit thought processes of
    participant, thereby yielding valuable insights
    (although process is slowed down and may be
    changed)
  • The participant talks while working, describing
  • Problems they are having
  • Solutions they are considering
  • Why they are having trouble
  • Insights that they have
  • Wishes that they have
  • Co-Discovery: pairs of participants conversing
    (Co-Discovery Learning; Kennedy paper in BGBG,
    pp. 182-185)

10
Data Capture and Analysis
  • Keystroke/mouse logging (see the sketch after
    this list)
  • Record precise user behaviour
  • Record times to carry out actions
  • Record user errors
  • Observation and note-taking by observers,
    especially of user problems and critical
    incidents
  • Best if note-taking is done by a second observer
  • Audio and video recordings
  • Can't observe and record all behaviour in
    real-time
  • Preserve behaviour for review (even non-verbal
    behaviour)
  • Can produce a lot of data
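
As a rough illustration of this kind of data capture (a sketch, not from
the original slides; the class and file names are hypothetical), a study
might record a timestamped event stream for later analysis:

```python
import csv
import time

class EventLogger:
    """Minimal timestamped event log for a user-testing session."""

    def __init__(self):
        self.events = []  # (elapsed seconds, event type, detail)
        self.start = time.monotonic()

    def log(self, event_type, detail=""):
        self.events.append(
            (time.monotonic() - self.start, event_type, detail))

    def save(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["t_seconds", "event", "detail"])
            writer.writerows(self.events)

log = EventLogger()
log.log("task_start", "task 1")
log.log("keypress", "Ctrl+F")
log.log("error", "opened wrong dialog")  # a critical incident
log.log("task_end", "task 1")
log.save("session01.csv")  # timestamps support later duration/error analysis
```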

11
Asking Users in Addition to Observing Them
  • Methods
  • (Post-) Questionnaire design
  • Formulating and asking questions, analyzing
    answers
  • Hard to avoid bias in the phrasing of questions
  • Therefore requires pre-testing (pilot testing)
  • Surveys (Sondages): (possibly large-scale)
    administration of questionnaires to appropriate
    samples of individuals chosen from a population
  • Administration of questions through interviews

12
Ethical Issues
  • Basic principles
  • Do no harm
  • Voluntary participation
  • Informed consent
  • Right to privacy
  • Use of research protocols and consent forms
  • Explanation of study and purpose
  • Anonymity
  • Ability to withdraw at any time
  • For example, see p. 256 of Rosson & Carroll

13
A Taxonomy of Several Evaluation Techniques
14
McGrath's Taxonomy
[Figure: quadrant diagram of research strategies,
ranging from unobtrusive (discret) to
intrusive/disruptive (intrus, dérangeant)]
15
Quadrant 1: Field Strategies
  • Study systems in real use on real tasks in real
    work environments, i.e., observe under settings
    with conditions as natural as possible
  • Field studies: Study systems in situ, disturbing
    as little as possible, e.g., with ethnography,
    contextual inquiry
  • Field experiments: Observe the impact of changing
    (ideally) one aspect of a work environment, e.g.,
    in beta testing, studies of technological change
    and new technology introduction

16
Quadrant 2: Experimental Strategies
  • Study systems in a lab under controlled
    conditions, i.e., conditions concocted for
    research purposes
  • Laboratory experiments: Carry out controlled
    experiments studying the impact of (ideally) one
    (or two) interface parameter(s)
  • Experimental simulations: Create in the lab, for
    experimental purposes, a real system that is used
    by real users on (usually) artificially
    simplified tasks, e.g., user testing, usability
    engineering

17
Quadrant 3: Respondent Strategies
  • Ask informants to tell us something about
    themselves and/or their work or about an
    interface, i.e., where the setting in which
    questions are asked plays no role
  • Judgment studies: Ask respondents about an
    interface, e.g., in a demonstration, or with
    usability inspection
  • Sample surveys: Ask respondents about themselves
    and/or their work, e.g., with questionnaires,
    surveys, interviews

18
Usability Inspection (a Respondent Strategy)
  • Methods
  • Heuristic evaluation: Judgments by a panel of
    evaluators (e.g., 3 to 5) of the degree to which
    an interface satisfies a set of usability
    guidelines, followed by discussion and analysis
  • Cognitive walkthroughs
  • Roles
  • Evaluation without users (contrast to usability
    tests, etc.)
  • Elicit expert opinions about the user's model,
    functionality, look & feel, etc.

19
Usability Inspection (cont'd)
  • Advantages
  • Structured method of using accumulated wisdom of
    experts
  • Disadvantages
  • Doesn't take advantage of real insights from real
    users
  • Example: Heuristic evaluation with 10 usability
    guidelines (Nielsen, BGBG, Fig. 2.7, p. 83)
  • Visibility of system status
  • Match between system and the real world
  • User control and freedom
  • Consistency and standards
  • Error prevention
  • Recognition rather than recall
  • Flexibility and efficiency of use
  • Aesthetic and minimalist design
  • Help users recognize, diagnose, and recover from
    errors
  • Help and documentation

20
Demonstrations (a Respondent Strategy)
  • Demonstrate system to
  • Any random person
  • Management, potential investors, journalists
  • Potential customers
  • Potential users
  • Potential business partners
  • Take detailed notes
  • Elicit reactions to user's model, functionality,
    interface
  • Advantages
  • Get feedback early in prototype or system
    construction
  • You're going to have to give demos anyway, so why
    not learn from them?
  • Disadvantages
  • System is still rough, which introduces noise
    into the process

21
Quadrant 4: Theoretical Strategies
  • Ask a theory to tell us something about people's
    work and/or about an interface, i.e., no
    observation of behaviour, experiments, or
    questions are required
  • Formal theory: Use a qualitative theory or some
    equations, e.g., a behavioural theory such as
    colour vision or Fitts' Law (see the sketch
    below)
  • Computer simulation: Use and run a computer
    model, e.g., human information processing theory
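
As an illustration only (not from the original slides), here is how
Fitts' Law, in its common Shannon formulation MT = a + b log2(D/W + 1),
can serve as a predictive formal theory. The constants a and b below are
hypothetical placeholders; in practice they are fitted to pointing data
for a given device.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time (seconds) under Fitts' Law.

    a and b are device-specific constants normally fitted from
    data; the values here are illustrative placeholders only.
    """
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# A distant, small target is predicted to take longer than a
# near, large one:
print(fitts_movement_time(distance=800, width=20))  # ID ~ 5.4 bits
print(fitts_movement_time(distance=100, width=50))  # ID ~ 1.6 bits
```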

22
Summary of Evaluation Techniques
  • Field Strategies
  • Field Studies
  • Observe processes in situ, changing the system
    as little as possible
  • Examples: ethnographic studies, contextual
    inquiry (BGBG pages 42, 46) (not required for
    the exam)
  • Field Experiments
  • Change one aspect of the environment and observe
    the effects
  • Experimental Strategies
  • Laboratory Experiments / Controlled Experiments
  • Precisely vary or manipulate one or more
    independent variables
  • Precisely measure one or more dependent
    variables
  • Try to control the conditions carefully
  • Experimental simulation
  • Create a real system, in a laboratory, for real
    users
  • Examples
  • Usability tests / user tests
  • Often employ a think-aloud protocol and/or a
    discovery phase in which the user explores the
    interface; often also employ questionnaires
    and/or interviews
  • Usability engineering
  • More formal than usability tests
  • Quantitative performance measures (metrics)

23
Summary of Evaluation Techniques (2)
  • Respondent Strategies
  • Judgment studies
  • Example: usability inspection, or expert review
  • Done by experts or designers, without users
  • Example: heuristic evaluation
  • Uses a set of design guidelines or rules
    (heuristics) (e.g., Nielsen's heuristics)
  • Example: cognitive walkthrough
  • Example: demonstrations
  • Surveys
  • Examples: questionnaires, interviews
  • Theoretical Strategies
  • Formal theories
  • Involve a model of the user, the system, and the
    interaction between the two
  • Examples: Fitts' Law, Hick-Hyman Law, KLM, GOMS,
    etc.
  • Computer simulations
  • Simulate a model

24
Tradeoffs (Compromis)
[Figure: A = Generalizable (external validity),
B = Precise (internal validity (?)),
C = Realistic (ecological validity)]
25
Controlled Experiments
26
Controlled Experiments
  • Method
  • Manipulate independent variables (system
    characteristics)
  • Control for other variables (hold them constant)
  • Measure dependent variables (user behaviour)
  • Roles
  • Understanding factors influencing interface
    quality
  • Determining which conditions or which interface
    is best

27
Controlled Experiments
  • Advantages
  • Strong statements about causality (good internal
    validity)
  • Many experimental designs suitable for varying
    situations
  • Disadvantages
  • Requires time and planning; may be expensive
  • Complex designs (more than 3 or 4 independent
    variables) are often difficult to interpret
  • Often lack external validity and especially
    ecological validity

28
Examples
  • Of 3 interfaces, A, B, C, which enables fastest
    performance at a given task?
  • Does Prozac have an effect on performance at
    tying shoe laces?
  • How does the frequency of advertisements on
    television affect voting behaviour?
  • Can casting a spell on a pair of dice affect what
    numbers appear on them?

29
Elements of an Experiment
  • Population
  • Set of all possible subjects / observations
  • Sample
  • Subset of the population chosen for study: a set
    of subjects / observations
  • Subjects
  • People/users under study. The more politically
    correct term within HCI is participants.
  • Observations / Dependent variable(s)
  • Individual data points that are
    measured/collected/recorded
  • E.g. time to complete a task, errors, etc.
  • Condition / Treatment / Independent variable(s)
  • Something done to the samples that distinguishes
    them (e.g., giving a drug vs. a placebo, or using
    interface A vs. B; see the data sketch after this
    list)
  • Goal of experiment is often to determine whether
    the conditions have an effect on observations,
    and what the effect is
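
As an illustration of these elements (a sketch, not from the original
slides; all names and values are invented), observations of one dependent
variable under two conditions might be tabulated like this:

```python
from statistics import mean

# Each observation: one participant, one condition (a level of the
# independent variable), one measured value of the dependent variable.
observations = [
    {"participant": "P01", "condition": "interface_A", "time_s": 48.2},
    {"participant": "P01", "condition": "interface_B", "time_s": 39.5},
    {"participant": "P02", "condition": "interface_A", "time_s": 52.1},
    {"participant": "P02", "condition": "interface_B", "time_s": 41.0},
]

for cond in ("interface_A", "interface_B"):
    times = [o["time_s"] for o in observations if o["condition"] == cond]
    print(cond, "mean time:", mean(times))
```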

30
Tasks to Design and Run an Experiment
  • Design
  • Choose independent variables
  • Choose dependent variables
  • Develop hypothesis
  • Choose a design paradigm (between-subjects/nested
    or within-subjects/crossed design)
  • Choose control procedures
  • Choose a sample size
  • Pilot experiment
  • Often more exploratory, varying a greater number
    of variables to get a feel for where the
    effect(s) might be
  • Run experiment
  • Focuses in on the suspected effect; tries to
    gather lots of data under key or optimal
    conditions to support a strong conclusion
  • Analyze data
  • Using statistical tests such as ANOVA
  • Interpret results

31
The Problem: Effectiveness of a New Method of
Source Code Presentation
  • Source code appearance makes inadequate use of
    capabilities of digital typography
  • Potential to make code more readable, more
    comprehensible with new and enhanced
    presentation format
  • See book by Baecker and Marcus, Human Factors and
    Typography for More Readable Programs,
    Addison-Wesley, 1990
  • On the following slides, bullet points that refer
    to an experimental study of our new presentation
    format are specially marked

32
Conventional Presentation
33
New Presentation
34
Independent Variables
  • The variable manipulated by the experimenter
  • Also known as factor or treatment
  • Experiment may involve one or many independent
    variables
  • Each independent variable
  • Has 2 or more levels (i.e. values)
  • May be metric (continuous, like the length of a
    menu) or categorical (discrete, like mouse vs.
    trackball, or a Likert scale)
  • In our example: just one independent variable,
    with two levels: new typesetting format vs.
    traditional presentation format

35
Dependent Variables
  • Definition
  • Variable measured by experimenter
  • Variable which may depend on the independent
    variables
  • The relationship is not necessarily causal; e.g.,
    the variables may only be correlated
  • Examples
  • Accuracy, or number of errors
  • Number of subtasks completed in a given time
    period
  • Time to complete each task
  • In our example: ability to comprehend a program,
    as measured by % of questions answered correctly
    in a given time

36
Hypotheses
  • Statement, to be tested, of relationship between
    independent and dependent variables
  • The null hypothesis is that the independent
    variables have no effect on the dependent
    variables
  • Hypothesis in our example: reading comprehension,
    as defined above, is improved by the new method
    of source code presentation

37
Experimental Design Paradigms
  • Between-subjects or within-subjects manipulation
    (between participants vs. across all
    participants)
  • Example designs with one independent variable
  • Between-subjects (randomized group) design
    (nested)
  • One independent variable with 2 or more levels
  • Subjects randomly assigned to groups
  • Each subject tested under only 1 condition
  • Within-subjects (repeated measures) design
    (crossed)
  • One independent variable with 2 or more levels
  • Each subject tested under all conditions
  • Order of conditions randomized or counterbalanced
    (why? to cancel practice and fatigue effects; see
    the sketch after this list)
  • In our example: a within-subjects design was
    chosen, with two conditions, i.e., two sample
    programs
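
To illustrate counterbalancing (a sketch, not from the original slides;
participant IDs and condition names are invented): with two conditions,
half the participants, chosen at random, get each order, so practice and
fatigue effects cancel out on average.

```python
import random

def counterbalance(participants,
                   conditions=("conventional", "new_format")):
    """Assign each participant an order of two conditions so that
    each order occurs equally often across the sample."""
    random.shuffle(participants)  # random assignment to orders
    orders = {}
    for i, p in enumerate(participants):
        orders[p] = conditions if i % 2 == 0 else conditions[::-1]
    return orders

for p, order in counterbalance([f"P{n:02d}" for n in range(1, 9)]).items():
    print(p, "->", " then ".join(order))
```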

38
Control Procedures
  • Goal is to eliminate the confound hypothesis,
    i.e., the possibility that there are alternative
    explanations for the observed effect(s)
  • To do this: make sure there are no systematic
    differences between conditions other than the
    independent variable
  • In our example: ensure that the two sample
    programs are identical in length, complexity,
    and difficulty

39
What To Control
  • Subject characteristics
  • Gender, handedness, etc.
  • Ability
  • Experience
  • Task variables
  • Instructions
  • Materials used
  • Environmental variables
  • Setting
  • Noise, light, etc.
  • Order effects
  • Practice
  • Fatigue

40
How to Control
  • Hold constant
  • Use males only, or students from same class
    only
  • Novices only
  • Randomize
  • Subjects to groups
  • Counterbalance
  • Half (chosen randomly) get new presentation
    format first

41
Sample Size Selection
  • More subjects --> more confidence in the results,
    i.e., greater statistical significance (a
    power-analysis sketch follows this list)
  • But this can be very expensive
  • Many methods to reduce the required number of
    subjects
  • Most HCI experiments use 4 to 25 subjects per
    group
  • In our example: 44 subjects chosen from a
    3rd-year programming course
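
One common way to choose a sample size is a power analysis. A sketch
using statsmodels (not from the original slides), assuming a
between-subjects two-group comparison and a guessed effect size:

```python
from statsmodels.stats.power import TTestIndPower

# How many participants per group are needed to detect a "large"
# effect (Cohen's d = 0.8) with 80% power at alpha = 0.05?
# The effect size is an assumption that must be justified from
# pilot data or prior literature.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.8, alpha=0.05, power=0.8)
print(round(n_per_group))  # roughly 26 per group
```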

42
Designing and Running the Experiment and
Collecting the Data
  • Run pilot studies
  • Check experimental design
  • Test and improve
  • Task definition
  • Experimental materials (often the most difficult)
  • Instructions
  • Practice tasks
  • Develop experimenter skills
  • Identify and deal with special problems
  • Run actual experiment
  • Record data
  • Observe behaviour

43
The Presentation Format Experiment
  • Within-subjects design, 44 subjects from 3rd year
    programming course
  • Two similar short C programs, roughly 200 lines
    of code, 4 to 5 pages
  • 40 minutes to skim first program and attempt to
    answer 18 questions, half in familiar format and
    half in new format
  • Then each group is given the other program in the
    other format

44
Data Analysis and Hypothesis Testing
  • Describe data
  • Descriptive statistics (means, medians, standard
    deviations; see the sketch after this list)
  • Graphs and tables
  • Perform statistical analysis of results
  • Are the results due to chance? (That is, with
    what probability?)
  • In our example: mean percentage of correct
    answers with the new format: 44%; with the
    conventional format: 35%
  • Analysis of variance showed that the effect of
    presentation format in increasing program
    readability was significant, F(1,42) = 18.25,
    p < 0.0001
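
Before significance testing, one would first summarize the raw data. A
minimal sketch with Python's statistics module (all numbers invented for
illustration):

```python
from statistics import mean, median, stdev

# Invented per-subject scores (% correct) for one condition.
scores = [44, 50, 41, 47, 39, 48, 45, 43]

print("mean   :", round(mean(scores), 1))
print("median :", round(median(scores), 1))
print("std dev:", round(stdev(scores), 1))  # sample standard deviation
```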

45
ANOVA
  • Analysis of Variance
  • A statistical test that compares the
    distributions of multiple samples, and determines
    the probability that differences in the
    distributions are due to chance
  • In other words, it determines the probability
    that the null hypothesis is correct
  • If the probability is below 0.05 (i.e., 5%), then
    we reject the null hypothesis, and we say that we
    have a (statistically) significant result
  • Why 0.05? Dangers of using this value? (A small
    ANOVA sketch follows.)
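
A minimal sketch of such a test (not from the original slides), using
scipy's one-way ANOVA on invented comprehension scores; none of these
numbers come from the actual experiment:

```python
from scipy import stats

# Invented comprehension scores (% correct) for two presentation formats.
new_format = [44, 50, 41, 47, 39, 48, 45, 43]
conventional = [35, 38, 30, 36, 33, 37, 31, 34]

f_stat, p_value = stats.f_oneway(new_format, conventional)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# If p < 0.05, we reject the null hypothesis that presentation
# format has no effect on comprehension.
```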

46
Techniques for Making an Experiment More Powerful
(i.e., better able to detect effects)
  • Reduce noise (i.e. reduce variance)
  • Increase sample size
  • Control for confounding variables
  • E.g., psychologists often use inbred rats for
    experiments!
  • Increase the magnitude of the effect
  • E.g. give a larger dosage of the drug

47
A small difference between the sample means. Is it
significant, or simply due to chance?
A larger difference between the sample means. Is it
significant, or simply due to chance?
48
With a smaller variance (than on the previous
slide), we are more confident that the very small
difference here is due to chance, and that the
larger difference here is significant.
49
With a larger sample size (than on the previous
slides), we are more confident that the very small
difference here is due to chance, and that the
larger difference here is significant.
50
Uses of Controlled Experiments within HCI
  • Evaluate or compare existing
    systems/features/interfaces
  • Discover and test useful scientific principles
  • Examples ?
  • Establish benchmarks/standards/guidelines
  • Examples ?