1
Evaluation of User Interfaces
  • N.B. In these slides, "BGBG" refers to the 2nd
    edition of the book Human-Computer Interaction
    by Baecker, Grudin, Buxton and Greenberg (1995)

2
Formative vs Summative Evaluation
  • Formative evaluation (Évaluation formative)
  • Happens throughout the design process
  • Can evaluate scenarios, sketches, models,
    prototypes
  • Summative evaluation (Évaluation
    sommative/récapitulative)
  • Typically happens at the end
  • Assesses system and interface design
    quality, i.e., how well have we done?

3
Analytic vs Empirical Evaluations (BGBG pp.
228-229)
  • Analytic Evaluations (Évaluations analytiques)
  • Do not involve actual users
  • Focus is on why things happen the way they
    do, and on the components of the system
  • Produce interpretations and suggestions, not
    solid facts
  • Better for formative evaluation than summative
    evaluation
  • Can be used early in the design process, before
    any high-fidelity prototype exists
  • Examples: heuristic evaluation, walkthrough,
    claims analysis
  • Empirical Evaluations (Évaluations empiriques)
  • Involve actual users
  • Focus is on what actually happens in practice
  • Produce factual measurements and observations
  • Good for summative evaluation, but may not
    clearly point to what changes to make
  • Can produce a lot of data that is laborious to
    analyze
  • Examples: experiments, usability testing, field
    studies

4
Empirical Evaluation: Naturalistic Observation vs
True Experiments (Example: Ray and Ravizza, 1985)

  Naturalistic observation        True experiments
  (watching, recording)           (manipulating, measuring)
  Noninterference with phenomena  Manipulation, control
  Observations of patterns        Measurements of observed
    and invariants                  patterns
  High-level, big-picture         Low-level, detailed
    insights                        results
  Qualitative, descriptive        Quantitative
5
Empirical Evaluation User Testing
  • Design and implement scenario or prototype
  • Record user behaviour
  • Typical usage, or critical incidents
  • Keystroke and mouse event recording
  • Thinking aloud protocols
  • Audio or video recording
  • Collect subjective impressions (questionnaire,
    interview)
  • Analyze recordings of user behaviour

6
Typical Steps in User Testing (Gomoll, in Laurel,
85-90)
  • Set up the observation
  • Describe the purpose of the study, and how the
    data collected will be used
  • Tell the user (verbally and on paper) that it's
    OK to quit at any time
  • Ask participants whether they are willing to sign
    a consent form giving their permission to begin
  • Pre-questionnaire (name, age, handedness,
    background, education, experience with computers,
    etc.)
  • Talk about and demonstrate the equipment
  • Explain how to think aloud
  • Explain that you will not provide help
  • Describe the task and introduce the system
  • Ask if there are any questions before you start,
    then begin the observation
  • Post-questionnaire and/or interview to solicit
    opinions, impressions, etc.
  • Conclude the observation and debrief participants
  • Transcribe, tabulate the data and results
  • Analyze, interpret the results

7
User Testing (BGBG, Fig. 2.8, p. 85, adapted from
Nielsen, 1992)
  • Practical study design
  • Reflect on the participants' backgrounds and how
    they might affect the study
  • Be aware of problems that arise when
    experimenters know the users personally
  • Prepare for the study carefully (avoid last
    minute panic)
  • Select the tasks carefully to be representative
    and to fit the allotted time
  • In general, start with an easier (but not
    frivolous) task
  • Write down features of system not being tested as
    well as those that are!
  • Define the start-up state for the study precisely
  • Define precise rules for when and how users can
    be helped during the study
  • Plan timing and cut-off procedure (if subject
    gets stuck) for each part of study
  • Include provisions for data collection (e.g.,
    audio, video, or keystroke capture)
  • Plan data analysis techniques in advance
  • Carry out an initial pilot study to test your
    protocol
  • Written materials
  • Participant release (permission) form
  • Pre-questionnaire covering prior experience etc.
  • Introduction to the study for users, including
    scenario of use, and description of tasks
  • Checklist for experimenters, and paper for
    note-taking
  • Post-questionnaire or survey

8
User Testing (BGBG, Fig. 2.8, p. 85, adapted from
Nielsen, 1992)
  • Carrying out the study
  • Let users know that complete anonymity will be
    preserved
  • Let them know that they may quit at any time
  • Stress that the system is being tested, not the
    participant
  • Note: participant is the more modern term for
    subject
  • Indicate that you are only interested in their
    thoughts relevant to the system
  • Demonstrate the thinking-aloud method by acting
    it out for a simple task, e.g., figuring out how
    to load a stapler
  • Hand out instructions for each part of the study
    individually, not all at once
  • Maintain a relaxed environment free of
    interruptions
  • Occasionally encourage users to talk if they grow
    silent
  • If users ask questions, try to get them to talk
    (e.g., What do you think is going on?), and
    follow predefined rules on when to help or
    interrupt to help
  • Debrief each user after the experiment

9
Thinking Aloud
  • Attempt to elicit thought processes of
    participant, thereby yielding valuable insights
    (although process is slowed down and may be
    changed)
  • The participant talks while working, describing
  • Problems they are having
  • Solutions they are considering
  • Why they are having trouble
  • Insights that they have
  • Wishes that they have
  • Co-Discovery: pairs of participants conversing
    (Co-Discovery Learning; Kennedy paper in BGBG,
    pp. 182-185)

10
Data Capture and Analysis
  • Keystroke/mouse logging (see the sketch after
    this list)
  • Record precise user behaviour
  • Record times to carry out actions
  • Record user errors
  • Observation and note-taking by observers,
    especially of user problems and critical
    incidents
  • Best if note-taking is done by a second observer
  • Audio and video recordings
  • Can't observe and record all behaviour in
    real-time
  • Preserve behaviour for review (even non-verbal
    behaviour)
  • Can produce a lot of data
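
As a rough illustration of this kind of data capture (a sketch, not from
the original slides; the class and file names are hypothetical), a study
might record a timestamped event stream for later analysis:

```python
import csv
import time

class EventLogger:
    """Minimal timestamped event log for a user-testing session."""

    def __init__(self):
        self.events = []  # (elapsed seconds, event type, detail)
        self.start = time.monotonic()

    def log(self, event_type, detail=""):
        self.events.append(
            (time.monotonic() - self.start, event_type, detail))

    def save(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["t_seconds", "event", "detail"])
            writer.writerows(self.events)

log = EventLogger()
log.log("task_start", "task 1")
log.log("keypress", "Ctrl+F")
log.log("error", "opened wrong dialog")  # a critical incident
log.log("task_end", "task 1")
log.save("session01.csv")  # timestamps support later duration/error analysis
```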

11
Asking Users in Addition to Observing Them
  • Methods
  • (Post-) Questionnaire design
  • Formulating and asking questions, analyzing
    answers
  • Hard to avoid bias in the phrasing of questions
  • Therefore requires pre-testing (pilot testing)
  • Surveys (Sondages): (possibly large-scale)
    administration of questionnaires to appropriate
    samples of individuals chosen from a population
  • Administration of questions through interviews

12
Ethical Issues
  • Basic principles
  • Do no harm
  • Voluntary participation
  • Informed consent
  • Right to privacy
  • Use of research protocols and consent forms
  • Explanation of study and purpose
  • Anonymity
  • Ability to withdraw at any time
  • For example, see p. 256 of Rosson & Carroll

13
A Taxonomy of Several Evaluation Techniques
14
McGrath's Taxonomy
[Figure: quadrant diagram of research strategies,
ranging from unobtrusive (discret) to
intrusive/disruptive (intrus, dérangeant)]
15
Quadrant 1: Field Strategies
  • Study systems in real use on real tasks in real
    work environments, i.e., observe under settings
    with conditions as natural as possible
  • Field studies: Study systems in situ, disturbing
    as little as possible, e.g., with ethnography,
    contextual inquiry
  • Field experiments: Observe the impact of changing
    (ideally) one aspect of a work environment, e.g.,
    in beta testing, studies of technological change
    and new technology introduction

16
Quadrant 2: Experimental Strategies
  • Study systems in a lab under controlled
    conditions, i.e., conditions concocted for
    research purposes
  • Laboratory experiments: Carry out controlled
    experiments studying the impact of (ideally) one
    (or two) interface parameter(s)
  • Experimental simulations: Create in the lab, for
    experimental purposes, a real system that is used
    by real users on (usually) artificially
    simplified tasks, e.g., user testing, usability
    engineering

17
Quadrant 3: Respondent Strategies
  • Ask informants to tell us something about
    themselves and/or their work or about an
    interface, i.e., where the setting in which
    questions are asked plays no role
  • Judgment studies: Ask respondents about an
    interface, e.g., in a demonstration, or with
    usability inspection
  • Sample surveys: Ask respondents about themselves
    and/or their work, e.g., with questionnaires,
    surveys, interviews

18
Usability Inspection (a Respondent Strategy)
  • Methods
  • Heuristic evaluation: Judgments by a panel of
    evaluators (e.g., 3 to 5) of the degree to which
    an interface satisfies a set of usability
    guidelines, followed by discussion and analysis
  • Cognitive walkthroughs
  • Roles
  • Evaluation without users (contrast to usability
    tests, etc.)
  • Elicit expert opinions about the user's model,
    functionality, look & feel, etc.

19
Usability Inspection (cont'd)
  • Advantages
  • Structured method of using accumulated wisdom of
    experts
  • Disadvantages
  • Doesn't take advantage of real insights from real
    users
  • Example: Heuristic evaluation with 10 usability
    guidelines (Nielsen, BGBG, Fig. 2.7, p. 83)
  • Visibility of system status
  • Match between system and the real world
  • User control and freedom
  • Consistency and standards
  • Error prevention
  • Recognition rather than recall
  • Flexibility and efficiency of use
  • Aesthetic and minimalist design
  • Help users recognize, diagnose, and recover from
    errors
  • Help and documentation

20
Demonstrations (a Respondent Strategy)
  • Demonstrate system to
  • Any random person
  • Management, potential investors, journalists
  • Potential customers
  • Potential users
  • Potential business partners
  • Take detailed notes
  • Elicit reactions to user's model, functionality,
    interface
  • Advantages
  • Get feedback early in prototype or system
    construction
  • You're going to have to give demos anyway, so why
    not learn from them?
  • Disadvantages
  • System is still rough, which introduces noise
    into the process

21
Quadrant 4: Theoretical Strategies
  • Ask a theory to tell us something about people's
    work and/or about an interface, i.e., no
    observation of behaviour, experiments, or
    questions are required
  • Formal theory: Use a qualitative theory or some
    equations, e.g., a behavioural theory such as
    colour vision or Fitts' Law (see the sketch
    below)
  • Computer simulation: Use and run a computer
    model, e.g., human information processing theory
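
As an illustration only (not from the original slides), here is how
Fitts' Law, in its common Shannon formulation MT = a + b log2(D/W + 1),
can serve as a predictive formal theory. The constants a and b below are
hypothetical placeholders; in practice they are fitted to pointing data
for a given device.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time (seconds) under Fitts' Law.

    a and b are device-specific constants normally fitted from
    data; the values here are illustrative placeholders only.
    """
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# A distant, small target is predicted to take longer than a
# near, large one:
print(fitts_movement_time(distance=800, width=20))  # ID ~ 5.4 bits
print(fitts_movement_time(distance=100, width=50))  # ID ~ 1.6 bits
```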

22
Summary of Evaluation Techniques
  • Field Strategies
  • Field Studies
  • Observe processes in situ, changing the system
    as little as possible
  • Examples: ethnographic studies, contextual
    inquiry (BGBG pages 42, 46) (not required for
    the exam)
  • Field Experiments
  • Change one aspect of the environment and observe
    the effects
  • Experimental Strategies
  • Laboratory Experiments / Controlled Experiments
  • Precisely vary or manipulate one or more
    independent variables
  • Precisely measure one or more dependent
    variables
  • Try to control the conditions carefully
  • Experimental simulation
  • Create a real system, in a laboratory, for real
    users
  • Examples
  • Usability tests / user tests
  • Often employ a think-aloud protocol and/or a
    discovery phase in which the user explores the
    interface; often also employ questionnaires
    and/or interviews
  • Usability engineering
  • More formal than usability tests
  • Quantitative performance measures (metrics)

23
Summary of Evaluation Techniques (2)
  • Respondent Strategies
  • Judgment studies
  • Example: usability inspection, or expert review
  • Done by experts or designers, without users
  • Example: heuristic evaluation
  • Uses a set of design guidelines or rules
    (heuristics) (e.g., Nielsen's heuristics)
  • Example: cognitive walkthrough
  • Example: demonstrations
  • Surveys
  • Examples: questionnaires, interviews
  • Theoretical Strategies
  • Formal theories
  • Involve a model of the user, the system, and the
    interaction between the two
  • Examples: Fitts' Law, Hick-Hyman Law, KLM, GOMS,
    etc.
  • Computer simulations
  • Simulate a model

24
Tradeoffs (Compromis)
[Figure: A = Generalizable (external validity),
B = Precise (internal validity (?)),
C = Realistic (ecological validity)]
25
Controlled Experiments
26
Controlled Experiments
  • Method
  • Manipulate independent variables (system
    characteristics)
  • Control for other variables (hold them constant)
  • Measure dependent variables (user behaviour)
  • Roles
  • Understanding factors influencing interface
    quality
  • Determining which conditions or which interface
    is best

27
Controlled Experiments
  • Advantages
  • Strong statements about causality (good internal
    validity)
  • Many experimental designs suitable for varying
    situations
  • Disadvantages
  • Requires time and planning; may be expensive
  • Complex designs (more than 3 or 4 independent
    variables) are often difficult to interpret
  • Often lack external validity and especially
    ecological validity

28
Examples
  • Of 3 interfaces, A, B, C, which enables fastest
    performance at a given task?
  • Does Prozac have an effect on performance at
    tying shoe laces?
  • How does the frequency of advertisements on
    television affect voting behaviour?
  • Can casting a spell on a pair of dice affect what
    numbers appear on them?

29
Elements of an Experiment
  • Population
  • Set of all possible subjects / observations
  • Sample
  • Subset of the population chosen for study: a set
    of subjects / observations
  • Subjects
  • People/users under study. The more politically
    correct term within HCI is participants.
  • Observations / Dependent variable(s)
  • Individual data points that are
    measured/collected/recorded
  • E.g. time to complete a task, errors, etc.
  • Condition / Treatment / Independent variable(s)
  • Something done to the samples that distinguishes
    them (e.g., giving a drug vs. a placebo, or using
    interface A vs. B; see the data sketch after this
    list)
  • Goal of experiment is often to determine whether
    the conditions have an effect on observations,
    and what the effect is
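
As an illustration of these elements (a sketch, not from the original
slides; all names and values are invented), observations of one dependent
variable under two conditions might be tabulated like this:

```python
from statistics import mean

# Each observation: one participant, one condition (a level of the
# independent variable), one measured value of the dependent variable.
observations = [
    {"participant": "P01", "condition": "interface_A", "time_s": 48.2},
    {"participant": "P01", "condition": "interface_B", "time_s": 39.5},
    {"participant": "P02", "condition": "interface_A", "time_s": 52.1},
    {"participant": "P02", "condition": "interface_B", "time_s": 41.0},
]

for cond in ("interface_A", "interface_B"):
    times = [o["time_s"] for o in observations if o["condition"] == cond]
    print(cond, "mean time:", mean(times))
```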

30
Tasks to Design and Run an Experiment
  • Design
  • Choose independent variables
  • Choose dependent variables
  • Develop hypothesis
  • Choose a design paradigm (between-subjects/nested
    or within-subjects/crossed design)
  • Choose control procedures
  • Choose a sample size
  • Pilot experiment
  • Often more exploratory, varying a greater number
    of variables to get a feel for where the
    effect(s) might be
  • Run experiment
  • Focuses in on the suspected effect; tries to
    gather lots of data under key or optimal
    conditions to support a strong conclusion
  • Analyze data
  • Using statistical tests such as ANOVA
  • Interpret results

31
The Problem: Effectiveness of a New Method of
Source Code Presentation
  • Source code appearance makes inadequate use of
    capabilities of digital typography
  • Potential to make code more readable, more
    comprehensible with new and enhanced
    presentation format
  • See book by Baecker and Marcus, Human Factors and
    Typography for More Readable Programs,
    Addison-Wesley, 1990
  • On the following slides, bullet points that refer
    to an experimental study of our new presentation
    format are specially marked

32
Conventional Presentation
33
New Presentation
34
Independent Variables
  • The variable manipulated by the experimenter
  • Also known as factor or treatment
  • Experiment may involve one or many independent
    variables
  • Each independent variable
  • Has 2 or more levels (i.e. values)
  • May be metric (continuous, like the length of a
    menu) or categorical (discrete, like mouse vs.
    trackball, or a Likert scale)
  • In our example: just one independent variable,
    with two levels: new typesetting format vs.
    traditional presentation format

35
Dependent Variables
  • Definition
  • Variable measured by experimenter
  • Variable which may depend on the independent
    variables
  • The relationship is not necessarily causal; e.g.,
    the variables may only be correlated
  • Examples
  • Accuracy, or number of errors
  • Number of subtasks completed in a given time
    period
  • Time to complete each task
  • In our example: ability to comprehend a program,
    as measured by % of questions answered correctly
    in a given time

36
Hypotheses
  • Statement, to be tested, of relationship between
    independent and dependent variables
  • The null hypothesis is that the independent
    variables have no effect on the dependent
    variables
  • Hypothesis in our example: reading comprehension,
    as defined above, is improved by the new method
    of source code presentation

37
Experimental Design Paradigms
  • Between-subjects or within-subjects manipulation
    (between participants vs. across all
    participants)
  • Example designs with one independent variable
  • Between-subjects (randomized group) design
    (nested)
  • One independent variable with 2 or more levels
  • Subjects randomly assigned to groups
  • Each subject tested under only 1 condition
  • Within-subjects (repeated measures) design
    (crossed)
  • One independent variable with 2 or more levels
  • Each subject tested under all conditions
  • Order of conditions randomized or counterbalanced
    (why? to cancel practice and fatigue effects; see
    the sketch after this list)
  • In our example: a within-subjects design was
    chosen, with two conditions, i.e., two sample
    programs
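
To illustrate counterbalancing (a sketch, not from the original slides;
participant IDs and condition names are invented): with two conditions,
half the participants, chosen at random, get each order, so practice and
fatigue effects cancel out on average.

```python
import random

def counterbalance(participants,
                   conditions=("conventional", "new_format")):
    """Assign each participant an order of two conditions so that
    each order occurs equally often across the sample."""
    random.shuffle(participants)  # random assignment to orders
    orders = {}
    for i, p in enumerate(participants):
        orders[p] = conditions if i % 2 == 0 else conditions[::-1]
    return orders

for p, order in counterbalance([f"P{n:02d}" for n in range(1, 9)]).items():
    print(p, "->", " then ".join(order))
```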

38
Control Procedures
  • Goal is to eliminate the confound hypothesis,
    i.e., the possibility that there are alternative
    explanations for the observed effect(s)
  • To do this: make sure there are no systematic
    differences between conditions other than the
    independent variable
  • In our example: ensure that the two sample
    programs are identical in length, complexity,
    and difficulty

39
What To Control
  • Subject characteristics
  • Gender, handedness, etc.
  • Ability
  • Experience
  • Task variables
  • Instructions
  • Materials used
  • Environmental variables
  • Setting
  • Noise, light, etc.
  • Order effects
  • Practice
  • Fatigue

40
How to Control
  • Hold constant
  • Use males only, or students from same class
    only
  • Novices only
  • Randomize
  • Subjects to groups
  • Counterbalance
  • Half (chosen randomly) get new presentation
    format first

41
Sample Size Selection
  • More subjects --> more confidence in the results,
    i.e., greater statistical significance (a
    power-analysis sketch follows this list)
  • But this can be very expensive
  • Many methods to reduce the required number of
    subjects
  • Most HCI experiments use 4 to 25 subjects per
    group
  • In our example: 44 subjects chosen from a
    3rd-year programming course
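
One common way to choose a sample size is a power analysis. A sketch
using statsmodels (not from the original slides), assuming a
between-subjects two-group comparison and a guessed effect size:

```python
from statsmodels.stats.power import TTestIndPower

# How many participants per group are needed to detect a "large"
# effect (Cohen's d = 0.8) with 80% power at alpha = 0.05?
# The effect size is an assumption that must be justified from
# pilot data or prior literature.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.8, alpha=0.05, power=0.8)
print(round(n_per_group))  # roughly 26 per group
```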

42
Designing and Running the Experiment and
Collecting the Data
  • Run pilot studies
  • Check experimental design
  • Test and improve
  • Task definition
  • Experimental materials (often the most difficult)
  • Instructions
  • Practice tasks
  • Develop experimenter skills
  • Identify and deal with special problems
  • Run actual experiment
  • Record data
  • Observe behaviour

43
The Presentation Format Experiment
  • Within-subjects design, 44 subjects from 3rd year
    programming course
  • Two similar short C programs, roughly 200 lines
    of code, 4 to 5 pages
  • 40 minutes to skim first program and attempt to
    answer 18 questions, half in familiar format and
    half in new format
  • Then each group is given the other program in the
    other format

44
Data Analysis and Hypothesis Testing
  • Describe data
  • Descriptive statistics (means, medians, standard
    deviations; see the sketch after this list)
  • Graphs and tables
  • Perform statistical analysis of results
  • Are the results due to chance? (That is, with
    what probability?)
  • In our example: mean percentage of correct
    answers with the new format: 44%; with the
    conventional format: 35%
  • Analysis of variance showed that the effect of
    presentation format in increasing program
    readability was significant, F(1,42) = 18.25,
    p < 0.0001
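
Before significance testing, one would first summarize the raw data. A
minimal sketch with Python's statistics module (all numbers invented for
illustration):

```python
from statistics import mean, median, stdev

# Invented per-subject scores (% correct) for one condition.
scores = [44, 50, 41, 47, 39, 48, 45, 43]

print("mean   :", round(mean(scores), 1))
print("median :", round(median(scores), 1))
print("std dev:", round(stdev(scores), 1))  # sample standard deviation
```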

45
ANOVA
  • Analysis of Variance
  • A statistical test that compares the
    distributions of multiple samples, and determines
    the probability that differences in the
    distributions are due to chance
  • In other words, it determines the probability
    that the null hypothesis is correct
  • If the probability is below 0.05 (i.e., 5%), then
    we reject the null hypothesis, and we say that we
    have a (statistically) significant result
  • Why 0.05? Dangers of using this value? (A small
    ANOVA sketch follows.)
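
A minimal sketch of such a test (not from the original slides), using
scipy's one-way ANOVA on invented comprehension scores; none of these
numbers come from the actual experiment:

```python
from scipy import stats

# Invented comprehension scores (% correct) for two presentation formats.
new_format = [44, 50, 41, 47, 39, 48, 45, 43]
conventional = [35, 38, 30, 36, 33, 37, 31, 34]

f_stat, p_value = stats.f_oneway(new_format, conventional)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# If p < 0.05, we reject the null hypothesis that presentation
# format has no effect on comprehension.
```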

46
Techniques for Making an Experiment More Powerful
(i.e., better able to detect effects)
  • Reduce noise (i.e. reduce variance)
  • Increase sample size
  • Control for confounding variables
  • E.g., psychologists often use inbred rats for
    experiments!
  • Increase the magnitude of the effect
  • E.g. give a larger dosage of the drug

47
A small difference between the sample means. Is it
significant, or simply due to chance?
A larger difference between the sample means. Is it
significant, or simply due to chance?
48
With a smaller variance (than on the previous
slide), we are more confident that the very small
difference here is due to chance, and that the
larger difference here is significant.
49
With a larger sample size (than on the previous
slides), we are more confident that the very small
difference here is due to chance, and that the
larger difference here is significant.
50
Uses of Controlled Experiments within HCI
  • Evaluate or compare existing
    systems/features/interfaces
  • Discover and test useful scientific principles
  • Examples ?
  • Establish benchmarks/standards/guidelines
  • Examples ?