Title: Tool for Accurately Predicting Website Navigation Problems, Non-Problems, Problem Severity, and Effectiveness of Repairs
1. Tool for Accurately Predicting Website Navigation Problems, Non-Problems, Problem Severity, and Effectiveness of Repairs
- Marilyn Hughes Blackmon, U. of Colorado
- Muneo Kitajima, AIST, Japan
- Peter Polson, U. of Colorado
2. Part One
- Work supported by NSF Grant 01-37759 to M. H. Blackmon
- http://autocww.colorado.edu/brownr/ACWW.php
- http://autocww.colorado.edu/blackmon
- http://autocww.colorado.edu
3. Problem that spurred research and development of tool
- Focus on users building comprehensive knowledge of a topic
- Browse complex websites (cf. search engine)
- Pure forward search
- Learn by exploration
- Automatically predict what is worth repairing?
- Need accurate measure of problem severity
- Need to predict success rate for repairs
- Web designers using tool must be able to do what unaided designers cannot: predict behavior of users different from themselves and objectively represent user diversity (background knowledge)
4. Solution: Incrementally extend Cognitive Walkthrough for the Web (CWW)
- CHI2002 paper tailored Cognitive Walkthrough (CW) for web navigation
- Proved CWW would identify usability problems that interfere with web navigation
- Substituted objective measures of similarity, familiarity, and elaboration of heading/link texts using Latent Semantic Analysis (LSA)
- CHI2003 paper proved significantly better performance on CWW-repaired webpages vs. original, unrepaired pages
5. Percent task failure correlated 0.93 with observed clicks (each task n = 38)
6. Research problem, reformulated: What determines mean clicks?
- Identify and repair factors that increase mean clicks and raise risk of task failure
- Hypothetical determinants, based on prior results and theory underlying CWW research
- Unfamiliar correct link, i.e., insufficient background knowledge to comprehend link
- Competing headings and their high-scent links
- Competing links under correct heading
- Weak-scent correct link under correct heading
7. First step: Collect enough data for multiple regression analysis
- Reused 64 tasks from CHI2003 paper and ran additional experiments to get data on 100 new tasks, creating 164-task dataset
- Developed automatable rules for CWW problem identification
- Built multiple regression model for 164-task dataset and found 3 independent variables explaining 57% of the variance
8. Multiple regression translates into formula to predict problem severity
- Multiple regression analysis yielded formula for predicting mean clicks on links
- 2.199 (predicted clicks for non-problem)
- +1.656 if correct link is unfamiliar
- +0.754 times number of competing links nested under any competing heading
- +1.464 if correct link has weak scent
- zero clicks for competing links under correct heading
- Prediction for non-problem task = 2.199
- 2.5 mean clicks distinguishes problem from non-problem
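The formula on this slide can be sketched as a small Python function. This is a minimal illustration, assuming only the coefficients listed above; the names `predict_mean_clicks` and `is_problem` are hypothetical and are not part of the tool, and treating a prediction of exactly 2.5 as a problem is also an assumption.

```python
# Coefficients as listed on the slide; all names here are illustrative only.
BASELINE = 2.199            # predicted clicks for a non-problem task
UNFAMILIAR = 1.656          # added if the correct link is unfamiliar
PER_COMPETING_LINK = 0.754  # added per competing link under any competing heading
WEAK_SCENT = 1.464          # added if the correct link has weak scent
PROBLEM_THRESHOLD = 2.5     # mean clicks separating problem from non-problem


def predict_mean_clicks(unfamiliar: bool, weak_scent: bool,
                        competing_links: int) -> float:
    """Return the predicted mean total clicks for one task."""
    if competing_links < 0:
        raise ValueError("competing_links must be non-negative")
    return (BASELINE
            + (UNFAMILIAR if unfamiliar else 0.0)
            + (WEAK_SCENT if weak_scent else 0.0)
            + PER_COMPETING_LINK * competing_links)


def is_problem(predicted_clicks: float) -> bool:
    """Assumption: a prediction at or above the threshold counts as a problem."""
    return predicted_clicks >= PROBLEM_THRESHOLD


# A task with none of the three factors gets the baseline prediction.
print(round(predict_mean_clicks(False, False, 0), 3))  # 2.199
# A task with an unfamiliar, weak-scent correct link and 5 competing links.
print(round(predict_mean_clicks(True, True, 5), 3))    # 9.089
```

Because the model is additive, each factor's contribution to the prediction can be read off directly, which is what makes it usable as a severity estimate.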
9. Example of task: Find article about Hmong
List of 9 categories > Social Science > Anthropology
Scroll A-Z list to find Hmong
11. CWW-identified problems in Find Hmong task
[Figure: competing headings; values shown: 0.30, 0.19, 0.08]
12. Predicted mean clicks for Find Hmong task on original, unrepaired webpage
- 2.199 -- predicted clicks for non-problem
- +1.656 -- if correct link is unfamiliar
- +1.464 -- if correct link has weak scent
- +3.770 -- (0.754 × 5, the number of competing links nested under any competing heading)
- = 9.089 -- predicted mean total clicks
13. CWW-guided repairs of navigation usability problems detected by CWW
- Create alternate high-scent paths to target webpage via all correct and competing headings
- IF competing heading(s)
- IF unfamiliar correct link
- IF weak-scent correct link
- Substitute or elaborate link text with familiar, higher frequency words
- IF unfamiliar correct link
14. Repair benefits for Find Hmong, a problem definitely worth repairing
15. All 164 tasks: Predicted vs. observed mean total clicks
16. Psychological validity measures for 164-task dataset
- For 46 tasks predicted to have serious problems (i.e., predicted clicks ≥ 5.0)
- 100% hit rate, 0% false alarms
- 93% success rate for repairs (statistically significant difference, repaired vs. not)
- For all 75 tasks predicted to be problems
- 92% hit rate, 8% false alarms
- 83% success rate for repairs, significant difference repaired vs. unrepaired, p < .0001
17. Cross-validation study: Replicate the model on new dataset?
- Ran another large experiment to test whether multiple regression formula replicated with new set of tasks
- 2 groups
- Each group did 32 new tasks, 64 total tasks
- Used prediction formula to identify problems vs. non-problems
- All tasks have just one correct link
18. Multiple regression analysis produced full cross-validation
- Multiple regression of 64-task dataset gave same 3 determinants found for 164-task original dataset, with similar coefficients
- Hit rate for predicted problems 90%, false alarms 10%
- Correct rejection for predicted non-problems 69%, 31% misses, but 2/3 of misses had observed clicks 2.5-3.5, other 1/3 of misses > 3.5 but < 5.0
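The rates on this slide are reported within each predicted class: hits and false alarms split the predicted problems, while correct rejections and misses split the predicted non-problems. The sketch below shows one way such rates could be computed, assuming a task counts as an observed problem when its observed mean clicks reach 2.5. The function name and the sample values are hypothetical and are not data from the study.

```python
def validity_rates(predicted, observed, threshold=2.5):
    """Compare predicted and observed mean clicks, task by task.

    Returns percentages conditioned on the prediction:
    hits and false alarms among predicted problems,
    correct rejections and misses among predicted non-problems.
    """
    if len(predicted) != len(observed):
        raise ValueError("need one observed value per predicted value")
    hits = false_alarms = rejections = misses = 0
    for pred, obs in zip(predicted, observed):
        if pred >= threshold:          # predicted problem
            if obs >= threshold:
                hits += 1
            else:
                false_alarms += 1
        else:                          # predicted non-problem
            if obs < threshold:
                rejections += 1
            else:
                misses += 1

    def pct(part, whole):
        return 100.0 * part / whole if whole else float("nan")

    problems = hits + false_alarms
    non_problems = rejections + misses
    return {
        "hit_rate": pct(hits, problems),
        "false_alarm_rate": pct(false_alarms, problems),
        "correct_rejection_rate": pct(rejections, non_problems),
        "miss_rate": pct(misses, non_problems),
    }


# Made-up values for eight tasks, for illustration only.
predicted = [9.089, 5.319, 3.855, 2.953, 2.199, 2.199, 2.199, 2.199]
observed = [8.0, 4.5, 3.1, 2.0, 1.2, 1.5, 3.0, 1.1]
print(validity_rates(predicted, observed))
```

With these made-up values, three of the four predicted problems are confirmed and one of the four predicted non-problems turns out to be a miss, so each pair of rates comes out at 75% and 25%.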
19. Predicted vs. observed clicks for 64 tasks in cross-validation experiment
20. Part Two
21. Theory matters: CWW is theory-based usability evaluation method
- CoLiDeS cognitive model (Kitajima, Blackmon, & Polson, 2000, 2005)
- Construction-Integration cognitive architecture (Kintsch, 1998), a comprehensive model of human cognitive processes
- Latent Semantic Analysis (LSA)
22. The Key Idea
- Core process underlying Web navigation is skilled reading comprehension
- Comprehension processes build mental representations of goals and webpage objects (subregions, hyperlinks, images, and other targets for action)
- Action planning compares goal with potential targets for action and selects target with highest activation level
23. Consensus: Web navigation is equivalent to following scent trail
- Scent or residue (Furnas, 1997)
- SNIF-ACT based on Information Foraging (Pirolli & Card, 1999)
- Bloodhound Project: Web User Flow by Information Scent (WUFIS) > InfoScent Simulator (Chi, et al., 2001, 2003)
- CWW activation level
24. CoLiDeS activation level: Scent is MORE than just similarity
- Adequate background knowledge to comprehend headings and links? Select semantic space that best matches user group
- Warning bell for low word frequency
- Warning bell for low term vector
- Before computing similarity, simulate human elaboration of link texts during comprehension, using LSA Near neighbors, finding terms simultaneously familiar and similar in meaning
- Compute goal-heading and goal-link similarity with LSA cosines, defining weak scent as a cosine < 0.10, moderate scent as cosine 0.30
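The similarity step can be illustrated with a plain cosine between two vectors. This is a sketch only: real LSA vectors come from a semantic space derived from a large text corpus, and the three-dimensional vectors below are made-up placeholders. The function names are hypothetical; the 0.10 weak-scent cutoff is the one stated on this slide.

```python
import math

WEAK_SCENT_CUTOFF = 0.10  # weak scent: cosine below this value (from the slide)


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an all-zero vector as unrelated
    return dot / (norm_u * norm_v)


def has_weak_scent(goal_vec, link_vec):
    """True if the goal-link cosine falls below the weak-scent cutoff."""
    return cosine(goal_vec, link_vec) < WEAK_SCENT_CUTOFF


# Placeholder vectors standing in for LSA representations of the texts.
goal = [0.9, 0.1, 0.0]
related_link = [0.8, 0.2, 0.1]
unrelated_link = [0.0, 0.1, 0.9]

print(has_weak_scent(goal, related_link))    # False
print(has_weak_scent(goal, unrelated_link))  # True
```

As the slide stresses, this cosine is only one ingredient: familiarity checks and elaboration of the link text happen before similarity is computed.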
25. Conclusions: Extending CWW successful for research and development of tool
- We CAN now predict severity of navigation usability problems and success rate for repairs of these problems, so we invest time to repair only what is worth repairing: tasks predicted to take ≥ 5.0 clicks
- Web designers using tool CAN do what unaided designers cannot: predict behavior of users different from themselves and objectively represent user diversity in education level, culture, language, and field of expertise (background knowledge)
26. Conclusions, continued
- Scales up to large websites
- Reliable (LSA measures vs. human judgments)
- Psychologically valid (228-task dataset, large n gives stable mean for each task), based on cognitive model
- Theory matters
- Drives experimental design
- High accuracy and psychological validity of tool
- Practitioners and researchers can now put the
tool to use with trust
28. Non-problem task: Find Fern approaches asymptote of pure forward search
- One-click minimum path for both problems AND non-problems
- 1.1 mean total clicks on links
- 90% pure forward search (minimum path solution)
- 97% of first clicks were on link under correct heading
- 100% success rate -- everyone finished task in 1 or 2 clicks
- 9 seconds mean solution time