Title: Large-Scale Cognition: The Psychology of Informavores
1 Large-Scale Cognition: The Psychology of Informavores
Dept. of Psychology, Stanford University, 14 March 2001
Work supported in part by the Office of Naval Research
2 Aim of this Talk
- Broad view
- Emphasize scale change in information environment
- Sample of psychological investigations
- Sample of applications
3 Contributors
- Andrew Faurling
- Ruth Rosenholtz
- Ed Chi
- Jim Pitkow
- Jeff Heer
- Chris Olston
- Peter Pirolli
- Mija Van Der Wege
- Paul Whitmore
- Jenea Boshart
- Julie Morrison
- Pam Schraedly
- Rob Reeder
- Allison Woodruff
4 Technology Underpinnings
5 The Information Big Bang
Amount of information produced in the world
0.6 EB/Yr, 50%/Yr
2 EB/Yr, 50%/Yr
6 Comparisons
- Quantity (EB): 1999, 2009
- Unique info: 0.6, 36
- Store in Earth's population memory in 1 yr: 0.1, 0.1
- Record all words in all lives: 3.6, 3.6
7 Communications
- One Net accessible everywhere
- 25M Pages/Day
- 2X 10X more/yr for Seated Home User
- Indexed
- Automatable
(Figure: bandwidth chart, scale from 1 Kb through 1 Mb to 1 Gb. Series: POTS @ 17%/year, ISDN, WAN, LAN, SAN/backpanels. Source: G. Bell)
8 Humans are informavores
- Informavores: organisms that hunger for information about the world and themselves (George Miller, 1983)
- Humans seek, gather, share, and consume information in order to adapt
- Now taking this to a new scale: Large-Scale Cognition
- Now that you've got it, how do you ever find anything?
- How do you make sense of it?
9 Why Interesting to Study
- Important problem
- Information overload
- But potentials for improvement
- High economic costs and benefits
- Web Instrumented
- Can study what was impossible before
- Inherently psychological
- Attention
- Perception
- Memory
- Problem solving
10 Pressures of the information environment
"A wealth of information creates a poverty of attention and a need to allocate it efficiently." (Herbert Simon)
Human attention is central, not precision vs. recall.
11 Example 1: Field Study, Business Intelligence
- Professional Technology Analyst
- TASK: Write monthly newsletter
- METHOD: Group scans 600 magazines; copy out articles; analyst whittles down pile
12 Analyst Workflow
(Figure: workflow diagram between LIBRARY and ANALYST. Labeled steps: Scan & Mark, Copy, Transport, Sort to Pile, Shuffle & Dot, Write Article, Select & File, Cleanup, Discard. Labeled items: New Mags (2800 pages), Marked Mag., Article Copies, Project Pile, 1 Pile, New Article, File.)
13 Cascaded Filters Concentrate Information at Low Cost
(Figure: filter cascade. Stages: 50 Mags, 2800 pages/mo (210 hr/mo); Marked Arts., 1288 pages/mo (97 hr/mo); Project Pile, 3000 pages (255 hr); Writing Pile, 250 pages (19 hr); Article. Operations: Scan & Mark, To Library (Copy article), Dot & Sort, Write article. Other labels: 12, 1.)
14 Concentration of Information
15 Sensemaking Based on Schemata
16 Ecological Paradigm
17 Ecological Approach
- Human-computer interaction is adaptive to the extent that it MAXIMIZES (Net Knowledge Gained) / (Costs of Interaction)
18 Information Foraging Theory
- Take concept of informavores seriously
- Key ideas
- Cost structure of information and economics of attention
- Information scent: local cues used to explore and search information spaces
- Analysis on two levels
- adaptionist level: rational analysis, and
- proximal mechanisms.
- Key implications for machine aids
- Machines that
- (1) Predict Degree-of-Interest over information field
- (2) Use information visualization to aid external cognition
19 Time scales of analysis
(Figure: Newell's time scales. Columns: time scale (s), from 10^7 down to 10^-1, and psychological domain. Labeled rows: unit task, operations, visual attention, within the COGNITIVE band (proximal mechanisms).)
20 E.g. Housefly Foraging For Food
Adaptionist Level: Fly near my sandwich
Proximal Mechanisms (Bell)
21 Information patches
E.g. desk piles, Alta Vista search list
Unlike animals foraging for food, humans can do patch construction
22 People are information rate maximizers
Benefits/Costs
23 When to Switch Patches: Randomly-Ordered Prey
(Figure: cumulative gain g(tW) against time. Axes: between-patch time tB and within-patch time tW. Marked leaving times t1, t, t2 with rates of gain R1, R, R2.)
24 Charnov's Marginal Value Theorem
Max gain when slope of within-patch gain, g'(tW), equals average gain R (tangent in diagram)
(Figure: gain g(tW) against between-patch time tB and within-patch time; the line of slope R is tangent to the gain curve at t.)
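A minimal numerical sketch of the tangent condition above. The slide gives no functional form for the gain curve, so the diminishing-returns function g(tW) = G(1 - exp(-tW/tau)) and the values of G, tau, and t_between are illustrative assumptions. The code searches a grid for the within-patch time that maximizes R = g(tW) / (tB + tW), then the slope there can be compared with R.

```python
import math

def gain(t_within, G=100.0, tau=10.0):
    """Hypothetical cumulative gain with diminishing returns (assumed form)."""
    return G * (1.0 - math.exp(-t_within / tau))

def gain_slope(t_within, G=100.0, tau=10.0):
    """Derivative g'(tW) of the hypothetical gain function."""
    return (G / tau) * math.exp(-t_within / tau)

def rate(t_within, t_between):
    """Average rate of gain R = g(tW) / (tB + tW)."""
    return gain(t_within) / (t_between + t_within)

def best_leaving_time(t_between, t_max=100.0, step=0.001):
    """Grid search for the within-patch time that maximizes R."""
    n = int(t_max / step)
    candidates = (i * step for i in range(1, n + 1))
    return max(candidates, key=lambda t: rate(t, t_between))

t_between = 5.0  # assumed between-patch time
t_star = best_leaving_time(t_between)
# At the optimum the marginal gain and the average rate nearly coincide.
print(t_star, rate(t_star, t_between), gain_slope(t_star))
```

With these assumed parameters the two printed rates agree to within the grid resolution, which is the tangency the diagram depicts.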
25 Between-Patch Enrichment
Example: arrange physical office efficiently
(Figure: gain g(tW) against between-patch and within-patch time, before and after enrichment. Labels: R1, R2, tB1, t1, tB2, t2.)
26 Within-Patch Enrichment
Example: better filtering of search hits
(Figure: enriched gain curve g1(tW) against between-patch and within-patch time.)
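The two kinds of enrichment can be compared with a small sketch. Everything numeric here is an assumption for illustration: the gain curve g(tW) = G(1 - exp(-tW/tau)), the baseline between-patch time of 20, and the enriched values. Between-patch enrichment is modeled as a shorter tB; within-patch enrichment as a gain curve that rises faster (smaller tau).

```python
import math

def best_rate(t_between, G=100.0, tau=10.0, t_max=100.0, step=0.01):
    """Return (best within-patch time, best rate R) for the assumed
    gain curve g(tW) = G * (1 - exp(-tW / tau))."""
    best_t, best_r = 0.0, 0.0
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        r = G * (1.0 - math.exp(-t / tau)) / (t_between + t)
        if r > best_r:
            best_t, best_r = t, r
    return best_t, best_r

baseline = best_rate(t_between=20.0)
# Between-patch enrichment: patches are closer together (smaller tB).
between = best_rate(t_between=5.0)
# Within-patch enrichment: the patch yields its gain faster (smaller tau).
within = best_rate(t_between=20.0, tau=5.0)

for name, (t, r) in [("baseline", baseline), ("between", between), ("within", within)]:
    print(f"{name}: leave after {t:.2f}, rate {r:.2f}")
```

Under these assumptions both enrichments raise the achievable rate R, and shortening the between-patch time also shortens the best time to stay in a patch.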
27 Summary of Field Study
- Sensitive to cost structure
- Evidence for maximizing R = info gain/cost
- Information patches
- Between-patch enrichment (physically)
- Within-patch enrichment (filtering)
28 Example 2. Spiral Calendar
29 Direct Walk Interactions
(Figure: chain of displays, Display1, Display2, Display3, etc., each reached from the previous one by a click, gesture, etc.)
Examples: WWW, Mac Finder, HyperCard
30 Cost of Knowledge Characteristic Function
(Figure: gain in knowledge against cost (time).)
31 COKCF: Spiral Calendar
(Figure: items accessed within cost, log scale from 10^0 to 10^7, against cost (s) from 0 to 120, for Spiral Calendar and Calendar Manager.)
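A minimal sketch of how a curve like this could be tabulated, assuming each item has a measured access cost in seconds. The function name `cokcf` and the sample costs are hypothetical, not data from the talk. For each cost budget it counts the items reachable within that budget.

```python
from bisect import bisect_right

def cokcf(access_costs, budgets):
    """For each cost budget, count the items whose access cost fits within it."""
    ordered = sorted(access_costs)
    return [bisect_right(ordered, b) for b in budgets]

# Hypothetical access times in seconds for five items; not data from the talk.
costs = [2.0, 3.5, 3.5, 10.0, 45.0]
budgets = [0, 5, 20, 60]
curve = cokcf(costs, budgets)
print(curve)  # [0, 3, 4, 5]
```

Computing the same curve for two interfaces over the same items gives a direct comparison of their cost structures.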
32 COKCF: Spiral Calendar
33 Summary of Spiral Calendar
- Can measure cost structure
- Cost of Knowledge Characteristic Function
34 Example 3. Scatter/Gather
- supports exploration/browsing of very large full-text collections ( 1,000,000)
- creates clusters of content-related documents
- presents users with overviews of cluster contents
- allows user to navigate through clusters and overviews
35 Marti Hearst
36 Scatter/Gather task
(Figure: Scatter/Gather Window and Display Titles Window. Cluster labels: Law, Nat. Lang., World News, Robots, AI, Expert Sys, CS, Planning, Medicine, Bayes. Nets.)
37 information scent
(Figure: network linking the words of an Information Need to the words of a Text snippet. Words shown: new, cell, medical, patient, treatments, dose, procedures, beam.)
- Spreading activation
- Derived from models of human memory
- Activation reflects likelihood of relevance given past history and current context
- Approximates Bayesian network
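A minimal sketch of scoring scent by spreading activation, under stated assumptions: words in the information need act as sources, activation flows one step along weighted links to words in the text snippet, and the scent of the snippet is the total activation received. The function name, the link strengths, and the split of the slide's words into need and snippet are all hypothetical.

```python
def scent(need_words, snippet_words, link_strength, base_level=None):
    """Sum the activation that snippet words receive from the information need.

    link_strength maps (need_word, snippet_word) to an association strength;
    missing pairs contribute nothing. base_level optionally maps a snippet
    word to its resting activation.
    """
    base_level = base_level or {}
    total = 0.0
    for target in snippet_words:
        activation = base_level.get(target, 0.0)
        for source in need_words:
            activation += link_strength.get((source, target), 0.0)
        total += activation
    return total

# Hypothetical strengths, for illustration only.
strengths = {
    ("patient", "medical"): 1.2,
    ("dose", "treatments"): 0.9,
    ("cell", "medical"): 0.4,
}
need = ["cell", "patient", "dose", "beam"]
high = scent(need, ["new", "medical", "treatments", "procedures"], strengths)
low = scent(need, ["world", "news"], strengths)
print(high, low)
```

A snippet whose words are strongly associated with the need scores higher than an unrelated one, which is the ordering a forager would use to choose among links.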
38 spreading activation networks (for modeling scent)
Document corpus -> Word statistics -> Spreading activation network
39 interface provides good scent of underlying document clustering
(Figure: clusters as perceived by model vs. as identified by computer.)
40 Summary of Scatter/Gather
- Cost structure
- Information Patches (clustered docs)
- Maximizing info gain/cost
- Patch enrichment vs. exploitation
- Spreading activation to model semantic content
- Information scent on direct walk interface predicts behavior
41 Example 4: Web Study - Protocol Analysis
- Protocol Structure
- URL
- Observed Actions and Transcript
- Protocol Analysis
42 Study
- WWW Task Bank Survey (N = 2188)
- 6 "find information" tasks, e.g.,
- You are Chair of Comedic events for Louisiana State University in Baton Rouge. Your computer has crashed and you have lost several advertisements for upcoming events. You know that the Second City tour is coming to your theatre in the spring, but you do not know the precise date. Find the date the comedy troupe is playing on your campus. Also find a photograph of the group to put on the advertisement.
- 12 Stanford University students
- 2 tasks (CITY, ANTZ) analyzed for 4 participants
43 Video Data
(Figure labels: Web Logger, Question, Internet Explorer)
44 Instrumentation
45 WebLogger Event File
46 Protocol
47 Rob Reeder
48 (No Transcript)
49 Analysis - Information structure
- Web sites
- Portals
- Search engines
- Pages
- Website home page
- Search engine page
- Hitlist page
- Content elements
50 Problem space structure
- URL
- Link
- Keyword
- Visual Search
51 Web Behavior Graph
Mija Van Der Wege
52 Web Behavior Graph
(Legend: Link Problem Space, URL Problem Space, Keyword Problem Space, Visual Search Problem Space)
53 Web Behavior Graph
(Legend: Execution of Operator, Return to Previous State)
54 Web Behavior Graph
(Example nodes: 123 Posters, Yahoo)
55 Web Behavior Graph
(Legend: No Scent, Low Scent, Medium Scent, High Scent)
56 Web Behavior Graphs (WBGs)
(Figure: WBGs for participants S1, S6, S7, S10 on the ANTZ task and on the CITY task.)
57 People Switch When Information Scent Gets Low
Patch-leaving policy: leave Web site when information scent goes below some threshold
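The patch-leaving policy can be sketched as a simple threshold rule. The function name, the scent values, and the threshold below are hypothetical; the talk does not say how scent is scaled or what threshold users apply.

```python
def pages_visited_before_leaving(scent_values, threshold):
    """Count pages viewed on a site before scent first drops below threshold."""
    visited = 0
    for s in scent_values:
        if s < threshold:
            break  # scent too weak: leave the site (the patch)
        visited += 1
    return visited

# Hypothetical scent perceived on successive pages of one site.
trail = [0.9, 0.7, 0.2, 0.8]
print(pages_visited_before_leaving(trail, threshold=0.5))  # 2
```

Note that the high-scent page at the end of the trail is never seen, because the user has already left when scent dropped on the third page.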
58 Phase shifts in search regime due to information scent
q = prob. of going down wrong path. Small change in q, big change in nodes examined. (Working on now)
(Figure: nodes examined against number of levels (D), for z = 10.)
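A simplified branching model, offered as an assumption rather than as the model used in the talk, shows how such a phase shift can arise. Suppose the target sits at depth D in a tree with branching factor z, each wrong branch off the correct path is followed with probability q, and inside a wrong subtree each of the z children is again followed with probability q. The expected number of wrong children followed per wrong node is then q*z, and the search cost changes character as q*z crosses 1.

```python
def expected_nodes(q, depth, z=10):
    """Expected nodes examined to reach a target at the given depth
    under the simplified wrong-branch model described above."""
    m = q * z  # mean number of wrong children followed from a wrong node
    total = 0.0
    for d in range(1, depth + 1):
        remaining = depth - d  # levels below a wrong node at depth d
        wrong_subtree = sum(m ** k for k in range(remaining + 1))
        # One correct node at this level, plus expected wrong excursions.
        total += 1 + (z - 1) * q * wrong_subtree
    return total

for q in (0.0, 0.05, 0.1, 0.2):
    print(q, round(expected_nodes(q, depth=10), 1))
```

With z = 10 the boundary in this model is at q = 0.1: below it the cost grows roughly linearly with depth, above it roughly exponentially, so a small change in q near the boundary produces a large change in nodes examined.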
59 Summary of Web Search
- Cost structure
- Patches of sites, pages, content, links
- Maximize info gain/cost by shifting search branches and problem spaces
- Basic model of web search
- Follow information scent until weak.
- Shift problem spaces if impasse.
- Small shift in scent can be magnified to large cost of search if near phase boundary.
- Direct Walk analysis
- Applies to device controls, software, information design
60 SUMMARY - ADAPTIONIST LEVEL
- Information has a cost structure
- Can articulate with, e.g., COKCF
- People are sensitive to the cost structure
- Seek to maximize info gained/cost
- Can predict behavior just by analyzing cost structure and information scent strength
- Information often in patches
- Between patch costs
- Within patch costs
- Exploration vs exploitation trade-offs
- Opportunity costs
- People can shape information environment as well as search, e.g., by enrichment
- Between-patch enrichment
- Within-patch enrichment
61 SUMMARY - PROXIMAL MECHANISMS
- People work in multiple problem spaces determined by system, shift on impasse.
- Information scent: basic method is to follow, backtrack on impasse or shift problem space.
- Dominating effect: phase changes
- Small change in probability of going down wrong path has large, qualitative effect on search
62 APPLICATION 1: Scent Web Site Usability
- Improve website design directly by means of improving information scent
63 APPLICATION 2: Web User Flow by Information Scent (WUFIS)
Pete Pirolli, Ed Chi
- Use scent models to simulate user
(Figure: Web site (Web page content, links) and user information goal feed the Web user flow simulation, which outputs predicted paths.)
64 User Flow Model
User need (vector of goal concepts)
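A much-simplified sketch of a scent-driven user flow simulation. It is not the WUFIS algorithm itself: here the user need and each link's text are sparse concept vectors, scent is taken to be their cosine similarity, and simulated users split across outgoing links in proportion to scent. The site, vectors, and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(a.get(k, 0.0) * v for k, v in b.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def simulate_flow(links, need, start, steps):
    """links maps page -> {target page: concept vector of the link text}.
    Returns the fraction of simulated users on each page after `steps` clicks."""
    flow = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for page, mass in flow.items():
            out = links.get(page, {})
            scents = {t: cosine(need, vec) for t, vec in out.items()}
            total = sum(scents.values())
            if total == 0.0:
                # No scent to follow: users stay where they are.
                nxt[page] = nxt.get(page, 0.0) + mass
                continue
            for target, s in scents.items():
                nxt[target] = nxt.get(target, 0.0) + mass * s / total
        flow = nxt
    return flow

# Hypothetical site and information goal.
site = {
    "home": {
        "events": {"comedy": 1.0, "dates": 1.0},
        "sports": {"football": 1.0, "dates": 1.0},
    },
}
need = {"comedy": 1.0, "dates": 1.0}
flow = simulate_flow(site, need, start="home", steps=1)
print(flow)
```

In this toy site two thirds of the simulated users follow the "events" link, since its text matches the goal concepts twice as well as the "sports" link.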
65 (No Transcript)
66 APPLICATION 3: Enhanced Thumbnails
Improve scent by computed Degree-of-Interest Visualization
- Text summaries
- Lots of abstract, semantic information
- Image summaries (plain thumbnails)
- Layout, genre information
- Enhanced thumbnails
- Combine features of text summaries and thumbnails
67 System
Allison Woodruff
- Preprocessor modifies HTML
- e.g., increase size of text, modify color of text
- Renderer creates scaled image of page
- Postprocessor transforms image
- e.g., apply color wash, add text callouts
70 Examples
- Emphasize text that is relevant to query
- Text callouts
- Enlarge text that might be helpful in assessing page
- Enlarge headers
71 Results
Note: N = 12
72 APPLICATION 4: Degree of Interest Trees
Stu Card, David Nation
- Increase info gain/cost by computing DOI Info Vis.
- Automatic visual patch enrichment
- Maintain contextual orientation
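The talk does not state how degree of interest is computed. One common choice, assumed here, is a Furnas-style fisheye score for trees: intrinsic importance (the negative of a node's depth) minus the tree distance from the current focus. The function names and the sample tree are hypothetical.

```python
def depth(node, parent):
    """Number of edges from node up to the root."""
    d = 0
    while parent.get(node) is not None:
        node = parent[node]
        d += 1
    return d

def tree_distance(a, b, parent):
    """Number of edges on the path between a and b in the same tree."""
    dist_from_a = {}
    node, d = a, 0
    while node is not None:
        dist_from_a[node] = d
        node = parent.get(node)
        d += 1
    node, d = b, 0
    while node not in dist_from_a:
        node = parent[node]
        d += 1
    return d + dist_from_a[node]

def degree_of_interest(node, focus, parent):
    """Assumed fisheye DOI: intrinsic importance minus distance from focus."""
    return -depth(node, parent) - tree_distance(node, focus, parent)

# Hypothetical tree given as child -> parent.
parent = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}
focus = "a1"
doi = {n: degree_of_interest(n, focus, parent) for n in parent}
visible = sorted(n for n, score in doi.items() if score >= -2)
print(doi, visible)
```

Thresholding the score keeps the focus and its ancestors visible while far-off branches are elided, which is one way to filter automatically without losing contextual orientation.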
73 APPLICATION 5: Web Forager
- Create patches of pages (WebBooks)
- Create workspace patch
- Enhance COKCF - large space gestures
74 SUMMARY - Computer Aids
- Degree-of-Interest Visualization
- Does automatic filtering (within-patch enrichment)
- Support working memory and context
- Enhance Information Scent
- Position on good side of Phase Changes
- Next: ACT-R models of search, models of visual attention, Information Crystallization
75 (No Transcript)
76 Most Information Not Paper
Only 0.003% of data (by size) is print. 86% is magnetic.
77 Digital Information Taking Over
Information production
- Non-digital: Paper 2%/Yr, Film 4%/Yr
- Digital: Optical 70%/Yr, Magnetic 55%/Yr
78 Democratization of Data
- Individuals create
- 80% of original paper documents
- 99% of original film documents
- Individuals have
- 55% of disk drive memory
79 What We're Trying to Do: Dual Level of Analysis
- Adaptionist Level (Information Scent)
- Why of action
- Problem user is trying to solve
- Specify without knowing mechanisms
- Proximal Mechanism Level (Problem Space Rules)
- Mechanisms of action
- How each step proceeds
80 Physical Layout of Patches Reduces Cost of Work
81 information scent
Cues that facilitate orientation, navigation, assessment of information value
(Figure labels: Tokyo, New York, San Francisco)
82 spreading activation
Base-level reflects likelihood of occurrence
Strength of link spread reflects likelihood of co-occurrence
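A minimal sketch of deriving both quantities from word statistics. The slide gives no formulas, so the log-odds forms used here are assumptions: base level is ln(P(i) / (1 - P(i))) and link strength from word j to word i is ln(P(i | j) / P(i)). The function name and the toy corpus are hypothetical.

```python
import math

def word_statistics(documents):
    """Estimate base levels and link strengths from word (co-)occurrence.

    Probabilities are document frequencies. Pairs that never co-occur get
    no link, and words present in every document get no base level, since
    the logarithm is undefined in those cases.
    """
    n = len(documents)
    doc_sets = [set(d.split()) for d in documents]
    vocab = set().union(*doc_sets)
    count = {w: sum(w in d for d in doc_sets) for w in vocab}

    base = {}
    for w in vocab:
        p = count[w] / n
        if 0 < p < 1:
            base[w] = math.log(p / (1 - p))

    strength = {}
    for j in vocab:
        for i in vocab:
            if i == j:
                continue
            both = sum(i in d and j in d for d in doc_sets)
            if both:
                p_i_given_j = both / count[j]
                strength[(j, i)] = math.log(p_i_given_j / (count[i] / n))
    return base, strength

# Hypothetical four-document corpus.
docs = ["patient dose beam", "patient dose", "patient news", "world news"]
base, strength = word_statistics(docs)
print(base["patient"], strength[("dose", "patient")], strength[("news", "patient")])
```

A positive strength means seeing word j makes word i more likely than its base rate; a negative strength means less likely.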
83 The Information Big Bang