Marti Hearst - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Marti Hearst


1
Visualization in Text Analysis Problems
VAC Consortium Meeting, Stanford, May 24, 2006
  • Marti Hearst
  • School of Information, UC Berkeley

2
Outline
  • Some Visualization Design Principles
  • Illustrated with a new example
  • Why Text is Tricky to Visualize
  • How to do good visualization design with text
    while meeting analysts' needs?
  • Focus on Flexibility with Reproducibility
  • Examples from 4 different domains

3
What Makes for a Good Visualization?
  • Visually illuminates important aspects of the
    underlying data and domain.
  • Supports the users' tasks (better than without
    the visualization).
  • Adheres to good design principles.

4
Example from Software Engineering: Marat
Boshernitsan, UC Berkeley PhD Dissertation 2006
  • Problem: need to make complex changes throughout
    code.
  • Example: convert from one API to another.

5
A Typical Solution
  • Either requires programmers to understand and
    manipulate abstract syntax trees
  • Or requires learning another programming language
    (or both)!

6
First Attempt
7
Second Attempt
8
A Better Solution
  • Build on how programmers think about programming.
  • Operate on the textual representation of code.

9
Users Operate on Familiar Visual Representation
of Code
10
Context-and-Domain Sensitive Visual Cues
11
Lessons from this Example
  • User-centered Design
  • This was the third attempt.
  • First 2 attempts did not accurately reflect how
    users think about the problem.
  • Careful design of labels and interaction cues
  • Very intelligent backend, but user-activated.
  • Visually and interactively reflects how
    programmers think about programming.

12
What Makes for a Good Visualization for
Analysts?
  • Visually illuminates important aspects of the
    underlying data and domain.
  • Supports the users' tasks (better than without
    the visualization).
  • Adheres to good design principles.

13
Goals vs. Tasks
  • Analysts' Goals
  • Understand current and past situations
  • Predict and anticipate future situations
  • Observations by Pirolli & Card 05
  • Different analysts starting with people,
    organizations, tasks, and time
  • predict coup likelihood
  • understand bio-warfare threats
  • understand relations within cartel

14
Goals vs. Tasks
  • Analysts' tasks
  • Explore
  • Extract
  • Filter
  • Link
  • Arrange
  • Compare
  • Hypothesize
  • (A combination of Foraging and Sensemaking)
  • Should do the tasks only to support the goals.

15
Design Principles for Analysts
  • Experienced analysts notice what is missing or
    unexpected (Wright et al. 06)
  • Thus consistency and reproducibility are
    important.

16
Design Principles for Analysts
  • Analysts must guard against confirmation bias.
    (Pirolli & Card 05)
  • Thus it is important for analysts to
  • Be able to easily arrange and re-arrange,
  • View information flexibly from many angles,
  • While at the same time retaining consistency and
    reproducibility.
  • However, it's hard to do this with text.

17
Working with Text: Text is especially difficult
to visualize
  • Very high dimensionality
  • Tens to hundreds of thousands of features
  • Compositional
  • Can be combined together in innumerable ways
  • Abstract
  • And so difficult to visualize
  • Not pre-attentive
  • Must foveate to read
  • Subtle
  • Small differences matter
  • Unordered

18
Text Meaning is NOT pre-attentive
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
19
Why Text is Tough
  • Abstract concepts are difficult to visualize
  • Combinations of abstract concepts are even more
    difficult to visualize
  • time
  • shades of meaning
  • social and psychological concepts
  • causal relationships

20
Why Text is Tough
The dog..
21
Why Text is Tough
The dog.
The dog cavorts.
The dog cavorted.
22
Why Text is Tough
The man.
The man walks.
23
Why Text is Tough
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
24
Why Text is Tough
As the man walks the cavorting dog,
thoughts arrive unbidden of the previous spring,
so unlike this one, in which walking was marching
and dogs were baleful sentinels outside unjust
halls.
How do we visualize this?
25
Why Text is Tough
  • Language only hints at meaning
  • Most meaning of text lies within our minds and
    common understanding
  • "How much is that doggy in the window?"
  • "how much": social system of barter and trade (not
    the size of the dog)
  • "doggy": implies childlike, plaintive, probably
    cannot do the purchasing on their own
  • "in the window": implies behind a store window,
    not really inside a window; requires the notion of
    window shopping

26
Why Text is Tough
  • General categories have no standard ordering
    (nominal data)
  • Categorization of documents by single topics
    misses important distinctions
  • Consider an article about
  • NAFTA
  • The effects of NAFTA on truck manufacture
  • The effects of NAFTA on productivity of truck
    manufacture in the neighboring cities of El Paso
    and Juarez

27
Why Text is Tough
  • Other issues about language
  • Ambiguous (many different meanings for the same
    words and phrases)
  • Same meaning implied by different combinations
  • Different combinations imply different meanings

28
Why Text is (Deceptively) Easy
  • Text is easier when you have a lot of it
  • Web search is now usually conjunction
  • Text has a lot of redundancy
  • A very simple algorithm can
  • Pull out important phrases
  • Find meaningfully related words
  • Create a summary from document
  • Group related documents
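As a rough illustration of how simple such an algorithm can be, the sketch below pulls out candidate phrases by counting adjacent word pairs that contain no stopwords. The stopword list, the function names, and the sample text are all assumptions made for this example; they are not taken from the talk.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "on", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequent_phrases(text, top_n=3):
    """Return the most frequent adjacent word pairs that contain no stopword."""
    tokens = tokenize(text)
    pairs = zip(tokens, tokens[1:])
    counts = Counter(
        f"{a} {b}" for a, b in pairs
        if a not in STOPWORDS and b not in STOPWORDS
    )
    return counts.most_common(top_n)

# Made-up sample text for the demonstration.
sample = (
    "Information retrieval systems are judged on precision and recall. "
    "Information retrieval evaluation compares precision and recall "
    "for relevant documents. Relevant documents drive recall."
)
print(frequent_phrases(sample, top_n=2))
```

Redundancy is what makes this work: a phrase that matters tends to recur, so raw pair counts already surface it.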

29
Why Text is Easy
  • Pretty much any simple technique can pull out
    phrases that seem to characterize a document
  • Most frequent words from an IR lecture
  • 109 slide, 69 to, 37 view, 37 version, 37 graphic, 37 first
  • 37 back, 36 previous, 36 next, 32 of, 31 the
  • 30 recall, 28 relevant, 27 precision, 25 retrieved, 25 documents
  • 21 and, 18 evaluate, 15 a, 13 what, 13 vs, 13 how
  • 12 trec, 12 is, 12 high, 12 for, 10 relevance
  • 10 queries, 10 on, 9 information, 8 x, 8 why
  • 8 as, 8 answer, 7 search, 7 maron, 7 document
  • 7 blair, 6 top, 6 results, 6 measure
  • 6 length, 6 in, 6 evaluation, 6 curves
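A count like the one above takes only a few lines of code. The following is a minimal sketch, assuming a plain-text input and a simple alphanumeric tokenizer; the function name and sample sentence are made up for illustration.

```python
import re
from collections import Counter

def word_counts(text):
    """Count every lowercase alphanumeric token in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)

# Made-up sample sentence; function words dominate the raw counts.
sample = "the recall of the system and the precision of the system"
for word, count in word_counts(sample).most_common(3):
    print(count, word)
```

Even in this tiny sample, the top of the list is taken by function words such as "the" and "of", just as in the lecture counts.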

30
Why Text is Easy
  • Same text, removing most frequent words in
    language and most frequent in this text
  • 30 recall, 28 relevant, 27 precision, 25 retrieved, 25 documents
  • 18 evaluate, 13 vs, 12 trec, 12 high, 10 relevance
  • 10 queries, 9 information, 8 x, 8 answer, 7 search
  • 7 maron, 7 document, 7 blair, 6 top, 6 results
  • 6 measure, 6 length, 6 evaluation, 6 curves
  • These words can act as a simple summary of the
    document
  • People are good at inferring (sometimes
    inventing) the commonalities
  • People are bad at realizing what they are not
    seeing
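The filtering step can be sketched as two stoplists applied before counting: one for words frequent in the language, one for words frequent in this particular text (here, slide navigation labels). Both lists below are hand-picked assumptions, loosely based on the words that drop out of the counts once filtering is applied; the talk does not say how its lists were built.

```python
import re
from collections import Counter

# Assumed stoplists, chosen by hand for illustration only.
LANGUAGE_STOPWORDS = {"of", "the", "and", "a", "to", "is", "for", "on",
                      "in", "as", "what", "how", "why"}
DOCUMENT_STOPWORDS = {"slide", "view", "version", "graphic", "first",
                      "back", "previous", "next"}

def summary_terms(text, top_n=5):
    """Count alphabetic tokens after dropping both kinds of stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    skip = LANGUAGE_STOPWORDS | DOCUMENT_STOPWORDS
    return Counter(t for t in tokens if t not in skip).most_common(top_n)

# Made-up sample imitating slide text with navigation labels mixed in.
sample = (
    "slide 1 next slide the recall of the system "
    "slide 2 previous slide recall and precision of relevant documents"
)
print(summary_terms(sample, top_n=3))
```

What survives reads like a summary, but the choice of stoplists silently decides what the reader never sees.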

31
Simple Text Analysis can Mislead
  • Most frequent words
  • Biases towards concepts with unique identifiers.

From Spink, Wolfram, Jansen, Saracevic, JASIS 01
32
Major Trends vs. Minor Discoveries
  • With text, it's easy to extract and show the
    largest, main trends
  • But often we want the rare but unexpected and
    important event
  • Russian oil company example
  • Schwarzenegger and Enron
  • Cigarettes and kids
  • Person on the periphery who is working stealthily
    to influence things
  • This is really difficult to solve!

33
Design Principles for Analysts
  • Experienced analysts notice what is missing or
    unexpected.
  • Analysts must guard against confirmation bias.
  • Need to be able to easily arrange and re-arrange,
  • View information flexibly from many angles,
  • While at the same time retaining consistency and
    reproducibility.
  • Interfaces should reflect the domain and data.
  • How to achieve this with text collections?
  • Must transform text in understandable ways
  • Must provide multiple, consistent views that
    nevertheless allow for new discovery and insight

34
Why Emphasize Flexibility?
  • Can't view representations of all the text
    content at once.
  • Instead, need ways to flexibly navigate, group,
    organize, explore
  • See important pieces over time.

35
The Importance of Flexibility
  • Russell, Slaney, Qu, Houston 05
  • The ease of viewing and manipulation in the
    system strongly influenced the kind of analysis
    operations done.

36
Examples of Flexibility on Text Data
  • PaperLens (Conference proceedings)
  • TAKMI (Customer service requests)
  • Faceted Browsing (e-commerce)
  • Flamenco
  • eBay Express
  • FaThumb
  • TRIST and Sandbox (Analysts)

37
Flexible views
  • InfoVis 2004 contest
  • Visualize 8 years of conference proceedings
  • Tasks
  • Static Overview of 10 years of InfoVis
  • Characterize the research areas and their
    evolution
  • The people in InfoVis
  • Which papers/authors are most often referenced?
  • How many papers conducted a user study?
  • PaperLens: integrated solution by Lee, Czerwinski,
    Robertson, Bederson
  • Uses graphical elements and brushing and linking
    to flexibly elucidate a collection's contents.
  • http://www.cs.umd.edu/hcil/InfovisRepository/contest-2004/index.shtml

40
Flexibility in Foraging and Analysis
  • TAKMI, by Nasukawa and Nagano, 01
  • The system integrates
  • Analysis tasks (customer service help)
  • Content analysis
  • Information Visualization

41
Flexibility in Analysis: TAKMI, by Nasukawa and
Nagano, 2001
  • Documents containing "Windows 98"

42
Flexibility in Analysis: TAKMI, by Nasukawa and
Nagano, 2001
  • Patent documents containing "inkjet", organized
    by entity and year

43
Flexibility in Category Navigation
  • Browsing Information Collections using
    (Hierarchical) Faceted Metadata

44
What are facets?
  • Sets of categories, each of which describes a
    different aspect of the objects in the
    collection.
  • Each of these can be hierarchical.
  • (Not necessarily mutually exclusive nor
    exhaustive, but often that is a goal.)
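The definition above can be made concrete with a small data structure. This is a minimal sketch, assuming hypothetical recipe records with three flat facets (course, cuisine, ingredient); hierarchy within a facet is left out for brevity, and none of the names or data come from the systems discussed in the talk.

```python
from collections import Counter

# Hypothetical recipe records; each facet holds one or more category values,
# so categories within a facet need not be mutually exclusive.
RECIPES = [
    {"name": "Lemon tart", "course": {"dessert"}, "cuisine": {"french"},
     "ingredient": {"lemon", "butter"}},
    {"name": "Ratatouille", "course": {"main"}, "cuisine": {"french"},
     "ingredient": {"eggplant", "tomato"}},
    {"name": "Tomato bruschetta", "course": {"appetizer"}, "cuisine": {"italian"},
     "ingredient": {"tomato", "bread"}},
]

def filter_items(items, selections):
    """Keep the items that match every selected (facet, value) pair."""
    return [it for it in items
            if all(value in it[facet] for facet, value in selections.items())]

def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(value for it in items for value in it[facet])

french = filter_items(RECIPES, {"cuisine": "french"})
print([r["name"] for r in french])
print(facet_counts(french, "course"))
```

Showing the counts for the remaining facets after each selection is what lets a user narrow along any facet in any order.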

45
Facet example: Recipes
46
Nobel Prize Winners Collection
54
New Site: eBay Express
62
Is This Visualization?
  • Prior experience and other people's attempts seem
    to suggest that fewer graphics and more text is
    better.
  • Details of layout, font and color contrast, label
    selection, and interaction make all the
    difference.

63
Earlier Variation on the Idea
  • Cat-a-Cone, 1997

64
Mobile Variation
  • FaThumb: Karlson, Robertson, Robbins, Czerwinski,
    Smith 06
  • Well-received, but visualization part not looked
    at.

65
Flexibility in Sensemaking
  • DLITE by Cousins et al. 97
  • Sandbox by Wright et al. 06

66
Flexibility in Sensemaking
TRIST (The Rapid Information Scanning Tool) is
the work space for Information Retrieval and
Information Triage.
TRIST, Jonkers et al 05
User Defined and Automatic Categorization
Comparative Analysis of Answers and Content
Rapid Scanning with Context
Launch Queries
Entities
Query History
Dimensions
Annotated Document Browser
Linked Multi-Dimensional Views Speed Scanning
67
Flexibility for Sensemaking Support
  • Sandbox, Wright et al 06

Quick Emphasis of Items of Importance.
Dynamic Analytical Models.
Direct interaction with Gestures (no dialog, no
controls).
Assertions with Proving/Disproving Gates.
68
Communication-Centric Text
  • Email, conversations, blogs
  • The first thought is usually nodes and links
  • Doesn't have the desired flexibility
  • Some alternatives
  • The Network
  • Multivariate Networks

69
Re-envisioning Networks
  • Viewing people's shared workplaces, hometowns,
    schools over time.
  • www.theyrule.net

70
Re-envisioning Networks
  • First cut
  • Hastings, Snow, and King 05

71
Re-envisioning Networks
  • Better version
  • Hastings, Snow, and King 05

72
Re-envisioning Networks
  • Wattenberg 06
  • OLAP on directed labeled graphs
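One way to read "OLAP on directed labeled graphs" is as a roll-up: nodes are grouped by a categorical attribute and the edges between groups are counted. The sketch below shows that idea only; the people, attributes, and edges are invented, and it is not a description of how Wattenberg's system is implemented.

```python
from collections import Counter

# Hypothetical people with two categorical attributes.
NODES = {
    "ann": {"gender": "F", "location": "A"},
    "bob": {"gender": "M", "location": "A"},
    "cat": {"gender": "F", "location": "B"},
    "dan": {"gender": "M", "location": "B"},
}
# Hypothetical directed edges, e.g. "sent a message to".
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
         ("cat", "ann"), ("dan", "bob")]

def roll_up(nodes, edges, attribute):
    """Group nodes by one attribute; return group sizes and edge counts between groups."""
    sizes = Counter(attrs[attribute] for attrs in nodes.values())
    links = Counter(
        (nodes[src][attribute], nodes[dst][attribute]) for src, dst in edges
    )
    return sizes, links

sizes, links = roll_up(NODES, EDGES, "gender")
print(sizes)
print(links)
```

Rolling the same graph up by "location" instead gives a different, equally small summary, which is the kind of flexible re-arrangement the talk argues for.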

73
Network Flexibility
74
Martin Wattenberg, Visual Exploration of
Multivariate Graphs
M
F
Location A
Location B
Location C
Location D
Location E
75
Re-envisioning Networks
  • Idea: vary these ideas to apply to email and
    other communication text.

76
Summary: Text Viz Design Guidelines
  • An emphasis on flexible views on text data
  • Emphasize brushing and linking using appropriate
    visual cues.
  • Interaction flow should guide the user but also
    be flexible.
  • Information structure should be consistent and
    reproducible.
  • Other guidelines
  • Make text visible.
  • Visual components should reflect the data and
    tasks.
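Brushing and linking can be sketched as a shared selection that every view listens to: selecting items in one view highlights the same items in all the others. The class names and the two toy views below are assumptions made for this example, not part of any system mentioned in the talk.

```python
class SharedSelection:
    """Holds the brushed item ids and notifies every linked view on change."""

    def __init__(self):
        self._listeners = []
        self.selected = frozenset()

    def link(self, listener):
        """Register a callback that receives the current selection."""
        self._listeners.append(listener)

    def brush(self, item_ids):
        """Replace the selection and push it to all linked views."""
        self.selected = frozenset(item_ids)
        for listener in self._listeners:
            listener(self.selected)


class LinkedView:
    """A toy view that shows one label per item and highlights brushed items."""

    def __init__(self, name, labels, selection):
        self.name = name
        self.labels = labels          # item id -> label shown in this view
        self.highlighted = []
        selection.link(self.on_brush)

    def on_brush(self, selected_ids):
        self.highlighted = sorted(
            label for item_id, label in self.labels.items()
            if item_id in selected_ids
        )


selection = SharedSelection()
titles = LinkedView("titles", {1: "Cat-a-Cone", 2: "FaThumb"}, selection)
years = LinkedView("years", {1: "1997", 2: "2006"}, selection)
selection.brush({1})
print(titles.highlighted, years.highlighted)
```

Keeping the selection in one shared object, rather than in each view, is what keeps the views consistent with one another as the user explores.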

77
Thank you!
  • www.sims.berkeley.edu/hearst