Faceted Metadata in Search Interfaces

1
Faceted Metadata in Search Interfaces
Marti Hearst, UC Berkeley School of Information
This Research Supported by NSF IIS-9984741.
2
Focus: Search and Navigation of Large Collections
Shopping Sites
Digital Libraries
E-Government Sites
Image Collections
Example: the University of California Library
Catalog
3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
What do we want done differently?
  • Organization of results
  • Hints of where to go next
  • Flexible ways to move around
  • How to structure the information?

7
The Problem with Hierarchy
8
The Problem With Hierarchy
9
The Problem with Hierarchy
10
The Problem With Hierarchy
  • Where is Berkeley?
  • College and University > Colleges and
    Universities > United States > U > University of
    California > Campuses > Berkeley
  • U.S. States > California > Cities > Berkeley >
    Education > College and University > Public > UC
    Berkeley

11
Outline
  • Motivation: support for browsing big collections
  • Focus on usability for a wide range of lay users
  • Approach: flexible application of hierarchical
    faceted metadata
  • Advantages of the approach
  • Results of usability studies
  • Opportunities for AI
  • Creating faceted category hierarchies
  • Assigning items to categories
  • Combine categories to identify tasks
  • A way to focus for personalization research

12
Why Care? These folks do:
  • NYTimes archive
  • eBay
  • California Digital Library
  • US Census

13
How to Structure Information for Search and
Browsing?
  • Hierarchy is too rigid
  • KL-One is too complex
  • Hierarchical faceted metadata
  • A useful middle ground

14
What are facets?
  • Sets of categories, each of which describes a
    different aspect of the objects in the
    collection.
  • Each of these can be hierarchical.
  • (Not necessarily mutually exclusive nor
    exhaustive, but often that is a goal.)

15
Facet example: Recipes
16
Example of Faceted Metadata: Categories for
Biomedical Journal Articles
  • 1. Anatomy A
  • 2. Organisms B
  • 3. Diseases C
  • 4. Chemicals and Drugs D

1. Lung   2. Mouse   3. Cancer   4. Tamoxifen
17
Goal: assign labels from facets
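As a minimal sketch of what assigning labels from facets could look like, the snippet below tags one article with a label from each of the four facets on the previous slide. The `Item` class, its `labels` field, and the article title are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field

# Facet names taken from the slide; each facet is an independent set of categories.
FACETS = ("Anatomy", "Organisms", "Diseases", "Chemicals and Drugs")


@dataclass
class Item:
    """A collection item carrying zero or more labels per facet (hypothetical structure)."""
    title: str
    labels: dict = field(default_factory=dict)  # facet name -> set of labels

    def assign(self, facet: str, label: str) -> None:
        """Attach a label to the item under the given facet."""
        if facet not in FACETS:
            raise ValueError(f"unknown facet: {facet}")
        self.labels.setdefault(facet, set()).add(label)


# Hypothetical article labeled with the slide's four example terms.
article = Item("example article")
article.assign("Anatomy", "Lung")
article.assign("Organisms", "Mouse")
article.assign("Diseases", "Cancer")
article.assign("Chemicals and Drugs", "Tamoxifen")

for facet in FACETS:
    print(facet, "->", sorted(article.labels.get(facet, ())))
```

Storing a set per facet leaves room for an item to carry several labels within one facet, which matches the note that facet categories are not necessarily mutually exclusive.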
18
Motivation
  • Description: 19th c. paint horse; saddle and
    hackamore; spurs; bandana on rider; old time
    cowboy hat; underchin thong; flying off.

19
Motivation
  • Description: 19th c. paint horse; saddle and
    hackamore; spurs; bandana on rider; old time
    cowboy hat; underchin thong; flying off.

By using facets, what are we not capturing? The
hat flew off. The bandana stayed on. The thong
is part of the hat. The bandana is on the
cowboy (not the horse). The saddle is on the
horse (not the cowboy).
20
Hierarchical Faceted Metadata
  • A simplification of knowledge representation
  • Does not represent relationships directly
  • BUT can be understood well by many people when
    browsing rich collections of information.

21
How to Put In an Interface? Some Challenges
  • Users don't like new search interfaces.
  • How to show lots of information without
    overwhelming or confusing?

22
A Solution (The Flamenco Project)
  • Use proper HCI methods.
  • Organize search results according to the faceted
    metadata so navigation looks similar throughout
  • Easy to see where to go next, where you've been
  • Avoids empty result sets
  • Integrates seamlessly with keyword search
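One way to picture how facet-organized results can avoid empty result sets and combine with keyword search is sketched below. It is an illustration under stated assumptions, not the Flamenco implementation: the toy `ITEMS` collection and the `search` and `next_steps` functions are hypothetical.

```python
from collections import Counter

# Hypothetical toy collection: each item has a title and one value per facet.
ITEMS = [
    {"title": "guacamole", "Ingredient": "avocado", "Dish": "dip"},
    {"title": "avocado salad", "Ingredient": "avocado", "Dish": "salad"},
    {"title": "potato salad", "Ingredient": "potato", "Dish": "salad"},
    {"title": "potato soup", "Ingredient": "potato", "Dish": "soup"},
]


def search(items, selections, keyword=""):
    """Keep items matching every selected facet value and the optional keyword."""
    return [
        it for it in items
        if all(it.get(f) == v for f, v in selections.items())
        and keyword.lower() in it["title"].lower()
    ]


def next_steps(results, selections, facets=("Ingredient", "Dish")):
    """Count facet values within the current results only.

    Every value offered occurs in at least one result, so following any
    offered link cannot lead to an empty result set.
    """
    return {
        f: Counter(it[f] for it in results)
        for f in facets if f not in selections
    }


results = search(ITEMS, {"Ingredient": "avocado"})
previews = next_steps(results, {"Ingredient": "avocado"})
print([it["title"] for it in results])
print(previews)
```

Because the counts are computed from the current results rather than from the whole collection, the same function serves after a facet click and after a keyword query.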

23
The Flamenco Project
  • Incorporating Faceted Hierarchical Metadata into
    Interfaces for Large Collections
  • Key Goals
  • Support integrated browsing and keyword search
  • Provide an experience of browsing the shelves
  • Add power and flexibility without introducing
    confusion or a feeling of clutter
  • Allow users to take the path most natural to them
  • Method
  • User-centered design, including needs assessment
    and many iterations of design and testing

24
Art History Images Collection
25
Questions we are trying to answer
  • How many facets are allowable?
  • Should facets be mixed and matched?
  • How much is too much?
  • Should hierarchies be progressively revealed,
    tabbed, some combination?
  • How should free-text search be integrated?

26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
Information previews
  • Use the metadata to show where to go next
  • More flexible than canned hyperlinks
  • Less complex than full search
  • Help users see and return to previous steps
  • Reduces mental work
  • Recognition over recall
  • Suggests alternatives
  • More clicks are OK iff (J. Spool):
  • The scent of the target does not weaken
  • Users feel they are going towards, rather than
    away from, their target.

42
What is Tricky About This?
  • It is easy to do it poorly
  • It is hard to be not overwhelming
  • Most users prefer simplicity unless complexity
    really makes a difference
  • Small details matter
  • It is hard to make it flow

43
eBay Products
44
(No Transcript)
45
(No Transcript)
46
Search Usability Design Goals
  1. Strive for Consistency
  2. Provide Shortcuts
  3. Offer Informative Feedback
  4. Design for Closure
  5. Provide Simple Error Handling
  6. Permit Easy Reversal of Actions
  7. Support User Control
  8. Reduce Short-term Memory Load

From Shneiderman, Byrd, & Croft, Clarifying
Search, D-Lib Magazine, Jan 1997. www.dlib.org
47
Usability Studies
  • Usability studies done on 3 collections
  • Recipes: 13,000 items
  • Architecture Images: 40,000 items
  • Fine Arts Images: 35,000 items
  • Conclusions
  • Users like and are successful with the dynamic
    faceted hierarchical metadata, especially for
    browsing tasks
  • Very positive results, in contrast with studies
    on earlier iterations.

48
Post-Test Comparison
Columns compared: Faceted, Baseline
Which Interface Preferable For:
  • Find images of roses
  • Find all works from a given period
  • Find pictures by 2 artists in same media
Overall Assessment:
  • More useful for your tasks
  • Easiest to use
  • Most flexible
  • More likely to result in dead ends
  • Helped you learn more
  • Overall preference
49
Advantages of the Approach
  • Honors many of the most important usability
    design goals
  • User control
  • Provides context for results
  • Reduces short term memory load
  • Allows easy reversal of actions
  • Provides consistent view
  • Allows different people to add content without
    breaking things
  • Can make use of standard technology

50
Advantages of the Approach
  • Systematically integrates search results
  • reflect the structure of the info architecture
  • retain the context of previous interactions
  • Gives users control and flexibility
  • Over order of metadata use
  • Over when to navigate vs. when to search
  • Allows integration with advanced methods
  • Collaborative filtering, predicting users'
    preferences

51
Disadvantages
  • Does not model relations explicitly
  • Does it scale to millions of items?
  • Adaptively determine which facets to show for
    different combinations of items
  • Requires faceted metadata!

52
Opportunities for AI
  • Creating hierarchical faceted categories
  • Assigning items to those categories
  • Adaptively adding new facets as data changes
  • A new approach to personalization
  • User-tailored facet combinations
  • Create task-based search interfaces
  • Equate a task with a sequence of facet types

53
Creating Classifications from Data
  • Most approaches are associational
  • AKA clustering, LSA, LDA, etc.
  • This leads to poor results when applied to text
  • To derive facets, need a different angle
  • We have a simple approach based on WordNet

54
Clustering (The Hope)
55
Clustering (The Hope)
56
Clustering (The Reality)
57
Clustering (The Reality)
58
Example: Recipes (3500 docs)
59
Blei, Ng, Jordan 03 (Latent Dirichlet
Allocation)
60
Blei, Ng, Jordan 03 (Latent Dirichlet
Allocation)
61
Sanderson & Croft 99: Term Subsumption
62
Sanderson & Croft 99: Term Subsumption
63
Stoica & Hearst 04: WordNet-based
64
Stoica & Hearst 04: WordNet-based
65
Stoica & Hearst 04: WordNet-based
66
Stoica & Hearst 04: WordNet-based
67
Example: AP Newswire
P-2 ABSTRACT The Bechtel Group Inc.
offered in 1985 to sell oil to Israel at a
discount of at least $650 million for 10 years if
it promised not to bomb a proposed Iraqi
pipeline, a Foreign Ministry official said
Wednesday. But then-Prime Minister Shimon Peres
said the offer from Bruce Rappaport, a partner in
the San Francisco-based construction and
engineering company, was "unimportant," the
senior official told The Associated Press. Peres,
now foreign minister, never discussed the offer
with other government ministers, said the
official, who spoke on condition of anonymity.
The comments marked the first time Israel has
acknowledged any offer was made for assurances
not to bomb the planned $1 billion pipeline,
which was to have run near Israel's border
68
Blei, Ng, Jordan 03 (Latent Dirichlet
Allocation)
69
Stoica & Hearst 04: WordNet-based
70
Stoica & Hearst 04: WordNet-based
71
Stoica & Hearst 04: WordNet-based
72
(No Transcript)
73
Stoica & Hearst 04: WordNet-based
74
Stoica & Hearst 04: WordNet-based
75
Associational techniques
  • Pros
  • Sometimes terms grouped to get a general concept
  • Airline, airplane, pilots, flight
  • Cons
  • Highly unpredictable
  • Not comprehensive
  • Dollar and yen but no deutschmarks
  • Eastern but no other directions
  • Not uniform in subject matter
  • Mixing currencies with countries with timing
  • Mixing compass directions with airlines

76
Lexical Hierarchy-based
  • Pros
  • Faceted and hierarchical
  • Consistent is-a hierarchies
  • Comprehensiveness more likely
  • Cons
  • Doesn't provide overall themes
  • Airlines, pilots, airplanes
  • Sometimes uses wrong word sense
  • Sometimes the right term/hierarchy is not present
  • Doesn't have dish type or cuisine for
    recipes
  • Specialized domains won't work

77
Our Approach
  • Leverage the structure of WordNet

Documents
78
Our Approach
  • Leverage the structure of WordNet

79
1. Select Terms
  • Select well distributed terms from collection

[Pipeline diagram: Documents -> Select terms -> Get hypernym paths (WordNet) -> Build tree -> Comp. tree]
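The slide does not say how "well distributed" is measured. As one possible reading, the sketch below keeps terms whose document frequency falls between two bounds; the `select_terms` function and its `min_df` and `max_df_ratio` parameters are assumptions made for illustration, not the authors' stated criterion.

```python
from collections import Counter


def select_terms(docs, min_df=2, max_df_ratio=0.8):
    """Return terms whose document frequency lies within assumed bounds."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # count each term once per document
    limit = max_df_ratio * len(docs)
    return sorted(t for t, n in df.items() if min_df <= n <= limit)


# Hypothetical mini-collection of recipe titles.
docs = [
    "red apple pie",
    "blue cheese dip",
    "red pepper dip",
    "green apple salad",
    "the red salad",
]
print(select_terms(docs))
```

The lower bound drops terms that occur in too few documents to be useful categories, and the upper bound drops terms so common that they would not divide the collection.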
80
2. Get Hypernym Path
red
blue
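A minimal sketch of this step, assuming a tiny hand-written hypernym table in place of WordNet itself. The `HYPERNYM` dictionary and the `hypernym_path` function are hypothetical stand-ins; the node names follow the color example shown on the later slides.

```python
# Hand-written stand-in for WordNet's hypernym (is-a) links; a real system
# would look these up in WordNet itself.
HYPERNYM = {
    "red": "red, redness",
    "red, redness": "chromatic color",
    "blue": "blue, blueness",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
}


def hypernym_path(term):
    """Follow is-a links upward and return the path from root to term."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return list(reversed(path))


for term in ("red", "blue"):
    print(" > ".join(hypernym_path(term)))
```

In this toy table every term has exactly one parent, so there is a single path per term; real WordNet entries can have several senses and several paths, which is the ambiguity the Disambiguation slides address.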
81
3. Build Tree
[Pipeline diagram, Build tree step; example terms: red, blue]
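Building the tree can be sketched as merging root-to-term paths so that shared prefixes share nodes. The nested-dict representation and the `build_tree` and `show` helpers below are assumptions chosen for brevity, not the authors' data structure.

```python
def build_tree(paths):
    """Merge root-to-term paths into one nested dict; shared prefixes share nodes."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            node = node.setdefault(name, {})
    return tree


def show(tree, depth=0):
    """Print the tree with two spaces of indentation per level."""
    for name, children in tree.items():
        print("  " * depth + name)
        show(children, depth + 1)


# Hypothetical hypernym paths for the two example terms.
paths = [
    ["color", "chromatic color", "red, redness", "red"],
    ["color", "chromatic color", "blue, blueness", "blue"],
]
tree = build_tree(paths)
show(tree)
```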
82
4. Compress Tree
[Pipeline diagram, Comp. tree step]
color
  chromatic color
    red, redness
      red
    blue, blueness
      blue
    green, greenness
      green
83
4. Compress Tree (cont.)
[Pipeline diagram, Comp. tree step]
Before:
color
  chromatic color
    red
    blue
    green
After:
color
  red
  blue
  green
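The slides show the effect of compression but not its rules. The sketch below uses two illustrative rules picked only because they reproduce the color example: splice out a non-root node that has a single child, and splice out a non-root node whose name repeats its parent's name as a word. Both rules and the `compress` function are assumptions, not necessarily the authors' algorithm.

```python
def compress(tree, parent=None):
    """Return a compressed copy of a nested-dict tree (illustrative rules only)."""
    out = {}
    for name, children in tree.items():
        kids = compress(children, parent=name)
        # Assumed rule 1: a non-root node with exactly one child adds no choice.
        single_chain = parent is not None and len(kids) == 1
        # Assumed rule 2: a non-root interior node whose name repeats the parent's.
        repeats_parent = parent is not None and bool(kids) and parent in name.split()
        if single_chain or repeats_parent:
            out.update(kids)  # splice the node out and promote its children
        else:
            out[name] = kids
    return out


full_tree = {
    "color": {
        "chromatic color": {
            "red, redness": {"red": {}},
            "blue, blueness": {"blue": {}},
            "green, greenness": {"green": {}},
        }
    }
}
compressed = compress(full_tree)
print(compressed)
```

With these rules the synset nodes ("red, redness" and so on) collapse into their single term, and "chromatic color" is removed because it repeats "color", leaving the three colors directly under the root.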
84
Disambiguation
  • Ambiguity in
  • Word senses
  • Paths up the hypernym tree

85
How to Select the Right Senses and Paths?
  • First build core tree
  • (1) Create paths for words with only one sense
  • (2) Use Domains
  • WordNet has 212 Domains
  • medicine, mathematics, biology, chemistry,
    linguistics, soccer, etc.
  • Automatically scan the collection to see which
    domains apply
  • The user selects which of the suggested domains
    to use or may add own
  • Paths for terms that match the selected domains
    are added to the core tree
  • Then add remaining terms to the core tree.

86
Using Domains
dip glosses:
  Sense 1: A depression in an otherwise level surface
  Sense 2: The angle that a magnet needle makes with horizon
  Sense 3: Tasty mixture into which bite-size foods are dipped
dip hypernyms:
  Sense 1: solid > concave shape > depression
  Sense 2: shape, form > space > angle
  Sense 3: food > ingredient, fixings > flavorer
Given domain food, choose sense 3
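The domain-based choice can be sketched as follows, using only the "dip" example. The `SENSES` table, its `domain` field, and the `choose_sense` function are hypothetical; the `domain` value in particular is an assumed annotation standing in for WordNet's domain labels.

```python
# Toy sense inventory for "dip", with paths copied from the slide. The "domain"
# field is an assumed annotation, not data taken from WordNet.
SENSES = {
    "dip": [
        {"sense": 1, "path": ["solid", "concave shape", "depression"], "domain": None},
        {"sense": 2, "path": ["shape, form", "space", "angle"], "domain": None},
        {"sense": 3, "path": ["food", "ingredient, fixings", "flavorer"], "domain": "food"},
    ],
}


def choose_sense(term, selected_domains):
    """Pick a sense: the only one if unambiguous, else one matching a selected domain."""
    senses = SENSES.get(term, [])
    if len(senses) == 1:
        return senses[0]
    for s in senses:
        if s["domain"] in selected_domains:
            return s
    return None  # still ambiguous; left for a later step


chosen = choose_sense("dip", {"food"})
print(chosen["sense"], " > ".join(chosen["path"]))
```

Returning `None` when no selected domain matches mirrors the slide's ordering: terms resolved by a single sense or a domain go into the core tree first, and the remaining terms are added afterwards.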
87
Opportunities for AI
  • New opportunity: Tagging, folksonomies
  • (flickr, del.icio.us)
  • People are creating facets in a decentralized
    manner
  • They are assigning multiple facets to items
  • This is done on a massive scale
  • This leads naturally to meaningful associations

88
(No Transcript)
89
http://www.airtightinteractive.com/projects/related_tag_browser/app/
90
(No Transcript)
91
(No Transcript)
92
(No Transcript)
93
(No Transcript)
94
This Doesn't Solve Everything
  • Harder to determine what's related to more
    complex terms
  • Still not good for finding a recipe using potatoes

95
(No Transcript)
96
(No Transcript)
97
(No Transcript)
98
Linking Metadata Into Tasks
  • Old Yahoo restaurant guide combined:
  • Region
  • Topic (restaurants)
  • Related Information
  • Other attributes (cuisines)
  • Other topics related in place and time (movies)

99
Yellow: geographic region
Green: restaurants and attributes
Red: related in place and time
100
Other Possible Combinations
  • Region + A&E
  • City + Restaurant + Movies
  • City + Weather
  • City + Education + Schools
  • Restaurants + Schools

101
Creating Tasks from HFM
  • Recipes Example:
  • Click Ingredient > Avocado
  • Click Dish > Salad
  • Implies task of "I want to make a Dish type d
    with an Ingredient i that I have lying around"
  • Maybe users will prefer to select tasks like
    these over navigating through the metadata.
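Equating a task with a sequence of facet types could be sketched as a lookup from the ordered facets a user clicked to a task template. The `TASKS` table and the `infer_task` function are hypothetical; the single template paraphrases the recipe example on this slide.

```python
# Hypothetical task templates keyed by the ordered facet types a user clicked.
TASKS = {
    ("Ingredient", "Dish"): "make a {Dish} with {Ingredient} that I have lying around",
}


def infer_task(clicks):
    """Map a click sequence of (facet, value) pairs to a task description, if any."""
    facet_types = tuple(f for f, _ in clicks)
    template = TASKS.get(facet_types)
    return template.format(**dict(clicks)) if template else None


print(infer_task([("Ingredient", "Avocado"), ("Dish", "Salad")]))
```

Keying on the facet types rather than the specific values means one template covers every ingredient and dish combination, which is what would let a task be offered as a shortcut in place of step-by-step navigation.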

102
Summary
  • Flexible application of hierarchical faceted
    metadata is a proven approach for navigating
    large information collections.
  • Midway in complexity between simple hierarchies
    and deep knowledge representation.
  • Perhaps HFM is a good stepping stone to deeper
    semantic relations
  • Currently in use on e-commerce sites; spreading
    to other domains

103
AI Opportunities
  • Creating hierarchical faceted categories
  • Assigning items to those categories
  • Adaptively adding new facets as data changes
  • A new approach to personalization
  • User-tailored facet combinations
  • Create task-based search interfaces
  • Equate a task with a sequence of facet types
  • Make use of folksonomy data!

104
Acknowledgements
  • Flamenco team
  • Brycen Chun
  • Ame Elliott
  • Jennifer English
  • Kevin Li
  • Rashmi Sinha
  • Emilia Stoica
  • Kirsten Swearingen
  • Ping Yee
  • Thanks also to NSF (IIS-9984741)

105
Thank you!
Marti Hearst, UC Berkeley School of Information
This Research Supported by NSF IIS-9984741.