Automatic Construction of Multifaceted Browsing Interfaces - PowerPoint PPT Presentation

1
Automatic Construction of Multifaceted Browsing
Interfaces
  • Wisam Dakka Columbia University
  • Panagiotis G. Ipeirotis NYU
  • Kenneth R. Wood MSR Cambridge

2
Why Guided Navigation?
  • Typical search for a product name
  • Multifaceted hierarchies are superior to
    single, monolithic hierarchies
  • Allow users to browse across multiple dimensions
  • Expose the contents of the underlying collection
    and can help users more quickly locate items of
    interest

3
Roadmap
  • Identify manually the dimensions/facets that can
    be used to browse a collection
  • Type, country, price, grape, diet
  • A technique for extracting facets
  • Create manually the hierarchies for each
    dimension
  • The countries hierarchy
  • An efficient construction algorithm
  • Ranking categories within hierarchies
  • How to show the best categories first?
  • Ranking schemes
  • Extensive experiments

4
Extracting Important Navigational Facets
Motivation
  • Many collections with metadata organized across
    different facets
  • Corbis royalty-free collection
  • A set of 36,820 annotated images
  • Each image has a title, a free-text description,
    and a set of associated keywords
  • Total of 65,521 keywords, mainly assigned to 14
    out of the 38 facets
  • And many others, like the wine collection we used
    earlier
  • The task: for a given image with its metadata,
    extract a set of proper facets or dimensions
  • Idea: classify keywords into the appropriate facets
  • Cat and dog under animal
  • Mountain and fields under topographic feature

How do we extract such facets?
5
Basic Idea for Extracting Important Navigational
Facets
  • Given a collection of objects and associated
    metadata, where each object is assigned to a list
    of facets (dimensions)
  • Train a classifier that, given an object and its
    metadata, predicts a list of facets
  • Run the classifier on a new set of objects with
    no assigned facets, to identify the frequently
    used facets
  • Use the discovered facets for the guided
    navigation

6
The Classifier: Straightforward Approach
  • Classifying keywords in appropriate facets
  • Cat or dog -> animal
  • Cannot generalize

7
The Classifier: Expansion Using WordNet
  • Capturing the meanings of other words using
    hypernyms
  • Cat: feline, carnivore, mammal, animal, living
    being, object, entity
  • Can generalize, but cannot disambiguate

[Figure: keywords mapped to facets through their WordNet hypernyms. Dog -> Animal; Mountain and Fields -> Topographic; for the chain "feline, carnivore, mammal, animal, living being, object, entity" the choice between Animal and Computer Device stays open.]
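
The hypernym expansion on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: `TOY_HYPERNYM` is a hand-written stand-in for WordNet (the "cat" chain follows the slide, the "dog" entries are made up for the example), and the function names are hypothetical, not taken from the talk.

```python
# Toy stand-in for WordNet: each term maps to its direct hypernym (parent).
# The "cat" chain mirrors the slide; the "dog" entries are illustrative assumptions.
TOY_HYPERNYM = {
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
    "animal": "living being",
    "living being": "object",
    "object": "entity",
    "dog": "canine",
    "canine": "carnivore",
}


def hypernym_chain(term, parent_of=TOY_HYPERNYM):
    """Return the hypernyms of `term`, from most specific to most general."""
    chain = []
    seen = {term}
    while term in parent_of:
        term = parent_of[term]
        if term in seen:  # guard against accidental cycles in the toy data
            break
        seen.add(term)
        chain.append(term)
    return chain


def shared_hypernyms(a, b, parent_of=TOY_HYPERNYM):
    """Hypernyms of `a` that are also hypernyms of `b`, most specific first."""
    in_b = set(hypernym_chain(b, parent_of))
    return [h for h in hypernym_chain(a, parent_of) if h in in_b]


print(hypernym_chain("cat"))
print(shared_hypernyms("cat", "dog"))
```

Two keywords that never co-occur as strings ("cat", "dog") end up sharing features such as "mammal" and "animal", which is what lets a classifier generalize to unseen keywords.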
8
The Classifier: Capturing the Context
  • Keywords associated with the same object give
    valuable clues
  • Can disambiguate

[Figure: candidate facets Animal and Computer Device for a keyword with hypernyms "feline, carnivore, mammal, animal, living being, object, entity"]
9
Building the Classifier: Text Classification
Problem
  • We can map our problem to a classical text
    classification problem
  • Each example
  • Represents a keyword in an object
  • Has a list of classes (facets) assigned to the
    keyword
  • Has three vector representations
  • The keyword itself
  • The expansion from WordNet using hypernyms
  • The context - other keywords assigned to same
    object and their hypernyms
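
A minimal sketch of how one training example with its three representations might be assembled. The function name `build_example`, the bag-of-words encoding, and the toy hypernym lists are assumptions made for illustration; the slides do not specify the exact feature encoding.

```python
from collections import Counter

# Hypothetical, hand-written hypernym lists standing in for WordNet lookups.
TOY_HYPERNYMS = {
    "cat": ["feline", "carnivore", "mammal", "animal"],
    "dog": ["canine", "carnivore", "mammal", "animal"],
    "mountain": ["natural elevation", "geological formation"],
}


def build_example(keyword, object_keywords, hypernyms_of):
    """Return three bag-of-words views for one keyword of one object."""
    context_terms = []
    for other in object_keywords:
        if other == keyword:
            continue  # the keyword itself is not part of its own context
        context_terms.append(other)
        context_terms.extend(hypernyms_of.get(other, []))
    return {
        "keyword": Counter([keyword]),
        "hypernyms": Counter(hypernyms_of.get(keyword, [])),
        "context": Counter(context_terms),
    }


example = build_example("cat", ["cat", "dog", "mountain"], TOY_HYPERNYMS)
for view, bag in example.items():
    print(view, dict(bag))
```

Each view could then be turned into a sparse vector and fed to any standard text classifier; how the three views are weighted or concatenated is left open here.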

10
Efficient Hierarchy Construction
  • Once we have identified the facets, we need to
    navigate within each facet
  • The subsumption algorithm (Sanderson and Croft,
    SIGIR 1999)
  • Improved version of the subsumption algorithm
  • For the best values of the different parameters, the
    algorithm runs 3 times faster than the original
    subsumption algorithm
  • Good integration with relational databases
  • Extensive set of experiments
  • Details in paper
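
For orientation, here is a naive sketch of the pairwise test usually attributed to the original subsumption algorithm: term x subsumes term y when P(x|y) is high and P(y|x) < 1. It is not the improved version mentioned on this slide, which is described only in the paper. The 0.8 threshold is the commonly cited value and should be treated as an assumption, as should the function names and the toy postings.

```python
def subsumes(docs_x, docs_y, threshold=0.8):
    """True if term x subsumes term y, given the sets of objects containing each."""
    if not docs_x or not docs_y:
        return False
    both = len(docs_x & docs_y)
    p_x_given_y = both / len(docs_y)  # how often x appears where y appears
    p_y_given_x = both / len(docs_x)  # how often y appears where x appears
    return p_x_given_y >= threshold and p_y_given_x < 1.0


def subsumption_pairs(postings, threshold=0.8):
    """All (parent, child) pairs found by the naive quadratic comparison."""
    return sorted(
        (x, y)
        for x in postings
        for y in postings
        if x != y and subsumes(postings[x], postings[y], threshold)
    )


# Toy postings: term -> ids of the objects annotated with that term.
TOY_POSTINGS = {
    "animal": {1, 2, 3, 4, 5},
    "cat": {1, 2},
    "dog": {3, 4, 5},
    "mountain": {6, 7},
}
print(subsumption_pairs(TOY_POSTINGS))
```

The quadratic loop over term pairs is the part an efficient construction algorithm would need to avoid.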

11
Ranking Methods
  • Ranking categories is important and difficult
  • Important: limited cognitive ability to
    understand presented info
  • Difficult: lack of explicit user goals while
    browsing
  • Maximize Coverage: maximizes the number of
    objects that are covered by the displayed, top-k
    categories
  • Frequency-based and set-cover schemes
  • Structure and user effort to find items
  • Structure: considers the structure of the
    underlying hierarchy and the respective effort
    that the user has to put in to locate items of
    interest
  • Merit-based

12
Ranking Methods: Maximize Coverage
  • Frequency-based Ranking (Baseline)
  • Users see first categories with the greatest
    wealth of information
  • Low ranked categories represent only a small
    fraction of the collection
  • An easy scheme to implement
  • Is it optimal?
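
A small sketch of the frequency-based baseline, with a toy collection chosen to show why it need not maximize coverage. The category names, object ids, and function names are hypothetical.

```python
def frequency_ranking(category_objects, k):
    """Top-k categories by number of objects; ties broken alphabetically."""
    by_size = sorted(
        category_objects,
        key=lambda name: (-len(category_objects[name]), name),
    )
    return by_size[:k]


def covered_objects(category_objects, chosen):
    """Distinct objects reachable through the chosen categories."""
    covered = set()
    for name in chosen:
        covered |= category_objects[name]
    return covered


# Toy data: "mammals" overlaps almost entirely with "animals".
TOY_CATEGORIES = {
    "animals": {1, 2, 3, 4},
    "mammals": {1, 2, 3},
    "mountains": {5, 6},
}
top2 = frequency_ranking(TOY_CATEGORIES, 2)
print(top2, len(covered_objects(TOY_CATEGORIES, top2)))
```

In this toy case the two largest categories overlap heavily, so the top-2 list reaches only 4 of the 6 objects.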

13
Ranking Methods: Maximize Coverage
  • Set-cover Ranking
  • Maximizing the cardinality of the top-k ranked
    categories
  • A well-known NP-complete problem
  • The optimal solution is unnecessarily expensive,
    and generates a non-monotonic ranking
  • A greedy algorithm for approximating the
    set-cover problem
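
The greedy approximation can be sketched as follows: repeatedly pick the category that adds the most not-yet-covered objects. This is the textbook greedy heuristic for maximum coverage, written here under assumed names and toy data; the exact variant used in the talk is not given on the slide.

```python
def greedy_cover_ranking(category_objects, k):
    """Rank up to k categories by marginal coverage gain (greedy heuristic)."""
    covered = set()
    ranking = []
    remaining = dict(category_objects)
    while remaining and len(ranking) < k:
        # Iterate over sorted names so that ties are broken alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining category adds a new object
        ranking.append(best)
        covered |= remaining.pop(best)
    return ranking, covered


TOY_CATEGORIES = {
    "animals": {1, 2, 3, 4},
    "mammals": {1, 2, 3},
    "mountains": {5, 6},
}
ranking, covered = greedy_cover_ranking(TOY_CATEGORIES, 2)
print(ranking, len(covered))
```

Because each step only appends to the list, the greedy top-k is always a prefix of the greedy top-(k+1), so the resulting ranking is monotonic by construction.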

14
Ranking Methods: Structure
  • Merit-based Ranking
  • Ranks higher categories that enable users to
    access their contents with the smallest cost, on
    average
  • We start by defining the cost function T(Ci): the
    time to reach an object starting from node Ci in
    the hierarchy
  • The time for reading the category headings
  • The time spent correcting mistakes
  • The time for browsing the correct sub-tree
  • Now let us define the Merit score based on T(Ci)

15
Merit-based Ranking
  • The metric is similar to the F1-measure
  • o(C): number of distinct objects classified
    under C
  • Can be computed very efficiently in a bottom-up
    fashion
  • Favors categories with low cost and large
    number of objects
  • Using the merit of each category, we can rank
    categories appropriately, putting first
    categories that have good hierarchy structures
    under them and provide access to a large number
    of objects

16
Evaluation Settings
  • Datasets
  • Corbis royalty-free collection
  • XMLTV: television programs broadcast over 261
    channels in NYC
  • DMOZ: real web pages from the Open Directory
  • Extracting important navigational facets
  • Facet classifier using SVM with linear kernels
    and Ripper
  • Efficient hierarchy construction
  • See paper
  • Ranking categories in a hierarchy
  • Frequency-based, set-cover, merit-based

17
Extracting Important Navigational Facets: Results
using SVM and Ripper
  • Baseline
  • 10% (F1), slightly above random classification
  • Adding hypernyms: 71% (F1)
  • Adding associated keywords
  • Ripper
  • Investigate whether rule-based assignments are
    sufficient
  • High-level WordNet hypernyms
  • 55% (F1), significantly worse than SVM
  • Some classes (facets) work well with simple,
    rule-based assignment of terms to facets
  • Generic Animals (93.3%)
  • Action Process Activity (35.9%)

[Figure: SVM with hypernyms and associated keywords]
F1: harmonic mean of precision and recall
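
For reference, the standard definition of the F1 measure reported above, with P for precision and R for recall (TP, FP, FN denote true positives, false positives, and false negatives):

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \, P \, R}{P + R}
```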
18
Ranking: Quality of the Generated Hierarchies
  • How do the structural properties affect the
    browsing experience?
  • Coverage: the fraction of reachable objects in
    the hierarchy
  • Other properties
  • Average path length: shorter paths are preferable
  • Average branching factor: users can decide faster
    which category is best with small branching
    factors
  • Can we combine these metrics meaningfully?
  • Cost: the time to reach an object in the
    hierarchy
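
The three structural properties can be computed with one traversal of a category tree. The sketch below makes several assumptions that the slide does not state: path length is measured as the depth of each leaf category below the root, branching factor is averaged over internal nodes only, and the hierarchy is a tree. All names and the toy data are hypothetical.

```python
def hierarchy_stats(children, objects_at, root, total_objects):
    """Coverage, average leaf depth, and average branching factor of a tree."""
    reachable = set()
    leaf_depths = []
    branching = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        reachable |= objects_at.get(node, set())
        kids = children.get(node, [])
        if kids:
            branching.append(len(kids))
            stack.extend((kid, depth + 1) for kid in kids)
        else:
            leaf_depths.append(depth)
    return {
        "coverage": len(reachable) / total_objects,
        "avg_path_length": sum(leaf_depths) / len(leaf_depths),
        "avg_branching": sum(branching) / len(branching) if branching else 0.0,
    }


# Toy hierarchy: object 5 is in the collection but under no category.
TOY_CHILDREN = {"root": ["animal", "topographic"], "animal": ["cat", "dog"]}
TOY_OBJECTS = {"cat": {1, 2}, "dog": {3}, "topographic": {4}}
stats = hierarchy_stats(TOY_CHILDREN, TOY_OBJECTS, "root", total_objects=5)
print(stats)
```

A single cost measure, as the last bullet suggests, would have to trade these quantities off against each other; this sketch only reports them separately.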

19
Coverage of Ranking Methods
  • Set-cover consistently covers a larger fraction of
    the collection
  • As expected, merit-based performs slightly worse
    than set-cover

20
Cost of Ranking Methods
  • Merit-based consistently performs better than the
    other approaches, decreasing by 10-50% the time
    needed to locate items of interest

21
Ranking: Conclusions
  • Merit-based performs very well and offers fast
    access to the contents of the collection.
  • Merit-based rankings are efficient to implement
    on top of relational database systems, while the
    set-cover rankings typically take longer to
    compute

22
Summary
  • Automatically constructing multifaceted
    hierarchies
  • A technique for extracting facets
  • An efficient construction algorithm
  • Ranking categories in hierarchy
  • Frequency-based, set-cover, merit-based schemes
  • Extensive experiments
  • Automatic construction of multifaceted interfaces
    is feasible, and generates high-quality
    hierarchies

23
Future Work
  • Exploring different ways of presenting the
    hierarchies to expose the contents of the
    collection in efficient ways
  • Integrating better browsing and searching in
    multifaceted databases
  • Indexing structures to support concurrent
    searching and browsing

24
Thank you for your time