Query Result Clustering for Object-level Search

1
Query Result Clustering for Object-level Search

Seung-won Hwang (POSTECH)
Joint work with Jongwuk Lee (POSTECH), Zaiqing Nie, and Ji-rong Wen (MSRA)
2
Outline
  • Motivation
  • Observation
  • Preliminaries
  • Algorithm
  • Experiments

3
Motivation (1 / 4)
  • Given a query, search engines retrieve relevant
    results.
  • Personalization: maximize the satisfaction of a
    particular user.
  • Diversification: minimize the dissatisfaction
    across varying user intents.

[Figure: two results for the query "Canon 5D", one marked "Good!!" and one marked "Good??"]
4
Motivation (2 / 4)
  • Query result organization
  • Provide end-users with a succinct overview of
    relevant results.
  • e.g., a topic hierarchy or topic terms on a map

[Figure: two results for the query "Canon 5D", both marked "Good!!"]
5
Motivation (3 / 4)
  • Document-level search
  • Documents as an information unit
  • e.g., Microsoft Live Search
  • Object-level search
  • Web objects as an information unit
  • e.g., Microsoft Libra, Product Search
  • More concise results for object queries

6
Motivation (4 / 4)
  • Our goal

Query result clustering for object-level search
Visualize a graph for object-level summarization.
Center: the query object
Nodes: relevant objects
Edges: relationships between objects
Users can easily recognize relevant objects, then
drill down into their interests.
7
Motivation (4 / 4) - Demo
8
Why Is Object-level Search Challenging?
  • Documents
  • A vector of frequencies (homogeneous)
  • Well-agreed similarity
  • Objects
  • A vector of values (heterogeneous)
  • Different similarity

Documents: compare TF-IDF vectors between documents.
Objects: compare objects whose features differ in importance,
e.g., sensor size, optical zoom, resolution, weight.
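
The contrast on this slide can be sketched in Python. This is only an illustration, not the authors' method: cosine similarity stands in for the well-agreed document measure, and a weighted L1 distance stands in for object comparison. The function names, the feature values, and the per-intent weights are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two term-weight vectors (e.g., TF-IDF)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def weighted_distance(a, b, weights):
    """Weighted L1 distance over features assumed normalized to [0, 1]."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


# Hypothetical objects described by (sensor size, weight).
camera_a = (0.9, 0.2)
camera_b = (0.1, 0.2)

# The same pair looks far apart or identical depending on which
# feature the (hypothetical) user intent cares about.
by_sensor = weighted_distance(camera_a, camera_b, weights=(1.0, 0.0))
by_weight = weighted_distance(camera_a, camera_b, weights=(0.0, 1.0))
```

With documents, one formula applies to every pair. With objects, the weights themselves have to be chosen, which is the difficulty the slide points at.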
9
Observation (1 / 2)
  • Feature-based similarity
  • Depends on data-specific and intent-specific
    characteristics.
  • Need to use a measure to identify both a relevant
    feature set and the corresponding distance.

DSLR cameras: sensor size, optical zoom
Compact cameras: resolution, weight
10
Observation (2 / 2)
  • Exploit the intuition of subspace clustering to
    identify relevant objects on different subspaces.

AB subspace
BC subspace
11
Preliminaries (1 / 4)
  • Challenging issue on subspace clustering
  • Expensive to enumerate all possible 2^d - 1
    subspaces
  • Hard to select a desirable subspace and distance
    among all subspaces

[Figure: candidate subspaces over F1: sensor size, F2: optical zoom, F3: weight]
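
To make the enumeration cost concrete, here is a small Python sketch that lists every non-empty subset of a feature set. The helper name `all_subspaces` is made up for illustration; the three features are the ones named on the slide.

```python
from itertools import combinations


def all_subspaces(features):
    """Yield every non-empty subset of the given features."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


features = ["sensor size", "optical zoom", "weight"]
subspaces = list(all_subspaces(features))

# The count doubles with each added feature: d features give 2**d - 1 subspaces.
count_for_three = len(subspaces)
count_for_ten = sum(1 for _ in all_subspaces(range(10)))
```

Three features are easy to enumerate, but the count grows exponentially, which is why exhaustive enumeration becomes impractical for objects with many attributes.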
12
Preliminaries (2 / 4)
  • Possible solution
  • Introduce parameters to reduce the cost of
    enumerating all subspaces.
  • Rmin: the minimum distance on a feature
  • dmin: the minimum number of subspaces

Rmin = 0.5, dmin = 2
Feature-based similarity matrix
(Darker cells indicate closer pairs.)
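
The slide does not spell out how the two thresholds are applied, so the following Python sketch is one possible reading rather than the authors' definition. It assumes every feature is normalized to [0, 1], treats 1 - |x - y| as a per-feature similarity, counts a feature as relevant for a pair when that similarity is at least Rmin, and allows grouping when at least dmin features are relevant. The function names and sample values are hypothetical.

```python
def relevant_features(a, b, r_min):
    """Indices of features on which a and b are similar enough.

    Assumes each feature is normalized to [0, 1], so 1 - |x - y|
    is a per-feature similarity in [0, 1].
    """
    return [i for i, (x, y) in enumerate(zip(a, b))
            if 1.0 - abs(x - y) >= r_min]


def can_group(a, b, r_min=0.5, d_min=2):
    """True when at least d_min features pass the r_min threshold."""
    return len(relevant_features(a, b, r_min)) >= d_min


# Hypothetical normalized features: (sensor size, optical zoom, weight).
dslr_1 = (0.90, 0.20, 0.80)
dslr_2 = (0.85, 0.25, 0.20)
compact = (0.10, 0.90, 0.10)

dslr_pair_ok = can_group(dslr_1, dslr_2)
mixed_pair_ok = can_group(dslr_1, compact)
```

Under this reading the thresholds prune the search: only features that pass Rmin are considered, and pairs with fewer than dmin such features are never grouped.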
13
Preliminaries (2 / 4)
  • Possible solution
  • Introduce parameters to reduce the cost of
    enumerating all subspaces.
  • Rmin: the minimum distance on a feature
  • dmin: the minimum number of subspaces

Rmin = 0.5, dmin = 2
Feature-based similarity matrix
(Darker cells indicate closer pairs.)
14
Preliminaries (3 / 4)
  • Problem
  • Clustering results depend heavily on the
    parameters Rmin and dmin.
  • It is hard to find desirable parameter settings.
  • Our solution
  • Exploit co-occurrence as votes reflecting the
    wisdom of crowds.
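
As a rough illustration of co-occurrence as votes, the Python sketch below counts how often two objects appear in the same group. The idea that a group is a web page mentioning several products is an assumption, and the sample pages are invented; only the product names come from this presentation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(groups):
    """Count, for each unordered pair, how many groups contain both objects."""
    counts = Counter()
    for group in groups:
        # sorted(set(...)) gives each pair one canonical key per group.
        for a, b in combinations(sorted(set(group)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical pages, each listing the products it mentions together.
pages = [
    ["Nikon D80", "Nikon D2Xs"],
    ["Nikon D80", "Nikon D2Xs", "Canon Powershot SD850"],
    ["Canon Powershot SD850", "Nikon D80"],
]
votes = cooccurrence_counts(pages)
```

Each page acts as one vote that its objects are related. The counts say nothing about why they are related, which is the limitation the following slides address.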

15
Preliminaries (4 / 4)
  • Co-occurrence similarity
  • Pros: serves as ground truth reflecting the
    creators' intuition.
  • Cons: carries inconsistent meanings for different
    characteristics.
  • Need to use a complementary measure to
    disambiguate different characteristics.

(1, 2): similar DSLRs
((1, 2), 6): DSLRs and high-end compact cameras
Co-occurrence matrix
16
Preliminaries (4 / 4)
  • Co-occurrence similarity
  • Pros: serves as ground truth reflecting the
    creators' intuition.
  • Cons: carries inconsistent meanings for different
    characteristics.
  • Need to use a complementary measure to
    disambiguate different characteristics.

Co-occurrence matrix
17
Algorithm (1 / 4)
  • Co-occurrence similarity
  • Provide ground-truth for the relationships
    between objects.
  • Do not distinguish different relationships.
  • Feature-based similarity
  • Disambiguate inconsistent information for
    different relationships.
  • Cluster quality heavily depends on parameters.

18
Algorithm (2 / 4)
  • Associate co-occurrence with feature-based
    similarity
  • Use co-occurrence similarity to determine the
    order of merging clusters.
  • Makes the result less sensitive to specific
    parameter settings.
  • Use feature-based similarity to disambiguate
    relationships with different characteristics.
  • Only merge objects with consistent relationships.
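
The combination described above can be sketched as a small agglomerative loop in Python. This is a simplified illustration under stated assumptions, not the authors' algorithm: cluster-level co-occurrence is taken as the average pairwise count, features are assumed normalized to [0, 1], and a merge counts as consistent when at least d_min features have similarity 1 - |x - y| of at least r_min for every pair in the merged cluster. All names and data are hypothetical.

```python
from itertools import combinations


def shared_features(cluster, points, r_min):
    """Features on which every pair in the cluster has similarity >= r_min."""
    dims = range(len(points[cluster[0]]))
    return [
        i for i in dims
        if all(1.0 - abs(points[a][i] - points[b][i]) >= r_min
               for a, b in combinations(cluster, 2))
    ]


def cluster_cooc(c1, c2, cooc):
    """Average co-occurrence count between members of two clusters."""
    total = sum(cooc.get(frozenset((a, b)), 0) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def hybrid_clustering(points, cooc, r_min=0.5, d_min=2):
    clusters = [[name] for name in points]
    while True:
        # Co-occurrence decides the merge order, strongest pair first.
        candidates = sorted(
            ((cluster_cooc(c1, c2, cooc), i, j)
             for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)),
            reverse=True,
        )
        for score, i, j in candidates:
            if score <= 0:
                return clusters  # no co-occurrence evidence left
            merged = clusters[i] + clusters[j]
            # Feature-based check: merge only consistent relationships.
            if len(shared_features(merged, points, r_min)) >= d_min:
                clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                clusters.append(merged)
                break
        else:
            return clusters  # no candidate passed the feature check


# Hypothetical normalized features: (sensor size, optical zoom, weight).
points = {
    "D80": (0.90, 0.20, 0.80),
    "D2Xs": (0.85, 0.25, 0.20),
    "SD850": (0.10, 0.90, 0.10),
}
cooc = {frozenset(("D80", "D2Xs")): 3, frozenset(("D2Xs", "SD850")): 1}
result = hybrid_clustering(points, cooc)
```

In this toy run the two DSLRs merge because they co-occur often and agree on two features. The compact camera co-occurs with one of them but shares too few features, so it stays separate.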

19
Algorithm (3 / 4)
Co-occurrence matrix
Rmin = 0.5, dmin = 2
Feature-based similarity matrix
20
Algorithm (3 / 4)
Co-occurrence matrix
Clustering results
Rmin = 0.5, dmin = 2
Feature-based similarity matrix
21
Algorithm (4 / 4)
  • Parameter setting
  • Abstracted as a multivariate interpolation
    problem of (Rmin, dmin).
  • Basically, use a linear loosening.
  • Unnecessary computation
  • Improved loosening tuning
  • Conservative loosening
  • Use single-linkage clustering.
  • Estimate dmin.
  • Aggressive loosening
  • Use the distribution of pairwise feature-based
    distances.
  • Estimate Rmin and dmin as median values.
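
The slide gives no formulas for these strategies, so the Python sketch below is a guess at their general shape. It assumes features normalized to [0, 1] and 1 - |x - y| as the per-feature similarity. `linear_schedule` illustrates relaxing a threshold in equal steps, and `median_estimates` illustrates taking medians over the pairwise distribution. Both helper names and the exact statistics are assumptions.

```python
from itertools import combinations
from statistics import median


def linear_schedule(start, end, steps):
    """Evenly spaced threshold values from start to end, inclusive."""
    if steps == 1:
        return [start]
    return [start + (end - start) * k / (steps - 1) for k in range(steps)]


def median_estimates(points):
    """Estimate (r_min, d_min) as medians over all object pairs.

    r_min: median of all per-feature similarities.
    d_min: median, over pairs, of how many features reach r_min.
    """
    sims = [
        [1.0 - abs(x - y) for x, y in zip(a, b)]
        for a, b in combinations(points, 2)
    ]
    r_min = median(s for row in sims for s in row)
    d_min = median(sum(s >= r_min for s in row) for row in sims)
    return r_min, d_min


# Hypothetical normalized objects with two features each.
objects = [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)]
schedule = linear_schedule(1.0, 0.5, 3)
r_min, d_min = median_estimates(objects)
```

A linear schedule tries every intermediate setting. Estimating the thresholds directly from the data skips most of those steps, which is how this sketch reads the "unnecessary computation" bullet.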

22
Experiments (1 / 6)
  • Real-life user study
  • Conducted with 32 people (MSRA interns and
    POSTECH students)
  • Cameras: Canon Powershot SD850, Nikon D80, Nikon
    D2Xs
  • Laptops: Lenovo Thinkpad R61, T60, T61

HAC uses only co-occurrence. HARP uses only
feature-based similarity.
23
Experiments (2 / 6)
  • Synthetic datasets
  • Parameter settings
  • Quality metrics
  • CE (Clustering Error): the ideal (minimum) value
    is 0.
  • F1-value, FF1-value: the ideal (maximum) value
    is 1.

24
Experiments (3 / 6)
  • Varying average feature size

[Figure: results for varying average feature size, with the ideal value marked in each plot]
HydraAdaptive: uses aggressive loosening with Hydra.
25
Experiments (4 / 6)
  • Varying cluster size

26
Experiments (5 / 6)
  • Varying sparsity

Note: When sp = 0, co-occurrence exists for all
possible pairs.
27
Experiments (6 / 6)
  • Efficiency for cardinality and dimensionality

28
Q & A
Thank you!!