Inductive Clustering: A technique for clustering search results Hieu Khac Le Department of Computer - PowerPoint PPT Presentation


Provided by: sifaka

Inductive Clustering: A technique for clustering
search results
Hieu Khac Le
Department of Computer Science, University of
Illinois at Urbana-Champaign
Traditional approach
Abstract: Information overload is a common
problem today. Search engines, tools that help
find needed information across the whole web,
solve it only partially: even when a search
engine works very well, users still face
information overload because so many results are
returned. Post-processing search results is a
further step that reduces the overload by
organizing the results so as to minimize the
effort of examining them. This project proposes a
novel technique for organizing search results:
Inductive Clustering.
IC in detail
  • Observation: The more specific the query, the
    fewer results it returns.
  • Key idea: From the returned results, generate a
    summary. The results that agree with that
    summary form the first cluster. Generate a
    summary for the remaining results; the results
    that agree with it form the second cluster.
    Repeat until all results are clustered. A large
    cluster can be clustered further in the same
    way.
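
The key idea can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation: the poster does not say how a summary is generated or how agreement is tested. Here the hypothetical helper `summarize` picks the non-query term that occurs in the most remaining results, and a result agrees with a summary if it contains that term. The sample snippets are invented for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a snippet and split it into a set of terms."""
    return set(text.lower().split())


def summarize(results, ignore):
    """Hypothetical summary: the term found in the most results.

    Terms in `ignore` (the query terms) are skipped. Ties are broken
    alphabetically so the output is deterministic.
    """
    counts = Counter()
    for r in results:
        counts.update(tokenize(r) - ignore)
    if not counts:
        return None
    best = max(counts.values())
    return min(t for t, c in counts.items() if c == best)


def inductive_clustering(results, query):
    """Peel off one cluster per summary until no results remain."""
    ignore = tokenize(query)
    remaining = list(results)
    clusters = []
    while remaining:
        summary = summarize(remaining, ignore)
        if summary is None:
            # Nothing left to summarize; keep the leftovers together.
            clusters.append(("other", remaining))
            break
        agree = [r for r in remaining if summary in tokenize(r)]
        clusters.append((summary, agree))
        remaining = [r for r in remaining if summary not in tokenize(r)]
    return clusters


snippets = [
    "jaguar car dealer",
    "jaguar car price",
    "jaguar cat habitat",
    "jaguar cat zoo",
    "jaguar os",
]
for title, members in inductive_clustering(snippets, "jaguar"):
    print(title, members)
```

Note that the loop takes no threshold and no cluster count: it stops when every result has been assigned. A large cluster could be passed back to `inductive_clustering` with its title added to the query to split it further.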

Three essential ingredients
  • Need to define a similarity function, and
  • need to define a threshold, or
  • need to choose the number of clusters in
    advance
Those ingredients heavily affect clustering
quality. Unfortunately, there is no guidance for
tuning them, especially the threshold and the
number of clusters.
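
To illustrate how strongly a threshold can affect the outcome, here is a small Python sketch. It is an illustration only: the poster does not name a specific traditional algorithm, so single-pass (leader) clustering with Jaccard similarity is swapped in as an assumed stand-in, and the snippets and threshold values are invented.

```python
def jaccard(a, b):
    """Jaccard similarity between the term sets of two snippets."""
    a, b = set(a.lower().split()), set(b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_pass(results, threshold):
    """Assign each result to the first cluster whose leader is similar enough.

    The leader is the first member of a cluster. If no leader reaches
    the threshold, the result starts a new cluster.
    """
    clusters = []
    for r in results:
        for c in clusters:
            if jaccard(r, c[0]) >= threshold:
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters


docs = [
    "jaguar car dealer",
    "jaguar car price",
    "jaguar cat habitat",
    "jaguar cat zoo",
]
for t in (0.1, 0.4, 0.6):
    print(t, len(single_pass(docs, t)))
```

On these four snippets, the same similarity function yields one, two, or four clusters depending only on the threshold, which is the tuning problem described above.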
  • Doesn't need a threshold or a given number of
    clusters
  • Intuitively, results tend to agree with their
    cluster's summary
  • It is easy to continue clustering a large
    cluster into smaller clusters

Introduction
[Figure: Example of an ambiguous query; Cluster titles]
Our approach: Inductive Clustering (IC)
Experiment: We considered the first 100 results
returned by Google for 30 queries. The observed
clusters show that the algorithm works extremely
well.
  • Average precision with cluster title: 90.5
  • Average precision without cluster title: 95.6
  • Average precision of cluster titles: 91.4
  • Average execution time: 0.27 seconds
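
The poster does not define how precision was computed. As a hedged illustration only, the Python sketch below assumes that each result in a cluster is judged by hand as agreeing or not agreeing with its cluster, that a cluster's precision is the fraction of agreeing results, and that the average is taken over clusters. The function names and judgments are hypothetical, and this is not the ranked-retrieval "average precision" metric.

```python
def cluster_precision(judgments):
    """Fraction of results in one cluster judged to belong there."""
    return sum(judgments) / len(judgments)


def average_precision(clusters):
    """Mean of the per-cluster precisions (assumed averaging scheme)."""
    return sum(cluster_precision(j) for j in clusters) / len(clusters)


# Invented judgments: True means the result agrees with its cluster.
judged = [
    [True, True, False, True],  # 3 of 4 agree
    [True, True],               # 2 of 2 agree
]
print(average_precision(judged))
```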
User's query
Summaries for clusters are generated in advance
Conclusion: Inductive Clustering is a novel
technique for post-processing returned search
results. Unlike previous approaches, it does not
require manually tuned parameters. The
experiments show that IC works extremely well:
cluster titles are comprehensive, the results in
each cluster agree with the titles, and execution
time is negligible. Results organized with IC are
much easier for users to grasp. We envision IC
being implemented as an online service for broad
use.
This project was done under the advising of Prof.
ChengXiang Zhai. hieule2_at_uiuc.edu - Date
05/01/2005
[Figure: Example of an unambiguous query; Clusters with high confidence]
[Diagram labels: Sub-queries; Summarizing; Executing query]