A Study on Organizing Web Search Results

1
A Study on Organizing Web Search Results
  • Student: Shawn, Ching-Hsiang Tsai
  • Advisor: Dr. Lee-Feng Chien
  • WKD Lab
  • Department of Information Management, NTU
  • 2005/06/27

2
Outline
  • Motivation
  • Related Work
  • The Proposed Approach
  • Topic Finding
  • Search Result Organizing by User-defined Topic
    Classes
  • Experiments
  • Conclusions
  • Demo!

3
Motivation
  • Web search results often lack good organization,
    so users must examine the retrieved pages
    carefully to identify the relevant ones.

4
Search Result Examples
5
Existing Clustering Engines
  • Some clustering engines, e.g., Vivisimo, try to
    organize search results into clusters
  • Problems remain, e.g., clustered results are hard
    to comprehend, clustering is complex, etc.

6
Search Result Snippet
Title
Short description
Snippet
Link
7
Goal
  • To develop a new approach that can:
  • Provide a more comprehensive overview of the
    important topics in the search results
  • Present the results in the manner the user
    prefers
  • Help users browse quickly and formulate
    more effective searches

8
LiveMotif
  • A system that realizes the proposed approach

9
Outline
  • Motivation
  • Related Work
  • The Proposed Approach
  • Topic Finding and Clustering
  • Search Result Organizing by User-defined Topic
    Classes
  • Experiments
  • Conclusions
  • Demo!

10
Related Work on Search Result Clustering (SRC)
  • Term-based clustering
  • Document clustering
  • Weak in comprehension
  • Scatter/Gather (Hearst, SIGIR96)
  • Term clustering
  • STC (Zamir, WWW99)
  • DisCover (Kummamuru, WWW04)
  • Salient phrases ranking (Zeng, SIGIR04)
  • Link-based clustering
  • Co-citation and Companion
  • Contents-Link (Wang, CIKM02)

11
Previous Research Result
  • Trends
  • Document clustering → Term clustering
  • Finding topics to form document clusters
  • Why don't conventional clustering approaches
    work?
  • Snippets are short
  • Clustering algorithms may produce noise
  • Determining a threshold for the number of groups
    is hard

12
User Preferences
  • Previous search result clustering lacks
    explanation
  • In this paper, we take users' preferences into
    account
  • The proposed approach organizes search results
    by topic classes that users define

13
Outline
  • Motivation
  • Related Work
  • The Proposed Approach
  • Topic Finding
  • Search Result Organizing by User-defined Topic
    Classes
  • Experiments
  • Conclusions
  • Demo!

14
Problems to Solve
  • Find meaningful topics
  • Organize search results by user-defined topic
    classes

15
The Proposed Approach
  • Phase I: Topic Finding
  • Topic extraction
  • Topic selection
  • Topic set formation
  • Phase II: Search Result Organizing
  • Classifier training
  • Topic classification

16
Outline
  • Motivation
  • Related Work
  • The Proposed Approach
  • Topic Finding
  • Search Result Organizing by User-defined Topic
    Classes
  • Experiments
  • Conclusions
  • Demo!

17
Topic Finding
  • Topic extraction
  • Source
  • Title and short description of each snippet
  • Nouns
  • Chinese: CKIP segmentation system
  • English: POS tagging, n-grams (n < 3)
  • Topic selection
  • Ranking
  • Remove low-ranked topics
  • Remove redundant topics, merge similar topics
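
The extraction and selection steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it stands in a small stopword filter for POS-based noun detection, ranks candidates by the number of snippets that contain them (the slide does not name the ranking function), and treats a phrase as redundant when it is contained in a kept phrase with the same support. Merging of similar topics is not shown. The names `STOPWORDS`, `candidate_topics`, and `select_topics` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list replaces the POS tagger mentioned on the slide.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}

def candidate_topics(text, max_n=2):
    """Return the word n-grams (n <= max_n) that do not start or end with a stopword."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                grams.add(" ".join(gram))
    return grams

def select_topics(snippets, min_support=2):
    """Rank candidates by snippet frequency, drop rare ones, then drop redundant ones."""
    support = Counter()
    for snippet in snippets:
        support.update(candidate_topics(snippet))  # each snippet counts once per topic
    ranked = sorted(
        (t for t in support if support[t] >= min_support),
        key=lambda t: (-support[t], -len(t.split()), t),
    )
    kept = []
    for topic in ranked:
        # Redundant: contained in a kept phrase that occurs in just as many snippets.
        redundant = any(
            f" {topic} " in f" {k} " and support[k] == support[topic] for k in kept
        )
        if not redundant:
            kept.append(topic)
    return kept

snippets = [
    "Apple iPod nano review",
    "Apple iPod nano prices",
    "Apple pie recipe",
    "Jaguar cars",
]
topics = select_topics(snippets)
print(topics)  # ['apple', 'apple ipod', 'ipod nano']
```

Here "ipod" and "nano" are dropped because each appears only inside a longer kept phrase with identical support, while "apple" survives because it also occurs in a third snippet.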

18
Topic Finding (cont'd)
  • Topic set formation
  • Two criteria
  • Snippet coverage
  • Topic compactness
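
One way to read the two criteria is as a trade-off: cover as many snippets as possible with as few topics as possible. The sketch below uses a greedy set-cover heuristic under that reading. The slides do not define coverage or compactness formally, so the function `form_topic_set` and its parameters `target_coverage` and `max_topics` are assumptions made for illustration.

```python
def form_topic_set(topic_to_snippets, n_snippets, target_coverage=0.8, max_topics=10):
    """Greedily pick topics until enough snippets are covered.

    topic_to_snippets maps each topic to the set of snippet ids it occurs in.
    Returns the chosen topics and the fraction of snippets they cover.
    """
    covered = set()
    chosen = []
    remaining = dict(topic_to_snippets)
    while (remaining and len(chosen) < max_topics
           and len(covered) / n_snippets < target_coverage):
        # Pick the topic that adds the most uncovered snippets (ties: alphabetical).
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining topic covers anything new
        chosen.append(best)
        covered |= gain
    return chosen, len(covered) / n_snippets

topic_index = {
    "apple ipod": {0, 1, 2},
    "apple pie": {3, 4},
    "ipod nano": {1, 2},
    "jaguar cars": {5},
}
chosen, coverage = form_topic_set(topic_index, n_snippets=6)
print(chosen, round(coverage, 2))  # ['apple ipod', 'apple pie'] 0.83
```

"ipod nano" is never chosen because every snippet it covers is already covered by "apple ipod", which keeps the topic set small.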

19
Outline
  • Motivation
  • Related Work
  • The Proposed Approach
  • Topic Finding
  • Search Result Organizing by User-defined Topic
    Classes
  • Experiments
  • Conclusions
  • Demo!

20
User-defined Topic Classes
  • Users describe their preferences with topic
    classes
  • Query: National Taiwan University
  • User preference 1
  • Professor
  • Department
  • Project
  • User preference 2
  • Student
  • Scholarship
  • Program

21
Example Challenges
User Preference 1
User Preference 2
22
Search Result Organizing
  • Use classification to label topic terms with
    user-defined topic classes so that search
    results can be organized by users' preferences
  • kNN
  • Use the vector space model to describe the features
    of both topics and topic classes
  • Term weighting: TF-IDF
  • Similarity: cosine of the angle
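
As a rough illustration of the vector space model named above, the following sketch builds TF-IDF vectors and compares them by cosine similarity. It uses one common TF-IDF variant (raw term frequency times log(N/df)); the slides do not say which variant was used, and the helper names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs is a list of token lists; returns one sparse {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [
    ["campus", "professor", "research"],
    ["professor", "research", "lab"],
    ["scholarship", "student"],
]
vecs = tfidf_vectors(docs)
# The first two documents share terms, so they are closer than the first and third.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```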

23
Search Result Organizing (cont'd)
  • Classifier training (Huang, WWW04)
  • Get a general concept for each topic class
  • Get Nmax snippets (class objects) for each topic
    class
  • Reformulate the general concepts into specific
    concepts

24
Search Result Organizing (cont'd)
  • Topic Classification

25
Search Result Organizing (cont'd)
  • Label each topic term with the most relevant topic
    class

[Figure labels: Topic Class 1, Topic Class 2, Topic Class 3, sim, r]
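
A minimal kNN sketch of this labeling step is given below. It assumes each topic class is represented by a few labeled "class objects" (snippets turned into term-count vectors) and that the k most similar objects vote with their similarity as weight. The voting rule, the value of k, and the names `cosine` and `knn_label` are illustrative assumptions rather than details taken from the slides.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two term-count mappings."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_label(topic_vec, class_objects, k=3):
    """class_objects is a list of (class_name, vector); returns the winning class."""
    scored = sorted(
        ((cosine(topic_vec, vec), name) for name, vec in class_objects),
        reverse=True,
    )
    votes = Counter()
    for sim, name in scored[:k]:
        votes[name] += sim  # similarity-weighted vote
    return max(sorted(votes), key=votes.get)

class_objects = [
    ("Professor", Counter("faculty research professor lab".split())),
    ("Professor", Counter("professor publications research".split())),
    ("Student", Counter("student scholarship admission".split())),
    ("Student", Counter("student dormitory tuition scholarship".split())),
]
label_1 = knn_label(Counter("research lab professor".split()), class_objects)
label_2 = knn_label(Counter("scholarship tuition".split()), class_objects)
print(label_1, label_2)  # Professor Student
```
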
26
Outline
  • Motivation
  • Related Work
  • The Proposed Approach
  • Topic Finding
  • Search Result Organizing by User-defined Topic
    Classes
  • Experiments
  • Conclusions
  • Demo!

27
Experiments
  • Clustering is really difficult to evaluate!
  • Exp I: Topic Finding
  • 1-1 Performance
  • 1-2 Overlap
  • 1-3 Coverage of top k topics
  • Exp II: Search Result Organizing
  • 2-1 Performance of topic classification
  • 2-2 The effect of indirect relevance on the
    performance
  • 2-3 Purity of each topic class
  • 2-4 The effect of indirect relevance on the
    entropy
  • 2-5 The effect of indirect relevance on the
    entropy and performance
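
For reference, purity and entropy are usually computed per class from the true labels of the items assigned to it. The sketch below uses those textbook definitions; the slides do not spell out the exact formulas used in the experiments, so treat this as an assumption.

```python
import math
from collections import Counter

def purity(true_labels):
    """Fraction of items in a class that share its most common true label."""
    counts = Counter(true_labels)
    return max(counts.values()) / len(true_labels)

def entropy(true_labels):
    """Shannon entropy (in bits) of the true-label distribution within a class."""
    n = len(true_labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(true_labels).values())

assigned_to_travel = ["travel", "travel", "travel", "business"]
print(purity(assigned_to_travel))             # 0.75
print(round(entropy(assigned_to_travel), 3))  # 0.811
```

Higher purity and lower entropy both indicate a cleaner class; a class whose items all share one true label has purity 1 and entropy 0.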

28
Exp I Set Up
  • Query
  • Selected from the AltaVista query log
  • Ambiguous
  • apple, jaguar, saturn, java
  • Named entity
  • iraq, harry potter, honda, dell, disney, academia
    sinica, w3c, ibm
  • General
  • sports, jokes, resume, maps, news, wallpapers,
    photos, radio, business, science, movies, bank
  • Specific
  • mp3, yoga
  • Topic
  • 200 search result snippets per query
    (Google)
  • Average of 108 topics (76 to 140)

29
Exp I Set Up (cont'd)
  • Manually labeled standard answers
  • 3 persons
  • Very relevant: 2 points
  • Relevant: 1 point
  • Other: 0 points
  • Total points ≥ 2 → RELEVANT

Topic     Person 1  Person 2  Person 3  Total  Label
Topic A   1         1         0         2      RELEVANT
Topic B   2         0         0         2      RELEVANT
Topic C   0         0         1         1
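
The labeling rule can be expressed in a few lines. The sketch assumes a topic counts as relevant when the three annotators' points sum to at least 2, which is what the example rows on the slide suggest; the function name `is_relevant` is hypothetical.

```python
def is_relevant(scores, threshold=2):
    """scores holds one value per annotator: 2 = very relevant, 1 = relevant, 0 = other."""
    return sum(scores) >= threshold

# The three example rows from the slide, as (Person 1, Person 2, Person 3).
rows = [(1, 1, 0), (2, 0, 0), (0, 0, 1)]
labels = [is_relevant(r) for r in rows]
print(labels)  # [True, True, False]
```
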
30
Exp I Set Up (cont'd)
  • Topic selection methods
  • TFIDF
  • VTFIDF
  • LCA

31
Exp 1-1 Performance
32
Exp 1-2 Overlap
33
Exp 1-3 Coverage
34
Exp 2 Set Up
  • Query
  • China, United States, Japan, Germany, India,
    Singapore, Malaysia, Taipei, California, Beijing
    and Kyoto
  • Topic classes
  • government
  • travel
  • business
  • sports
  • culture
  • school

35
Exp 2 Set Up (cont'd)
  • Class-Topic pair
  • Modified precision and recall
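
The slide does not say how precision and recall were modified, so the sketch below only shows the standard, unmodified versions computed over class-topic pairs: each pair is a (topic class, topic) assignment, and a predicted pair is correct when it also appears in the manually labeled answers. The function name and the sample pairs are made up for illustration.

```python
def precision_recall(predicted_pairs, gold_pairs):
    """Standard precision and recall over sets of (topic_class, topic) pairs."""
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

gold = {("travel", "hotel"), ("travel", "tour"), ("school", "university")}
predicted = {
    ("travel", "hotel"),
    ("school", "university"),
    ("business", "tour"),
    ("sports", "baseball"),
}
p, r = precision_recall(predicted, gold)
print(p, round(r, 3))  # 0.5 0.667
```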

36
Exp 2-1 Performance of topic classification
37
Exp 2-1 Performance of topic classification (cont'd)
38
Exp 2-2 The effect of indirect relevance on the performance
Totally direct
Totally indirect
39
Exp 2-2 The effect of indirect relevance on the performance (cont'd)
40
Exp 2-3 Purity of each topic class
41
Exp 2-4 The effect of indirect relevance on the entropy
42
Exp 2-5 The effect of indirect relevance on the purity and performance
43
User Study
Question  Evaluated Criteria
1a        Topics help in understanding the overview of the search results.
1b        Topics help in quickly browsing the search result snippets of interest.
1c        Topics are representative and distinct from one another.
2a        User-defined topic classes help in browsing search result snippets of different topic classes.
2b        User-defined topic classes help in understanding unknown topics.
2c        User-defined topic classes help in understanding the relationships among topics.
3         Generally speaking, LiveMotif provides more help than Vivisimo.
44
User Study Result
                    1a  1b  1c  2a  2b  2c  3
Agree (10 to 6)     33  31  26  33  32  24  26
Neutral (5)          0   3   6   2   3   9   4
Disagree (4 to 0)    2   2   1   0   0   2   4
45
Outline
  • Motivation
  • Related Work
  • The Proposed Approach
  • Topic Finding
  • Search Result Organizing by User-defined Topic
    Classes
  • Experiments
  • Conclusions
  • Demo!

46
Conclusions
  • Summary
  • We propose a new approach to organizing search
    results
  • Topic finding
  • User-defined topic classes
  • Contributions
  • Comprehensive overview of the important topics in the
    search results
  • Organizes the results in the manner the user
    prefers
  • Future Work
  • Automatically suggest topic classes
  • More analysis and comparison, e.g., by type of topic
  • The impact of user-defined classification on
    different types of queries
  • Apply to other non-Web document retrieval
    applications

47
Outline
  • Motivation
  • Related Work
  • The Proposed Approach
  • Topic Finding
  • Search Result Organizing by User-defined Topic
    Classes
  • Experiments
  • Conclusions
  • Demo!

48
Demo!
  • LiveMotif (http://livemotif.wkdlab.net/)

49
Thank You