Transcript and Presenter's Notes

Title: Seesaw Personalized Web Search


1
Seesaw Personalized Web Search
  • Jaime Teevan, MIT
  • with Susan T. Dumais
  • and Eric Horvitz, MSR

2
(No Transcript)
3
Personalization Algorithms
  • Query expansion
  • Standard IR

[Diagram: the query travels from the client to the server; documents are returned from the server to the user on the client]
4
Personalization Algorithms
  • Query expansion
  • Standard IR
  • vs. Result re-ranking

[Diagram: the query travels from the client to the server; documents are returned to the user on the client, where results can be re-ranked]
5
Result Re-Ranking
  • Ensures privacy
  • Good evaluation framework
  • Can look at rich user profile
  • Look at lightweight user models
  • Collected on server side
  • Sent as query expansion

6
Seesaw Search Engine
[Diagram: Seesaw keeps a client-side user profile of term counts, e.g. dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1]
7
Seesaw Search Engine
[Diagram: a query arrives and is matched against the user profile term counts (dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1)]
8
Seesaw Search Engine
[Diagram: each result document for the query is represented by its terms (e.g. "forest hiking walking gorp", "dog cat monkey banana food", "baby infant child boy girl", "csail mit artificial research robot", "web search retrieval ir hunt") and compared against the user profile term counts]
9
Seesaw Search Engine
[Diagram: each result is scored against the user profile (scores such as 6.0, 1.6, 0.2, 2.7, 0.2, 1.3) and the search results page is re-ranked by score; the example document "web search retrieval ir hunt" scores 1.3]
10
Calculating a Document's Score
  • Based on standard tf.idf

[Example document "web search retrieval ir hunt", score 1.3]
11
Calculating a Document's Score
  • Based on standard tf.idf

w_i = log [ (r_i + 0.5)(N - n_i - R + r_i + 0.5) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) ]

  • User as relevance feedback
  • Stuff I've Seen index
  • More is better

[Example: per-term weights 0.1, 0.5, 0.05, 0.35, 0.3 sum to the document score 1.3]
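To make the scoring above concrete, here is a minimal Python sketch of the term weight and document score; the tokenization, the toy corpus statistics, and the helper names (term_weight, document_score) are illustrative assumptions, not Seesaw's actual implementation.

```python
# Minimal sketch of the scoring described above (not Seesaw's code).
import math

def term_weight(N, n_i, R, r_i):
    """Relevance-feedback term weight from the slide's formula."""
    return math.log(((r_i + 0.5) * (N - n_i - R + r_i + 0.5)) /
                    ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def document_score(doc_terms, corpus_df, user_df, N, R):
    """Sum tf_i * w_i over the document's terms.

    corpus_df[t] = n_i, number of corpus documents containing t
    user_df[t]   = r_i, number of user (relevant) documents containing t
    """
    score = 0.0
    for term in set(doc_terms):
        tf = doc_terms.count(term)
        score += tf * term_weight(N, corpus_df.get(term, 0), R, user_df.get(term, 0))
    return score

# Illustrative numbers only (N, R and the counts are made up).
corpus_df = {"web": 900, "search": 500, "retrieval": 40, "ir": 30, "hunt": 60}
user_df = {"search": 93, "retrieval": 12, "ir": 4}
print(document_score("web search retrieval ir hunt".split(),
                     corpus_df, user_df, N=1000, R=100))
```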
12
Finding the Score Efficiently
  • Corpus representation (N, ni)
  • Web statistics
  • Result set
  • Document representation
  • Download document
  • Use result set snippet
  • Efficiency hacks generally OK!

13
Evaluating Personalized Search
  • 15 evaluators
  • Evaluate 50 results for a query
  • Highly relevant
  • Relevant
  • Irrelevant
  • Measure algorithm quality
  • DCG(i)

DCG(i) = Gain(i), if i = 1
DCG(i) = DCG(i-1) + Gain(i) / log(i), otherwise
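A minimal sketch of the DCG measure as defined above; the gain values (Highly Relevant = 2, Relevant = 1, Irrelevant = 0) and the log base are assumptions, since the slide does not specify them.

```python
import math

def dcg(gains):
    """DCG(1) = Gain(1); DCG(i) = DCG(i-1) + Gain(i)/log(i) for i > 1."""
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        total += gain if i == 1 else gain / math.log(i)  # natural log assumed
    return total

# Gains of the top five results in ranked order -- illustrative judgments only.
print(dcg([2, 2, 1, 0, 1]))
```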
14
Evaluating Personalized Search
  • Query selection
  • Chose from 10 pre-selected queries
  • Previously issued query

Pre-selected queries: cancer, Microsoft, traffic, bison frise, Red Sox, airlines, Las Vegas, rice, McDonalds
[Chart: queries evaluated per participant, e.g. Mary, Joe]
Total: 137 queries; 53 pre-selected (2-9 per query)
15
Seesaw Improves Text Retrieval
  • Random
  • Relevance Feedback
  • Seesaw

16
Text Features Not Enough
17
Take Advantage of Web Ranking
18
Further Exploration
  • Explore larger parameter space
  • Learn parameters
  • Based on individual
  • Based on query
  • Based on results
  • Give user control?

19
Making Seesaw Practical
  • Learn most about personalization by deploying a
    system
  • Best algorithm reasonably efficient
  • Merging server and client
  • Query expansion
  • Get more relevant results in the set to be
    re-ranked
  • Design snippets for personalization

20
User Interface Issues
  • Make personalization transparent
  • Give user control over personalization
  • Slider between Web and personalized results
  • Allows for background computation
  • Creates problem with re-finding
  • Results change as user model changes
  • Thesis research: ReSearch Engine

21
Thank you!
  • teevan@csail.mit.edu

22
END
23
Personalizing Web Search
  • Motivation
  • Algorithms
  • Results
  • Future Work

24
Personalizing Web Search
  • Motivation
  • Algorithms
  • Results
  • Future Work

25
Study of Personal Relevancy
  • 15 participants
  • Microsoft employees
  • Managers, support staff, programmers,
  • Evaluate 50 results for a query
  • Highly relevant
  • Relevant
  • Irrelevant
  • 10 queries per person

26
Study of Personal Relevancy
  • Query selection
  • Chose from 10 pre-selected queries
  • Previously issued query

Pre-selected queries: cancer, Microsoft, traffic, bison frise, Red Sox, airlines, Las Vegas, rice, McDonalds
[Chart: queries evaluated per participant, e.g. Mary, Joe]
Total: 137 queries; 53 pre-selected (2-9 per query)
27
Relevant Results Have Low Rank
[Chart: counts of Highly Relevant, Relevant, and Irrelevant results by rank]
28
Relevant Results Have Low Rank
[Chart: counts of Highly Relevant, Relevant, and Irrelevant results by rank, shown separately for Rater 1 and Rater 2]
29
Same Results Rated Differently
  • Average inter-rater reliability: 56%
  • Different from previous research
  • Belkin '94: IRR in TREC
  • Eastman '85: IRR on the Web
  • Asked for personal relevance judgments
  • Some queries more correlated than others

30
Same Query, Different Intent
  • Different meanings
  • Information about the astronomical/astrological
    sign of cancer
  • information about cancer treatments
  • Different intents
  • is there any new tests for cancer?
  • information about cancer treatments

31
Same Intent, Different Evaluation
  • Query: Microsoft
  • "information about microsoft, the company"
  • "Things related to the Microsoft corporation"
  • "Information on Microsoft Corp"
  • 31/50 results rated as not irrelevant
  • For only 6 of the 31 does more than one rater agree
  • All three agree only for www.microsoft.com
  • Inter-rater reliability: 56%

32
Search Engines are for the Masses
[Chart: the same results as rated by Joe and by Mary]
33
Much Room for Improvement
  • Group ranking
  • Best improves on Web by 38%
  • More people → less improvement

34
Much Room for Improvement
  • Group ranking
  • Best improves on Web by 38%
  • More people → less improvement
  • Personal ranking
  • Best improves on Web by 55%
  • Remains constant

35
Personalizing Web Search
  • Motivation
  • Algorithms
  • Results
  • Future Work

- Seesaw Search Engine
36
BM25
with Relevance Feedback
Score = Σ_i tf_i · w_i
where N = documents in the corpus, n_i = corpus documents containing term i, R = relevant documents, r_i = relevant documents containing term i
w_i = log ( N / n_i )
37
BM25
with Relevance Feedback
Score = Σ_i tf_i · w_i
(N, n_i, R, r_i as before)
w_i = log [ (r_i + 0.5)(N - n_i - R + r_i + 0.5) / ((n_i - r_i + 0.5)(R - r_i + 0.5)) ]
38
User Model as Relevance Feedback
Score = Σ_i tf_i · w_i
The user model acts as relevance feedback: substitute N → N + R and n_i → n_i + r_i, i.e. the user's documents are added to the corpus counts
w_i = log [ (r_i + 0.5)((N + R) - (n_i + r_i) - R + r_i + 0.5) / (((n_i + r_i) - r_i + 0.5)(R - r_i + 0.5)) ]
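In code, the substitution amounts to feeding augmented counts into the same weight function; the sketch below reuses the illustrative term_weight() from the earlier scoring sketch and is an assumption about how the slide's substitution would be applied, not Seesaw's code.

```python
def user_model_weight(N, n_i, R, r_i):
    """Relevance-feedback weight with the user's documents folded into the
    corpus: N becomes N + R and n_i becomes n_i + r_i."""
    return term_weight(N + R, n_i + r_i, R, r_i)
```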
39
User Model as Relevance Feedback
Score = Σ_i tf_i · w_i
[Diagram: the World contributes N and n_i; the User contributes R and r_i]
40
User Model as Relevance Feedback
Score = Σ_i tf_i · w_i
[Diagram: World (N, n_i), User (R, r_i), and the portion of the World related to the query (N, n_i)]
41
User Model as Relevance Feedback
Score = Σ_i tf_i · w_i
[Diagram: World (N, n_i), User (R, r_i), World related to the query (N, n_i), and User related to the query (R, r_i)]
Query Focused Matching
42
User Model as Relevance Feedback
Score = Σ_i tf_i · w_i
[Diagram: World Focused Matching takes N, n_i, R, r_i from the full World and User; Query Focused Matching takes them from the Web related to the query and the User related to the query]
43
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

44
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused / World focused
45
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused / World focused
46
User Representation
  • Stuff I've Seen (SIS) index
  • MSR research project (Dumais et al.)
  • Index of everything a user has seen
  • Recently indexed documents
  • Web documents in SIS index
  • Query history
  • None

47
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused / World focused
User representation: All SIS / Recent SIS / Web SIS / Query history / None
48
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused / World focused
User representation: All SIS / Recent SIS / Web SIS / Query history / None
49
World Representation
  • Document Representation
  • Full text
  • Title and snippet
  • Corpus Representation
  • Web
  • Result set title and snippet
  • Result set full text

50
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused / World focused
User representation: All SIS / Recent SIS / Web SIS / Query history / None
Document representation: Full text / Title and snippet
Corpus representation: Web / Result set full text / Result set title and snippet
51
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused / World focused
User representation: All SIS / Recent SIS / Web SIS / Query history / None
Document representation: Full text / Title and snippet
Corpus representation: Web / Result set full text / Result set title and snippet
52
Query Expansion
  • All words in document
  • Query focused

Example snippet: "The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through ..."
(The slide shows the snippet twice: once with all words used for expansion, once with only the query-focused terms.)
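A hedged sketch of the two expansion options above: use every word of the snippet, or only the words near the query terms ("query focused"). The window size, tokenization, and function names are illustrative assumptions.

```python
def all_words(snippet):
    """Expansion option 1: every word in the snippet."""
    return snippet.lower().split()

def query_focused(snippet, query_terms, window=3):
    """Expansion option 2: only words within a small window of a query term."""
    tokens = snippet.lower().split()
    keep = set()
    for i, tok in enumerate(tokens):
        if tok.strip(",.") in query_terms:
            keep.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
    return [tokens[i] for i in sorted(keep)]

snippet = ("The American Cancer Society is dedicated to eliminating cancer "
           "as a major health problem by preventing cancer, saving lives, "
           "and diminishing suffering through ...")
print(query_focused(snippet, {"cancer"}))
```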
53
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused / World focused
User representation: All SIS / Recent SIS / Web SIS / Query history / None
Document representation: Full text / Title and snippet
Corpus representation: Web / Result set full text / Result set title and snippet
Query expansion: All words / Query focused
54
Parameters
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused / World focused
User representation: All SIS / Recent SIS / Web SIS / Query history / None
Document representation: Full text / Title and snippet
Corpus representation: Web / Result set full text / Result set title and snippet
Query expansion: All words / Query focused
55
Personalizing Web Search
  • Motivation
  • Algorithms
  • Results
  • Future Work

56
Best Parameter Settings
  • Matching
  • User representation
  • World representation
  • Query expansion

Matching: Query focused
User representation: All SIS / Recent SIS / Web SIS
Document representation: Full text / Title and snippet
Corpus representation: Web / Result set title and snippet
Query expansion: All words / Query focused
57
Seesaw Improves Retrieval
  • No user model
  • Random
  • Relevance Feedback
  • Seesaw

58
Text Alone Not Enough
59
Incorporate Non-text Features
60
Summary
  • Rich user model important for search
    personalization
  • Seesaw improves text-based retrieval
  • Need other features to improve on the Web
  • Lots of room for improvement

61
Personalizing Web Search
  • Motivation
  • Algorithms
  • Results
  • Future Work
  • Further exploration
  • Making Seesaw practical
  • User interface issues

62
Further Exploration
  • Explore larger parameter space
  • Learn parameters
  • Based on individual
  • Based on query
  • Based on results
  • Give user control?

63
Making Seesaw Practical
  • Learn most about personalization by deploying a
    system
  • Best algorithm reasonably efficient
  • Merging server and client
  • Query expansion
  • Get more relevant results in the set to be
    re-ranked
  • Design snippets for personalization

64
User Interface Issues
  • Make personalization transparent
  • Give user control over personalization
  • Slider between Web and personalized results
  • Allows for background computation
  • Creates problem with re-finding
  • Results change as user model changes
  • Thesis research: ReSearch Engine

65
Thank you!
66
Search Engines are for the Masses
  • Best common ranking
  • DCG(i)
  • Sort results by number marked highly relevant,
    then by relevant
  • Measure distance with Kendall-Tau
  • Web ranking is more similar to the common ranking than to individuals' rankings
  • Individuals' rankings: distance 0.469
  • Common ranking: distance 0.445

DCG(i) = Gain(i), if i = 1
DCG(i) = DCG(i-1) + Gain(i) / log(i), otherwise
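A minimal sketch of the comparison above: build a "best common" ranking by sorting results on the number of highly-relevant votes and then relevant votes, and measure how far a ranking is from it with a normalized Kendall tau distance (fraction of discordant pairs). The vote data and helper names are illustrative, not the study's data.

```python
from itertools import combinations

def common_ranking(votes):
    """votes: {result: (num_highly_relevant, num_relevant)} -> best common order."""
    return sorted(votes, key=lambda r: votes[r], reverse=True)

def kendall_tau_distance(rank_a, rank_b):
    """Fraction of result pairs ordered differently in the two rankings."""
    pos_b = {r: i for i, r in enumerate(rank_b)}
    discordant = sum(1 for x, y in combinations(rank_a, 2) if pos_b[x] > pos_b[y])
    pairs = len(rank_a) * (len(rank_a) - 1) / 2
    return discordant / pairs

votes = {"a": (3, 0), "b": (1, 2), "c": (0, 1), "d": (0, 0)}
common = common_ranking(votes)        # ['a', 'b', 'c', 'd']
personal = ["b", "a", "d", "c"]       # one rater's ideal order
print(kendall_tau_distance(personal, common))  # 0.333...
```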