Automatic Identification of User Goals in Web Search - PowerPoint PPT Presentation

About This Presentation

Slides: 28

Provided by: oakCs

Learn more at: http://oak.cs.ucla.edu

Transcript and Presenter's Notes

1
Automatic Identification of User Goals in Web Search

Uichin Lee, Zhenyu Liu, Junghoo Cho
Computer Science Department, UCLA
{uclee, vicliu, cho}@cs.ucla.edu

2
Motivation

Users have different goals for Web search
Reach the homepage of an organization (e.g., UCLA)
Learn about a topic (e.g., simulated annealing)
Download online music, etc.
Can we identify the user goal for a Web search automatically?
For example, to improve and customize search results based on the identified goal

3
Two high-level user goals

Navigational query
Reach a Web site the user already has in mind (e.g., UCLA Library)
Informational query
Visit multiple sites to learn about a particular topic (e.g., Simulated Annealing)
Based on [Broder02, RoseLevinson04]
Both studies include navigational and informational goals

4
Exploiting identified user goals

Tailored weighting/ranking mechanism
Navigational queries
Emphasize anchor texts [Craswell01, Kang03] and URL paths [Westerveld01]
Informational queries
Emphasize page content [Kang03] and IR techniques (query expansion, relevance feedback, pseudo-relevance feedback, etc.)
Tailored result presentation
Informational queries
Clustered search results [Etzioni99, Zeng04, Kummamuru04]
Targeted ads / answers
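As a rough illustration of goal-tailored ranking, the Python sketch below scores a page as a weighted sum of anchor-text, URL-path, and content match scores, with the weights chosen by the identified goal. The `Scores` fields, the `WEIGHTS` table, and every number in it are hypothetical; the slide names which signals to emphasize, not how a search engine should combine them.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    anchor: float   # match between the query and anchor text pointing to the page
    url: float      # match between the query and the page's URL path
    content: float  # match between the query and the page body


# Hypothetical weights: navigational queries lean on anchor text and URL path,
# informational queries lean on page content.
WEIGHTS = {
    "navigational": {"anchor": 0.6, "url": 0.3, "content": 0.1},
    "informational": {"anchor": 0.2, "url": 0.0, "content": 0.8},
}


def tailored_score(s: Scores, goal: str) -> float:
    """Weighted sum of the three signals, with weights picked by the query goal."""
    w = WEIGHTS[goal]
    return w["anchor"] * s.anchor + w["url"] * s.url + w["content"] * s.content


# Made-up scores for two candidate pages.
homepage = Scores(anchor=0.9, url=0.8, content=0.2)
article = Scores(anchor=0.1, url=0.0, content=0.9)
for goal in WEIGHTS:
    ranked = sorted(
        [("homepage", homepage), ("article", article)],
        key=lambda item: tailored_score(item[1], goal),
        reverse=True,
    )
    print(goal, [name for name, _ in ranked])
```

With these made-up numbers the homepage ranks first for a navigational goal and the article ranks first for an informational goal, which is the effect the slide describes.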

5
Outline

Are query goals predictable?
Human-subject study
How can we predict user goals automatically?
Anchor-link distribution
User-click distribution
How effective are our features?
Experimental evaluation

6
Are query goals predictable?

Search engines see only a few keywords
Users give no explicit indication of their goals
Can we predict the user goal from the keywords alone?
Human subject study
The 50 most popular Google queries from UCLA CS
28 participants (grad students) from UCLA CS
Subjects indicated the likely goal of each query as if they had issued it
Do most subjects agree on a particular goal?

7
Human subject study results

i(q): the fraction of participants who judge query q as informational
e.g., i(q) = 0.038 for "UCLA Library"
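To make the measure concrete, here is a minimal Python sketch of i(q). It assumes each participant's answer is recorded as the string "informational" or "navigational"; the function names, the vote encoding, and the 0.75 agreement cutoff used to call a goal "predictable" are hypothetical choices, not part of the study.

```python
from collections.abc import Iterable


def informational_fraction(judgments: Iterable[str]) -> float:
    """i(q): fraction of participants who judged query q as informational.

    Assumes each judgment is either "informational" or "navigational".
    """
    votes = list(judgments)
    if not votes:
        raise ValueError("no judgments recorded for this query")
    return sum(v == "informational" for v in votes) / len(votes)


def has_predictable_goal(i_q: float, agreement: float = 0.75) -> bool:
    """Hypothetical rule: the goal is predictable if at least `agreement`
    of the participants chose the same goal (0.75 is an assumed cutoff)."""
    return i_q >= agreement or i_q <= 1.0 - agreement


# Illustrative votes only (not data from the study).
votes = ["informational"] + ["navigational"] * 9
i_q = informational_fraction(votes)
print(i_q, has_predictable_goal(i_q))  # 0.1 True
```

Values near 0 indicate a clearly navigational query and values near 1 a clearly informational one; values in the middle mark queries on which participants disagree.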

8
Human subject study results

43.5% software names, 30.4% person names

9
Human subject study results

After removing software and person-name queries

10
Human subject study summary

The majority of queries have predictable goals
Interestingly, most ambiguous queries fall into a small set of topics (e.g., software names and person names)
Topic-based ambiguity detection may be possible
Treat ambiguous queries differently from others

11
Outline

Are query goals predictable?
Human-subject study
How can we predict user goals automatically?
How effective are our features?
Experimental evaluation

12
How to predict user goal?

UCLA Library vs. Simulated Annealing
Navigational vs. informational
Is semantic analysis necessary?

Our idea: use information provided implicitly by Web users
Web-link structure
User-click behavior

13
Web-link structure

Anchor-link distribution to quantify the link structure

(Figure: links with anchor text "UCLA Library" pointing to destinations such as www.ucla.edu/library.html, repositories.cdlib.org/uclalib/, and www.library.ucla.edu)

14
Web-link structure

(Figure: anchor-link distribution for the query "UCLA Library" over the destinations www.ucla.edu/library.html, repositories.cdlib.org/uclalib/, and www.library.ucla.edu)

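The anchor-link distribution can be sketched in a few lines of Python. This assumes the crawl has been reduced to (anchor_text, destination_url) pairs and that an anchor "matches" a query when the two are equal after lowercasing and collapsing whitespace; both the input format and the matching rule are assumptions for illustration, and the link counts in the example are made up.

```python
from collections import Counter
from collections.abc import Iterable


def normalize(text: str) -> str:
    # Assumed matching rule: case-insensitive, whitespace-collapsed equality.
    return " ".join(text.lower().split())


def anchor_link_distribution(
    query: str, links: Iterable[tuple[str, str]]
) -> list[tuple[str, float]]:
    """Return (destination, share of matching links), most-linked first.

    `links` is a hypothetical iterable of (anchor_text, destination_url)
    pairs extracted from crawled pages.
    """
    target = normalize(query)
    counts = Counter(dest for anchor, dest in links if normalize(anchor) == target)
    total = sum(counts.values())
    if total == 0:
        return []  # no anchor statistics for this query
    # Sort by link count (descending); ties broken by URL for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(dest, n / total) for dest, n in ranked]


# Made-up link counts, for illustration only.
links = (
    [("UCLA Library", "www.library.ucla.edu")] * 8
    + [("ucla  library", "www.ucla.edu/library.html")]
    + [("UCLA Library", "repositories.cdlib.org/uclalib/")]
    + [("UCLA", "www.ucla.edu")]
)
for dest, share in anchor_link_distribution("UCLA Library", links):
    print(f"{share:.1f}  {dest}")
```

Here 80% of the matching links go to the top destination and the rest split evenly, the concentrated shape expected for a navigational query; the "UCLA" anchor is ignored because it does not match the query.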
15
Anchor-link distribution for sample queries
(Figure: anchor-link distributions for "UCLA Library" (navigational) and "Simulated Annealing" (informational))

16
User-click behavior

Click distribution to quantify past user-click behavior

(Figure: click distribution for the navigational query "UCLA Library")

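The click distribution can be built the same way from a click log. In this sketch the log is assumed to be a sequence of (query, clicked_url) records, queries are normalized by lowercasing and collapsing whitespace, and only the sorted click shares are kept because the shape features work on ranks; the record format and the example clicks are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable


def click_distributions(log: Iterable[tuple[str, str]]) -> dict[str, list[float]]:
    """Map each query to the share of clicks on its most-clicked URL,
    its second most-clicked URL, and so on.

    `log` is a hypothetical iterable of (query, clicked_url) records.
    """
    clicks: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for query, url in log:
        clicks[" ".join(query.lower().split())][url] += 1
    distributions = {}
    for query, counts in clicks.items():
        total = sum(counts.values())
        distributions[query] = [n / total for n in sorted(counts.values(), reverse=True)]
    return distributions


# Made-up click records, for illustration only.
log = (
    [("UCLA Library", "www.library.ucla.edu")] * 9
    + [("UCLA Library", "www.ucla.edu/library.html")]
    + [("simulated annealing", f"site{i}.example") for i in range(4)]
)
print(click_distributions(log))
```

The navigational query yields a sharply peaked distribution ([0.9, 0.1]), while the informational one spreads its clicks evenly over four results ([0.25, 0.25, 0.25, 0.25]).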
17
User-click behavior (contd)
(Figure: click distributions for "UCLA Library" (navigational) and "Simulated Annealing" (informational))

18
Capturing the shape of distributions

Possible numeric features for f(x)
Mean $\mu$
Median
Skewness $\int (x - \mu)^3 f(x)\,dx / \sigma^3$: how asymmetric f(x) is
Kurtosis $\int (x - \mu)^4 f(x)\,dx / \sigma^4$: how peaked f(x) is
($\sigma$: standard deviation of f(x))
Single linear regression
Median is the most effective measurement for both the anchor-link distribution and the click distribution
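For a distribution over ranks 1..n, the features above reduce to weighted sums. The Python sketch below computes them using the population standard deviation for σ. As an explicit assumption, it uses a continuous median in which rank x spreads its mass uniformly over [x - 1, x), so the median falls below 1.0 exactly when the top-ranked destination receives more than half of the mass. The slides do not say how the median of a discrete distribution was computed, so treat that part as illustrative.

```python
import math


def shape_features(p: list[float]) -> dict[str, float]:
    """Mean, median, skewness and kurtosis of a distribution over ranks 1..n.

    p[k] is the share of links (or clicks) going to the (k+1)-th ranked
    destination; the entries are assumed to sum to 1.
    """
    ranks = range(1, len(p) + 1)
    mean = sum(x * px for x, px in zip(ranks, p))
    var = sum((x - mean) ** 2 * px for x, px in zip(ranks, p))
    sd = math.sqrt(var)
    # Skewness and kurtosis are undefined when all mass sits on one rank.
    skew = sum((x - mean) ** 3 * px for x, px in zip(ranks, p)) / sd**3 if sd > 0 else float("nan")
    kurt = sum((x - mean) ** 4 * px for x, px in zip(ranks, p)) / sd**4 if sd > 0 else float("nan")
    # Assumed continuous median: rank x covers [x - 1, x) uniformly, so
    # interpolate inside the rank where the cumulative mass reaches 0.5.
    cumulative = 0.0
    median = float(len(p))
    for x, px in zip(ranks, p):
        if px > 0 and cumulative + px >= 0.5:
            median = (x - 1) + (0.5 - cumulative) / px
            break
        cumulative += px
    return {"mean": mean, "median": median, "skewness": skew, "kurtosis": kurt}


print(shape_features([0.8, 0.1, 0.1]))
print(shape_features([0.2] * 5))
```

For the made-up shares [0.8, 0.1, 0.1] the median is 0.625, while a flat distribution over five destinations gives a median of about 2.5 and a skewness of about 0.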

19
Evaluation of features

Based on 30 queries from the human subject study
Excluding software and person-name queries
Each query is associated with a clear (unambiguous) user goal
Anchor-link distribution for each query
Computed from 60M pages crawled from the Web
Click distribution for each query
Computed from clicks on Google results at UCLA CS, April 2004 to September 2004

20
Goal-prediction graph (synthetic)
(Figure: hypothetical goal-prediction graph in which an effective feature separates navigational from informational queries at a threshold)

21
Prediction graph: median of anchor-link distribution

Navigational iff median $< \theta_1$ (= 1.0)
Navigational queries: the vast majority of links point to the #1 anchor destination
Prediction accuracy: 80.0%

(Figure: navigational and informational queries separated at $\theta_1 = 1.0$)

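The single-feature rule and its accuracy measure are easy to express directly. In this Python sketch the medians and goal labels are invented for illustration (they are not the study's 30 queries), and accuracy is taken to be the fraction of queries whose predicted goal matches the human-judged goal.

```python
def is_navigational(median_anchor: float, theta1: float = 1.0) -> bool:
    """Single-feature rule: navigational iff the median of the anchor-link
    distribution is below theta1 (1.0 on the slide)."""
    return median_anchor < theta1


def accuracy(predictions: list[bool], truth: list[bool]) -> float:
    """Fraction of queries whose predicted goal matches the judged goal."""
    if not truth or len(predictions) != len(truth):
        raise ValueError("need non-empty lists of equal length")
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Invented (median, judged navigational?) pairs, not the study's data.
examples = [(0.6, True), (0.9, True), (1.4, True), (2.5, False)]
predictions = [is_navigational(median) for median, _ in examples]
truth = [navigational for _, navigational in examples]
print(accuracy(predictions, truth))  # 0.75
```

The third query shows the rule's typical failure mode: a navigational query whose links are spread over several destinations ends up above the threshold.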
22
Prediction graph: combining the two features

Linear combination with equal weights
Navigational iff
$\mathrm{median}(\text{click dist.}) + \mathrm{median}(\text{anchor-link dist.}) < \theta_1 + \theta_2 \; (= 2.0)$
Prediction accuracy: 90%

(Figure: navigational and informational queries separated at $\theta_1 + \theta_2 = 2.0$)

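The combined rule adds the two medians and compares the sum with theta1 + theta2. The Python sketch below assumes theta2 = 1.0 for the click median (the slides give theta1 = 1.0 and only the sum, 2.0), and the example medians are hypothetical. It shows how a strong click signal can move a query that the anchor-only rule would call informational across the combined threshold.

```python
def is_navigational_combined(
    median_click: float,
    median_anchor: float,
    theta1: float = 1.0,  # anchor-link threshold from the single-feature rule
    theta2: float = 1.0,  # click threshold; assumed, since only theta1 + theta2 = 2.0 is given
) -> bool:
    """Equal-weight linear combination:
    navigational iff median(click) + median(anchor-link) < theta1 + theta2."""
    return median_click + median_anchor < theta1 + theta2


# Hypothetical medians: weak anchor signal, strong click signal.
median_anchor, median_click = 1.3, 0.55
print(median_anchor < 1.0)                                    # anchor-only rule: False
print(is_navigational_combined(median_click, median_anchor))  # combined rule: True
```

Because the weights are equal, one feature's slack below its threshold can offset the other's excess, which is one plausible reason the combination outperforms either feature alone.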
23
Comparison with previous work

Three features in Kang and Kim [Kang03]
Anchor usage rate
Query term distribution
Term-dependence

24
Summary

Two effective features for goal identification
Anchor-link distribution (Web-link structure) and click distribution (user-click behavior)
Achieved an overall accuracy of 90% on a benchmark query set
More details in the paper

25
Future work

Evaluate on a larger and less biased query set
Handle queries with insufficient anchor/click statistics
Learn patterns from queries whose goals are clear
Predict search intentions at a finer granularity
Informational queries can be further classified, e.g., directed, undirected, advice, list, etc. [Rose04]
Analyze the contents of Web pages that users have clicked/viewed
Linguistic methods

26
Thank you