Automatic Identification of User Goals in Web Search - PowerPoint PPT Presentation

Slides: 28
Provided by: oakCs
Learn more at: http://oak.cs.ucla.edu

Transcript and Presenter's Notes



1
Automatic Identification of User Goals in Web Search
  • Uichin Lee, Zhenyu Liu, Junghoo Cho
    Computer Science Department, UCLA
    {uclee, vicliu, cho}@cs.ucla.edu

2
Motivation
  • Users have different goals for Web search
    • Reach the homepage of an organization (e.g., UCLA)
    • Learn about a topic (e.g., simulated annealing)
    • Download online music, etc.
  • Can we identify the user goal for a Web search
    automatically?
    • e.g., to improve and customize search results based
      on the identified user goal

3
Two high-level user-goals
  • Navigational query
    • Reach a Web site the user already has in mind
      (e.g., UCLA Library)
  • Informational query
    • Visit multiple sites to learn about a particular
      topic (e.g., Simulated Annealing)
  • Based on [Broder02] and [RoseLevinson04]
    • Navigational and informational goals appear in
      both studies

4
Exploiting identified user goals
  • Tailored weighting/ranking mechanism
    • Navigational queries: emphasize anchor texts
      [Craswell01, Kang03] and URL paths [Westerveld01]
    • Informational queries: emphasize page content
      [Kang03] and IR techniques (query expansion,
      relevance feedback, pseudo-relevance feedback, etc.)
  • Tailored result presentation
    • Informational queries: clustered search results
      [Etzioni99, Zeng04, Kummamuru04]
  • Targeted ads / answers

5
Outline
  • Are query goals predictable?
    • Human-subject study
  • How can we predict user goals automatically?
    • Anchor-link distribution
    • User-click distribution
  • How effective are our features?
    • Experimental evaluation

6
Are query goals predictable?
  • Search engines see only a few keywords
    • Users give no explicit indication of their goals
  • Can we predict the user goal from the keywords
    alone?
  • Human-subject study
    • 50 most popular Google queries from UCLA CS
    • 28 participants (graduate students) from UCLA CS
    • Subjects indicated the likely goal of each query
      as if they had issued it
    • Do most subjects agree on a particular goal?

7
Human subject study results
  • i(q): the fraction of participants who judge
    query q as informational
    • e.g., i(q) = 0.038 for "UCLA Library"
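
The i(q) measure is a simple ratio, sketched below under the assumption that each participant gives exactly one label ("navigational" or "informational") per query. The function name `informational_ratio` and the vote format are hypothetical, and the votes are made up rather than taken from the study.

```python
from collections import Counter


def informational_ratio(votes: list[str]) -> float:
    """i(q): fraction of participants who judged query q as informational."""
    if not votes:
        raise ValueError("i(q) is undefined without any votes")
    return Counter(votes)["informational"] / len(votes)


# Hypothetical votes for one query (not data from the study).
votes = ["navigational"] * 27 + ["informational"]
print(f"i(q) = {informational_ratio(votes):.3f}")  # i(q) = 0.036
```

A value near 0 or 1 means subjects largely agree on the goal; a value near 0.5 marks an ambiguous query.
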

8
Human subject study results

  • [Figure: i(q) for each query, annotated "43.5%
    software names" and "30.4% person names"]

9
Human subject study results
  • [Figure: i(q) for each query after removing
    software and person-name queries]

10
Human subject study summary
  • The majority of queries have predictable goals
  • Interestingly, the most ambiguous queries tend to
    fall on a certain set of topics (e.g., software
    and person names)
    • Topic-based ambiguity detection may be possible
    • Ambiguous queries could then be treated
      differently from the others

11
Outline
  • Are query goals predictable?
    • Human-subject study
  • How can we predict user goals automatically?
  • How effective are our features?
    • Experimental evaluation

12
How to predict user goal?
  • "UCLA Library" vs. "Simulated Annealing"
    • Navigational vs. informational
    • Is semantic analysis necessary?
  • Our idea: use information provided implicitly by
    Web users
    • Web-link structure
    • User-click behavior

13
Web-link structure
  • Anchor-link distribution to quantify the link
    structure

  • [Figure: links with anchor text "UCLA Library"
    pointing to www.ucla.edu/library.html,
    repositories.cdlib.org/uclalib/, and
    www.library.ucla.edu]

14
Web-link structure

  • [Figure: anchor-link distribution for the query
    "UCLA Library" over the destinations
    www.ucla.edu/library.html,
    repositories.cdlib.org/uclalib/, and
    www.library.ucla.edu]

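One way to compute an anchor-link distribution, as a sketch: collect the links whose anchor text matches the query, count how many point to each destination, and normalize the counts, ranking destinations from most to least linked. The `(anchor_text, destination)` input format, the case-insensitive exact match, and the function name are assumptions rather than the authors' implementation, and the link counts are invented.

```python
from collections import Counter


def anchor_link_distribution(links, query):
    """Share of matching links that point to each destination, most-linked first.

    links: iterable of (anchor_text, destination_url) pairs (hypothetical format).
    A link matches when its anchor text equals the query, ignoring case and
    surrounding whitespace (an assumption, not necessarily the paper's rule).
    """
    q = query.strip().lower()
    counts = Counter(dest for anchor, dest in links if anchor.strip().lower() == q)
    total = sum(counts.values())
    if total == 0:
        return []  # no anchor statistics for this query
    return [(dest, n / total) for dest, n in counts.most_common()]


# Invented link data, for illustration only.
links = (
    [("UCLA Library", "www.library.ucla.edu")] * 8
    + [("UCLA library", "www.ucla.edu/library.html")]
    + [("UCLA Library", "repositories.cdlib.org/uclalib/")]
)
for rank, (dest, share) in enumerate(anchor_link_distribution(links, "UCLA Library"), start=1):
    print(rank, f"{share:.2f}", dest)
```
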
15
Anchor-link distribution for sample queries
  • [Figure: anchor-link distributions for "UCLA
    Library" (navigational) and "Simulated Annealing"
    (informational)]

16
User-click behavior
  • Click distribution to quantify past user-click
    behavior

  • [Figure: click distribution for the navigational
    query "UCLA Library"]

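The click distribution can be built in the same way from a click log, again as a hedged sketch: the `(query, clicked_url)` record format, the case/whitespace normalization of queries, and the function name are assumptions, and the click records are invented.

```python
from collections import Counter


def click_distribution(click_log, query):
    """Share of the query's clicks that went to each result URL, most-clicked first.

    click_log: iterable of (query_string, clicked_url) records (hypothetical format).
    """
    key = " ".join(query.lower().split())  # normalise case and whitespace
    counts = Counter(url for q, url in click_log if " ".join(q.lower().split()) == key)
    total = sum(counts.values())
    return [(url, n / total) for url, n in counts.most_common()] if total else []


# Invented click records, for illustration only.
log = [("ucla library", "www.library.ucla.edu")] * 9 + [
    ("UCLA  Library", "www.ucla.edu/library.html"),
    ("simulated annealing", "example.org/sa-tutorial"),
]
print(click_distribution(log, "UCLA Library"))
# [('www.library.ucla.edu', 0.9), ('www.ucla.edu/library.html', 0.1)]
```
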
17
User-click behavior (cont'd)
  • [Figure: click distributions for "UCLA Library"
    (navigational) and "Simulated Annealing"
    (informational)]

18
Capturing the shape of distributions
  • Possible numeric features for a distribution f(x)
    • Mean $\mu$
    • Median
    • Skewness $\int (x-\mu)^3 f(x)\,dx \,/\, \sigma^3$:
      how asymmetric f(x) is
    • Kurtosis $\int (x-\mu)^4 f(x)\,dx \,/\, \sigma^4$:
      how peaked f(x) is
      ($\sigma$: standard deviation of f(x))
  • Features compared using single-variable linear
    regression
  • The median is the most effective measurement for
    both the anchor-link distribution and the click
    distribution
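
The four candidate features can be computed for a ranked distribution as follows. This is a sketch under stated assumptions: x is the destination rank (1, 2, ...), the moments treat each rank as a point mass, and the median linearly interpolates the cumulative distribution starting from CDF(0) = 0. The slides do not say how the median is computed; interpolation is assumed here so that the median can fall below 1.0, which the thresholds on the prediction-graph slides appear to require. The function name and example distributions are hypothetical.

```python
import math


def shape_features(p):
    """Mean, median, skewness and kurtosis of a distribution over ranks x = 1..n.

    p[i] is the (possibly unnormalised) mass at rank i + 1. Assumptions: the
    moments treat each rank as a point mass, and the median linearly
    interpolates the CDF between integer ranks with CDF(0) = 0.
    """
    total = sum(p)
    if total <= 0:
        raise ValueError("empty distribution")
    f = [v / total for v in p]
    xs = range(1, len(f) + 1)
    mu = sum(x * v for x, v in zip(xs, f))
    sigma = math.sqrt(sum((x - mu) ** 2 * v for x, v in zip(xs, f)))
    # Standardised third and fourth moments; undefined (reported as 0) when sigma = 0.
    skew = sum((x - mu) ** 3 * v for x, v in zip(xs, f)) / sigma**3 if sigma else 0.0
    kurt = sum((x - mu) ** 4 * v for x, v in zip(xs, f)) / sigma**4 if sigma else 0.0
    cdf, median = 0.0, float(len(f))
    for x, v in zip(xs, f):
        if cdf + v >= 0.5:  # the median lies inside rank x's interval (x - 1, x]
            median = (x - 1) + (0.5 - cdf) / v
            break
        cdf += v
    return {"mean": mu, "median": median, "skewness": skew, "kurtosis": kurt}


# Invented distributions: concentrated vs. spread out.
print(shape_features([0.8, 0.1, 0.1]))
print(shape_features([0.3, 0.3, 0.2, 0.2]))
```
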

19
Evaluation of features
  • Based on 30 queries from the human-subject study
    • Excluding software and person-name queries
    • Each query is associated with a distinct user
      goal
  • Anchor-link distribution for each query
    • Based on 60M pages crawled from the Web
  • Click distribution for each query
    • Based on Google-result click behavior from UCLA
      CS, April 2004 to September 2004

20
Goal-prediction graph (synthetic)
  • [Figure: an effective feature (hypothetical)
    separates navigational and informational queries
    at a threshold θ]

21
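Prediction graph: median of the anchor-link distribution
  • Navigational iff median < θ1 (θ1 = 1.0)
    • For navigational queries, the vast majority of
      links point to the #1 anchor destination
  • Prediction accuracy: 80.0%

  • [Figure: median of the anchor-link distribution
    for each query, navigational vs. informational,
    with threshold θ1 = 1.0]
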
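The single-feature rule amounts to a threshold test, sketched below. It assumes the median is computed by linearly interpolating the cumulative distribution over ranks (an assumption; under it, the median is below 1.0 exactly when the #1 destination receives more than half of the links). The default `theta1 = 1.0` comes from the slide; the function names and example distributions are hypothetical. Accuracy here is the fraction of queries whose predicted goal matches the human-assigned label.

```python
def interpolated_median(p):
    """Median rank of a distribution over ranks 1..n, interpolating the CDF from CDF(0) = 0.

    Under this (assumed) definition the median is below 1.0 exactly when the
    #1 destination receives more than half of the mass.
    """
    total = sum(p)
    if total <= 0:
        raise ValueError("no anchor statistics for this query")
    cdf = 0.0
    for x, v in enumerate(p, start=1):
        share = v / total
        if cdf + share >= 0.5:
            return (x - 1) + (0.5 - cdf) / share
        cdf += share
    return float(len(p))


def predict_goal(anchor_dist, theta1=1.0):
    """Navigational iff the median of the anchor-link distribution is < theta1."""
    return "navigational" if interpolated_median(anchor_dist) < theta1 else "informational"


def accuracy(predicted, labels):
    """Fraction of queries whose predicted goal matches the human-assigned goal."""
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Invented anchor-link distributions and labels, for illustration only.
queries = {
    "query A": ([0.8, 0.1, 0.1], "navigational"),
    "query B": ([0.3, 0.3, 0.2, 0.2], "informational"),
    "query C": ([0.45, 0.45, 0.1], "navigational"),
}
predicted = [predict_goal(dist) for dist, _ in queries.values()]
print(predicted, accuracy(predicted, [label for _, label in queries.values()]))
```

Query C shows the rule's boundary: its top destination receives only 45% of the links, so its interpolated median (about 1.11) lands on the informational side even though it is labeled navigational here.
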
22
Prediction graph: combining the two features
  • Linear combination with equal weights
  • Navigational iff
    (median of the click distribution)
    + (median of the anchor-link distribution)
    < θ1 + θ2 (= 2.0)
  • Prediction accuracy: 90%

  • [Figure: combined feature value for each query,
    navigational vs. informational, with threshold
    θ1 + θ2 = 2.0]

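The combined rule adds the two medians with equal weights and compares the sum against 2.0, as on the slide. This sketch again assumes the median linearly interpolates the cumulative distribution over ranks (CDF(0) = 0); the function names and example distributions are hypothetical.

```python
def interpolated_median(p):
    """Median rank over ranks 1..n, linearly interpolating the CDF from CDF(0) = 0 (assumed)."""
    total = sum(p)
    if total <= 0:
        raise ValueError("insufficient statistics for this query")
    cdf = 0.0
    for x, v in enumerate(p, start=1):
        share = v / total
        if cdf + share >= 0.5:
            return (x - 1) + (0.5 - cdf) / share
        cdf += share
    return float(len(p))


def predict_goal_combined(click_dist, anchor_dist, threshold=2.0):
    """Equal-weight linear combination: navigational iff the two medians sum to < threshold."""
    score = interpolated_median(click_dist) + interpolated_median(anchor_dist)
    return "navigational" if score < threshold else "informational"


# Invented distributions: clicks concentrated on one result, anchors split 45/45/10.
print(predict_goal_combined([0.9, 0.1], [0.45, 0.45, 0.1]))  # navigational
print(predict_goal_combined([0.3, 0.3, 0.2, 0.2], [0.4, 0.3, 0.3]))  # informational
```

In the first example, the anchor-link median alone (about 1.11) would fall on the informational side of 1.0, but the strongly concentrated click distribution (median about 0.56) pulls the combined score below 2.0.
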
23
Comparison with previous work
  • Three features in Kang and Kim 03
    • Anchor usage rate
    • Query term distribution
    • Term dependence

24
Summary
  • Two effective features for goal identification
    • Anchor-link distribution (Web-link structure)
      and click distribution (user-click behavior)
  • Achieved an overall accuracy of 90% on a
    benchmark query set
  • More details in the paper

25
Future work
  • Evaluate on a larger and less biased query set
  • Handle queries with insufficient anchor/click
    statistics
    • Learn patterns from queries whose goals are clear
  • Predict search intentions at a finer granularity
    • Informational queries can be further classified,
      e.g., directed, undirected, advice, list, etc.
      [Rose04]
  • Analyze the contents of Web pages that users have
    clicked/viewed
  • Linguistic methods

26
Thank you
  • Any questions?

27
Questionnaire design
  • 1st version: direct classification by subjects
    • Navigational vs. informational
    • Some confusion, e.g.:
      • "Alan Kay": home page vs. other pages
      • "Have a site in mind?" vs. "Plan to visit one
        site?"
  • 2nd version
    • Have a site in mind; intend to visit only that
      site
    • Have a site in mind, but willing to visit others
    • Have no site in mind; willing to visit anything
      relevant