Ontology Based Personalized Search

Provided by: ale1152


1
Ontology Based Personalized Search
  • Alexander Pretschner
  • Susan Gauch
  • The University of Kansas

2
Outline
  • Motivation
  • User profiles
      • creation and maintenance
      • evaluation
  • Application
      • re-ranking (and filtering) search results
      • evaluation
  • Conclusions

3
Motivation
  • Personalization
      • different information needs for different users
      • what's that, an information need?
      • very large databases
  • Re-ranking (and filtering) search results
      • Internet search engines return 50% irrelevant
        pages
      • just one application!

4
Context
  • ProFusion: www.profusion.com
  • OBIWAN: distributed, content-based IR
      • Web clustered into regions
      • clustering criteria: content, location, company
      • search query brokered to the best regions; within a
        region, brokered to the most promising sites
      • browsing a region means browsing its sites
        simultaneously
      • www.ittc.ukans.edu/obiwan

5
User Profiles
  • Applications
      • Usenet news filtering
      • recommendation services: web browsing, books
      • expertise location
      • rating
      • individual and collaborative
  • Profiles should
      • accurately reflect actual interests
      • require as little feedback as possible
      • be dynamic

6
User profiles Creation
  • Obvious and often used: keywords
  • Problems
      • not structured (ambiguity)
      • static
      • have to be explicitly mentioned
  • Our approach
      • watch over a user's shoulder while surfing
      • automatically determine documents' content
      • central, large ontology (concept hierarchy)

7
Content of Documents
  • Documents as weighted keyword vectors
      • n different words → n dimensions
      • weights based on term frequency and inverse document
        frequency
  • Browsing hierarchy: 10 web pages per node
      • concatenate them → keyword vector
  • Content of a page: most similar vector!
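
The classification step on this slide can be sketched as follows. This is a minimal sketch, not the authors' classifier: it assumes tf × log(N/df) weighting, cosine similarity as "most similar", a naive lowercase tokenizer with no stemming or stop-word removal, and category names and sample pages invented for illustration. The helper names (tokenize, build_category_vectors, classify) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase, alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    # Inverse document frequency log(N / df) over the category "documents".
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Sparse keyword vector: term frequency times idf; zero-weight terms dropped.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if idf.get(t, 0.0) > 0}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_category_vectors(training: dict[str, list[str]]):
    # Concatenate the sample pages of each node into one keyword vector per category.
    docs = {cat: tokenize(" ".join(pages)) for cat, pages in training.items()}
    idf = idf_table(list(docs.values()))
    return {cat: tfidf(tokens, idf) for cat, tokens in docs.items()}, idf


def classify(page: str, cat_vecs, idf, k: int = 5) -> list[tuple[str, float]]:
    # Rank categories by similarity to the page; the top k describe its content.
    vec = tfidf(tokenize(page), idf)
    scores = [(cat, cosine(vec, cv)) for cat, cv in cat_vecs.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

With a toy hierarchy such as {"Computers/AI": [...], "Sports/Soccer": [...]}, classify(page, *build_category_vectors(training)) returns the page's top categories with their similarity values, which play the role of the content weights used on the following slides.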

8
Updating profiles
  • Static: document-related
      • content: weights of top nodes for the surfed document
      • length of page
  • Dynamic: time spent
  • Combine them
      • for instance, weight × (time/length)
      • changes the interest in the five top categories
  • User profile: weighted ontology
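
A minimal sketch of one profile update, assuming the multiplicative combination weight × (time/length) given as an example on this slide and storing the "weighted ontology" as a flat dictionary from category path to accumulated interest. The function name update_profile and this data layout are hypothetical; the actual system may propagate interest through the hierarchy or normalize differently.

```python
def update_profile(profile: dict[str, float],
                   top_categories: list[tuple[str, float]],
                   seconds_spent: float,
                   page_length: int) -> None:
    """Add interest to the surfed page's top categories, in place.

    Assumed rule: interest(c) += w(d, c) * (time / length),
    applied to at most the five top categories of page d.
    """
    if page_length <= 0:
        return  # no length to normalize by; ignore the page
    factor = seconds_spent / page_length
    for category, weight in top_categories[:5]:
        profile[category] = profile.get(category, 0.0) + weight * factor
```

Calling it once per surfed page makes the profile dynamic without explicit feedback; dropping the division by length is the variant the summary slide suggests makes little difference.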

9
Profile evaluation
  • Accordance with actual user interests
      • 10/20 interest categories describe actual
        interests
      • describe interests pretty well: 3.5/5
      • 1/4 do not describe interests at all
  • Convergence
      • stabilization of categories over time?
      • profiles do converge after 320 surfed pages!

10
Profiles Summary
  • Stored as weighted ontologies
  • Profiles represent actual interests quite well
  • Up to 150 top categories (out of 4,300)
  • Two adjustment functions make profiles converge
      • after 320 pages
      • length of page doesn't really matter

11
Personalizing Search Results
  • Just one application!
  • 50% of the top 20 results are irrelevant
  • Same search mechanism for 165 million people?
  • Goal
      • identify relevant documents and put them on top
        of the result list
      • filter irrelevant results
  • Difficult problem: a 10% increase is very good!

12
Re-Ranking
  • Relate
      • the search engine's original ranking
      • the extent to which the top 5 categories describe
        the document's content
      • the personal interest in each of these top categories
  • More relevant items on top of the result list
      • recall: the system's ability to present all relevant items
      • precision: the system's ability to present only relevant items

13
Recall and Precision
  • Combination: recall/precision graphs
  • Example: ranked documents 1, ..., 20
      • relevant: 2, 5, 10, 14, 19
      • recall points: 1/5, 2/5, 3/5, 4/5, 5/5
      • precisions: 1/2, 2/5, 3/10, 4/14, 5/19
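
The example's recall and precision values can be checked mechanically. The sketch below (function name hypothetical) computes the (recall, precision) pair at each rank where a relevant document appears, assuming every relevant document occurs somewhere in the ranked list.

```python
def recall_precision_points(relevant_ranks: list[int]) -> list[tuple[float, float]]:
    """Return (recall, precision) at each rank where a relevant document appears.

    relevant_ranks holds the 1-based positions of the relevant documents;
    all relevant documents are assumed to be in the ranked list.
    """
    ranks = sorted(relevant_ranks)
    total = len(ranks)
    # After the i-th relevant document (at position `rank`), i of `total`
    # relevant documents have been seen among the first `rank` results.
    return [(i / total, i / rank) for i, rank in enumerate(ranks, start=1)]


for recall, precision in recall_precision_points([2, 5, 10, 14, 19]):
    print(f"recall {recall:.1f}  precision {precision:.3f}")
```

For relevant documents at ranks 2, 5, 10, 14 and 19 this prints precisions 0.500, 0.400, 0.300, 0.286 and 0.263, i.e. 1/2, 2/5, 3/10, 4/14 and 5/19; plotting them against recall gives one recall/precision curve, and the curves of the original and re-ranked lists can then be compared.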

14
Re-Ranking Evaluation
  • Overall performance increase of up to 8%
      • at each recall cutoff, 8% more relevant documents
        have been retrieved
  • All relevant docs equally relevant?!
      • doesn't reflect reality
  • Compare orderings!
      • ndpm (normalized distance-based performance measure)
        yields a 3-5% performance increase
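
Comparing orderings instead of binary relevance can be sketched with the normalized distance-based performance measure in its commonly cited form ndpm = (2·C⁻ + Cᵘ) / (2·Cⁱ), after Yao (1995): C⁻ counts pairs the system orders opposite to the user, Cᵘ counts pairs the user distinguishes but the system ties, and Cⁱ counts all pairs the user distinguishes. This is an illustrative sketch; how the authors collected user orderings is not stated on the slide, so graded relevance scores are assumed here.

```python
from itertools import combinations


def ndpm(user: dict[str, float], system: dict[str, float]) -> float:
    """ndpm = (2*C_minus + C_u) / (2*C_i); lower is better.

    user:   the user's relevance grades (higher = more relevant)
    system: the system's ranking scores (higher = ranked earlier)
    0.0: the system respects every user preference
    0.5: the system ties every pair the user distinguishes
    1.0: the system reverses every user preference
    """
    c_minus = 0  # user and system order the pair oppositely
    c_u = 0      # user prefers one document, system ties them
    c_i = 0      # all pairs on which the user expresses a preference
    for a, b in combinations(user, 2):
        du = user[a] - user[b]
        if du == 0:
            continue  # user indifferent: any system order is acceptable
        c_i += 1
        ds = system[a] - system[b]
        if ds == 0:
            c_u += 1
        elif (du > 0) != (ds > 0):
            c_minus += 1
    return (2 * c_minus + c_u) / (2 * c_i) if c_i else 0.0
```

Because pairs the user rates as equally relevant are skipped, the measure does not penalize the system for ordering them either way, which addresses the "all relevant docs equally relevant" objection above.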

15
Conclusions
  • Automatic creation of structured user profiles is
    possible
  • Profiles are reasonably accurate
  • Many applications
  • Evaluation of re-ranking search results: performance
    increase of up to 8%
  • Filtering possible

16
Future Work
  • Incorporating profile generator into browser
  • Connect system to ProFusion, OBIWAN
  • Personalize structure of ontology
  • Re-train classifier
  • More applications: recommendation service
    (proactive agents), browsing, ...
  • Filtering approaches not based on rankings
  • Explicit user feedback?
  • Combination with domain modeling

17
  • Next 5 slides: additional material

18
Filtering
  • Get rid of irrelevant documents!
  • Personalized ranking with threshold
      • e.g., a personalized ranking value < 0.7 means the
        document is irrelevant
  • How many irrelevant documents have been correctly
    classified as irrelevant?
  • How many relevant documents have been incorrectly
    classified as irrelevant?
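
Threshold filtering and the two evaluation questions can be sketched as below. The 0.7 cut-off comes from the slide's example; that personalized ranking values lie in [0, 1], and the function names, are assumptions for illustration.

```python
def filter_results(scores: dict[str, float],
                   threshold: float = 0.7) -> tuple[set[str], set[str]]:
    """Split results into (kept, filtered) by personalized ranking value."""
    kept = {doc for doc, value in scores.items() if value >= threshold}
    filtered = set(scores) - kept
    return kept, filtered


def filtering_counts(filtered: set[str], relevant: set[str]) -> tuple[int, int]:
    """Return (irrelevant docs correctly filtered, relevant docs incorrectly filtered)."""
    return len(filtered - relevant), len(filtered & relevant)
```

For example, with scores {"d1": 0.9, "d2": 0.65, "d3": 0.8, "d4": 0.3} and relevant documents {"d1", "d2"}, d2 and d4 are filtered: one irrelevant document is removed correctly and one relevant document is lost.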

19
Filtering Evaluation
  • Filtering successful
      • 1-2 irrelevant documents out of 20 are filtered
      • 0.5-1 relevant documents are incorrectly filtered
  • Re-ranking performs better
      • relevant/irrelevant decision too coarse

20
Filtering Evaluation (Testing)

21
Re-Ranking details
  • Determine content of search results
      • title and summary of document d are sufficient
      • category c_i represents its content to the extent
        w(d, c_i)
  • Combine personal interest in c_i, p(c_i), with
    w(d, c_i) and original ranking r(d)
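
The slide names the three inputs but not the formula that combines them. The sketch below uses one plausible, hypothetical combination: r'(d) = r(d) · (α + (1 − α) · (1/k) Σᵢ p(cᵢ) · w(d, cᵢ)) over the document's top k categories. The mixing weight α = 0.5, k = 5, and the assumption that r(d), w(d, cᵢ) and p(cᵢ) are all normalized to [0, 1] are illustrative choices, not the authors' published method.

```python
def personalized_score(r: float,
                       doc_categories: list[tuple[str, float]],
                       profile: dict[str, float],
                       alpha: float = 0.5,
                       k: int = 5) -> float:
    """Hypothetical combination of r(d), w(d, c_i) and p(c_i).

    doc_categories: [(c_i, w(d, c_i)), ...], best category first,
    e.g. from classifying the result's title and summary.
    profile: p(c_i), the user's interest per category.
    """
    top = doc_categories[:k]
    if not top:
        return alpha * r  # no content information: keep a damped original score
    personal = sum(profile.get(c, 0.0) * w for c, w in top) / len(top)
    return r * (alpha + (1 - alpha) * personal)


def rerank(results: list[tuple[str, float, list[tuple[str, float]]]],
           profile: dict[str, float],
           alpha: float = 0.5,
           k: int = 5) -> list[tuple[str, float]]:
    """Sort (doc_id, r(d), categories) triples by personalized score, best first."""
    scored = [(doc, personalized_score(r, cats, profile, alpha, k))
              for doc, r, cats in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a profile strongly weighted toward "Computers/AI", a slightly lower-ranked result classified under that category can move above a top-ranked result about an unrelated topic; α controls how much of the search engine's original ranking is preserved.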

22
Re-Ranking Testing