Hilltop and Topic Distillation - PowerPoint PPT Presentation

About This Presentation
Title:

Hilltop and Topic Distillation

Slides: 15
Provided by: csU62

Transcript and Presenter's Notes

1
Hilltop and Topic Distillation
  • Liren Ding

2
Ranked search result
3
Authoritativeness of ranked results
  • Prior approaches
  • Ranking based on content
  • Problem: spam
  • Ranking based on human classification (Yahoo)
  • Problem: slow, inefficient, inadequate and
    incomplete
  • Ranking based on usage information (DirectHit)
  • Problem: spam, needs a large amount of data
  • Ranking based on connectivity
  • Problem: relies on two assumptions
  • Pages on the topic link to each other
  • Authoritative pages tend to point to other
    authoritative pages

4
Hilltop
  • Author: Krishna Bharat
  • A Principal Scientist at Google who created
    Google News. This service automatically indexes
    about 4500 news websites around the world and
    provides a summary of the news sources.

5
Hilltop
  • In response to a query:
  • Compute a list of the most relevant experts on
    the query topic
  • Identify relevant links within the selected set
    of experts, and follow them to identify target
    web pages
  • Rank the target pages according to the number and
    relevance of non-affiliated experts that point to
    them

6
Hilltop
  • Ranking based on expert documents
  • Based on the assumption that the number and
    quality of the sources referring to a page are a
    good measure of the page's quality
  • The key difference is that Hilltop only
    considers "expert" sources

7
Hilltop
  • Expert pages
  • What makes a page an expert?
  • An expert page needs to be unbiased and point to
    numerous non-affiliated pages on the subject
  • Hilltop needs to perform:
  • Host affiliation detection
  • Expert selection
  • Indexing of the experts

8
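Hilltop
  • Host Affiliation Detection
  • Two hosts are considered affiliated if either of
    the following holds:
  • They share the same first 3 octets of the IP
    address.
  • The rightmost non-generic token in the hostname
    is the same.
  • E.g. "www.ibm.com" and "ibm.co.mx" are affiliated:
    after ignoring the generic suffixes, both end in
    the token "ibm"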
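
The two affiliation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set of "generic" tokens is a small hypothetical sample (the slides do not list one), the function names are invented here, and IP addresses are assumed to be dotted IPv4 strings that the caller has already resolved.

```python
# Hypothetical sample of generic suffix tokens; a real list would be longer.
GENERIC_SUFFIX_TOKENS = {"com", "co", "org", "net", "edu", "gov", "mx", "uk"}


def rightmost_non_generic_token(hostname):
    """Return the first token, scanning right to left, that is not generic."""
    for token in reversed(hostname.lower().split(".")):
        if token not in GENERIC_SUFFIX_TOKENS:
            return token
    return ""


def same_ip_prefix(ip_a, ip_b):
    """True if two dotted IPv4 strings share their first 3 octets."""
    return ip_a.split(".")[:3] == ip_b.split(".")[:3]


def are_affiliated(host_a, ip_a, host_b, ip_b):
    """Apply the two rules from the slide: IP prefix or hostname token."""
    if same_ip_prefix(ip_a, ip_b):
        return True
    token_a = rightmost_non_generic_token(host_a)
    return token_a != "" and token_a == rightmost_non_generic_token(host_b)


# The slide's example: different networks, same rightmost token "ibm".
print(are_affiliated("www.ibm.com", "192.0.2.10", "ibm.co.mx", "198.51.100.7"))
```

The IP addresses in the usage line are documentation-range placeholders, not the real addresses of those hosts.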

9
Hilltop
  • Expert selection
  • For every page whose out-degree is greater than a
    threshold k (e.g., k = 5), test whether its URLs
    point to k distinct non-affiliated hosts. Every
    such page is considered an expert page.
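
A minimal sketch of this selection test follows. It makes simplifying assumptions that are not in the slides: hosts are grouped only by their rightmost non-generic hostname token (the IP-octet rule is left out), the generic-token set is a hypothetical sample, and `is_expert` and `affiliation_key` are names invented for this example.

```python
from urllib.parse import urlparse

# Hypothetical sample of generic suffix tokens; a real list would be longer.
GENERIC_SUFFIX_TOKENS = {"com", "co", "org", "net", "edu", "gov", "mx", "uk"}


def affiliation_key(hostname):
    """Group hosts by their rightmost non-generic token (simplified rule)."""
    for token in reversed(hostname.lower().split(".")):
        if token not in GENERIC_SUFFIX_TOKENS:
            return token
    return hostname.lower()


def is_expert(out_urls, k=5):
    """True if the page has out-degree > k and links to at least k
    distinct non-affiliated hosts."""
    if len(out_urls) <= k:
        return False
    groups = set()
    for url in out_urls:
        host = urlparse(url).hostname
        if host:
            groups.add(affiliation_key(host))
    return len(groups) >= k


links = [f"http://www.site{i}.com/page" for i in range(6)]
print(is_expert(links))
```

A page with six links that all lead to affiliated hosts (for example, six regional sites of one company) fails the test even though its out-degree exceeds k.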

10
Hilltop
  • Indexing the Experts
  • Only index text contained within "key phrases" of
    the expert
  • E.g. the title, headings and anchor text within
    the expert page are considered key phrases
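
As a rough illustration of indexing only key phrases, the sketch below pulls the title, headings, and anchor text out of an HTML page with Python's standard `html.parser` module and ignores all other body text. The class and function names are assumptions made for this example, and a nested key-phrase tag (such as a link inside a heading) is simply folded into the outer phrase.

```python
from html.parser import HTMLParser

# Tags whose text is treated as a key phrase, per the slide's examples.
KEY_PHRASE_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "a"}


class KeyPhraseExtractor(HTMLParser):
    """Collect (tag, text) pairs for title, heading, and anchor elements."""

    def __init__(self):
        super().__init__()
        self.phrases = []
        self._open_tag = None  # key-phrase tag currently being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in KEY_PHRASE_TAGS and self._open_tag is None:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.phrases.append((tag, text))
            self._open_tag = None


def extract_key_phrases(html):
    parser = KeyPhraseExtractor()
    parser.feed(html)
    parser.close()
    return parser.phrases


page = ("<html><head><title>Linux Guides</title></head><body>"
        "<h1>Best Distros</h1><p>Ignored body text.</p>"
        "<a href='http://example.org'>Example Distro</a></body></html>")
print(extract_key_phrases(page))
```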

11
Query Processing in Hilltop
  • Determine a list of N experts that are the most
    relevant for that query
  • Rank results by selectively following the
    relevant links from these experts and assigning
    an authority score to each such page
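
The two query-processing steps can be sketched as follows. This is a deliberately simplified scoring rule, not the exact formula from the Hilltop work: each expert is assumed to arrive with a precomputed relevance score and an affiliation label, only the top N experts are kept, and among affiliated experts pointing to the same target only the best score is counted. All field and function names are hypothetical.

```python
from collections import defaultdict


def rank_targets(experts, n=3):
    """Rank target pages by the summed scores of the top-n experts that
    link to them, counting each affiliation group at most once."""
    top = sorted(experts, key=lambda e: e["score"], reverse=True)[:n]

    # target -> affiliation group -> best expert score seen so far
    best = defaultdict(dict)
    for expert in top:
        for target in expert["targets"]:
            group = expert["affiliation"]
            previous = best[target].get(group, 0.0)
            best[target][group] = max(previous, expert["score"])

    authority = {t: sum(groups.values()) for t, groups in best.items()}
    # Highest authority first; ties broken by URL for a stable order.
    return sorted(authority.items(), key=lambda kv: (-kv[1], kv[0]))


experts = [
    {"url": "e1", "affiliation": "a", "score": 3.0, "targets": ["p", "q"]},
    {"url": "e2", "affiliation": "a", "score": 2.0, "targets": ["p"]},
    {"url": "e3", "affiliation": "b", "score": 1.0, "targets": ["q"]},
    {"url": "e4", "affiliation": "c", "score": 0.5, "targets": ["r"]},
]
print(rank_targets(experts, n=3))
```

In this toy input, page "q" outranks "p" because two non-affiliated experts point to it, while the two experts pointing to "p" belong to the same affiliation group and count only once.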

12
Topic Distillation
  • The algorithm is used to find documents relevant
    to a particular keyword topic
  • It is an alternative to PageRank
  • Computes a query-specific subgraph of the WWW
  • Computes a score for every page in the subgraph
    based on hyperlink connectivity
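
The slides do not say which connectivity score is used. As one plausible illustration, the sketch below applies a hubs-and-authorities iteration (in the style of Kleinberg's HITS, on which topic distillation methods are commonly built) to a small query-specific subgraph given as a list of (source, target) links. The function names and the fixed iteration count are assumptions.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hub_authority_scores(edges, iterations=20):
    """Return (hub, authority) score dicts for a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # A page's authority is the sum of hub scores of pages linking to it.
        auth = normalize(
            {n: sum(hub[s] for s, t in edges if t == n) for n in nodes})
        # A page's hub score is the sum of authority scores it links to.
        hub = normalize(
            {n: sum(auth[t] for s, t in edges if s == n) for n in nodes})
    return hub, auth


subgraph = [("h1", "a"), ("h2", "a"), ("h1", "b")]
hub, auth = hub_authority_scores(subgraph)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```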

13
Problem of Topic Distillation
  • Only applicable to broad queries
  • Computing the query-specific subgraph of the WWW
    in real time is hard
