Authoritative Sources in a Hyperlinked Environment - PowerPoint PPT Presentation

1 / 20
About This Presentation
Title:

Authoritative Sources in a Hyperlinked Environment

Description:

P. Q. Conferred Authority. Hub-Authority method ... y(p) = q : (q,p) Ex(p) Hub and Authority ... {y(p) } is a vector y with hub weight for each page in G ... – PowerPoint PPT presentation

Number of Views:235
Avg rating:3.0/5.0
Slides: 21
Provided by: park99
Category:

less

Transcript and Presenter's Notes

Title: Authoritative Sources in a Hyperlinked Environment


1
Authoritative Sources in a Hyperlinked Environment
  • Hui Han
  • CSE dept, PSU
  • 10/15/01

2
Main Idea
  • Return more authoritative information, using the
    algorithm based on link structure analysis, esp.
    the relationship between a set of relevant
    authoritative pages and the set of hub pages
    that join them together in the link structure

3
The types of queries
  • Specific queries scarcity problem
  • Does Netscape support the JDK 1.1 code-signing
    API
  • Broad-topic queries abundance problem
  • Find information about the Java programming
    language.
  • Solution find a small set of the most
    authoritative relevant pages.
  • Similar-page queries
  • Find pages similar to java.sun.com

4
Linked Based Analysis
  • Limitations of text based analysis
  • Text-based ranking function
  • Eg. Could www.harvard.edu be recognized as one of
    the most authoritative pages, since many other
    webpages contain harvard more often.
  • Pages are not sufficiently self descriptive
  • Usually the term search engine doesnt appear
    on search engine web pages

5
Limitations of Basic Link-based Analysis
Conferred Authority
P
Q
  • Basic model Of all pages containing the query
    string, return those with the greatest number of
    in-links
  • Large number of links are created primarily for
    navigational purposes
  • Difficulty in finding a balance between relevance
    and popularity
  • Popular sites like www.yahoo.com would be
    considered as highly authoritative on any query
    strings it contained.

6
Hub-Authority method
  • A link-based model for the conferral of
    authority, using the method that consistently
    identifies relevant, authoritative WWW pages for
    broad search topics.
  • Difference from Clustering
  • Distinguishing pages related to different
    meanings or senses of a query term

7
Step1 Constructing a Focused Subgraph of the WWW
  • Properties of a ideal collection S? of pages
  • S? is relatively small
  • S? is rich in relevant pages
  • S? contains most(or many) of the strongest
    authorities
  • Construction of S?
  • Collect the t highest-ranked pages for the query
    ? from a text-based search engine (Altavista,
    hotbot) as root set R?
  • Expand R? to base set S?
  • a strong authority is quite likely to be pointed
    to by at least one page in R?
  • Keep only transverse links

8
Step2 Compute Hubs and Authorities
  • Authoritative pages should
  • have large in-degree
  • have a considerable overlap in the sets of pages
    that point to them, since they share a common
    topic
  • Hub pages pull together(link to) authorities
    on a common topic

9
An Iterative Algorithm
  • G? the subgraph induced on the pages in S?
  • x(p) authority weight of Page P
  • y(p) hub weight of Page P
  • Normalize ?p?S? (x(p))2 1
  • ?p?S? (y(p))2 1
  • I operation
  • x(p) ?q (q,p) ? Ey(p)
  • O operation
  • y(p) ?q (q,p) ? Ex(p)
  • Hub and Authority exhibit
  • mutually reinforcing relationship

10
An Iterative Algorithm(cont.)
  • x(p) is a vector x with authority weight for
    each page in G?
  • y(p) is a vector y with hub weight for each
    page in G?
  • Filter out the top c authorities and top c hubs

11
Basic knowledge of Matrix
  • M symmetric nn matrix ? vector ? a number
  • If for some vector ?, M ? ? ?, we say,
  • The set of all such ? is a subspace of Rn
    Eigenspace associated with ?
  • These ?1(M), ?2(M), are eigenvalues, while
    ?1(M), ?2(M), are eigenvectors
  • ?i(M) belongs to the subspace of ?i(M)
  • If we assume ?1(M) ?2(M), we refer to ?1(M)
    as the principal eigenvector, and all other ?i(M)
    as non-principal eigenvector.

12
Convergence Proof of Iterate Procedure
  • Theorem1. The sequences x1, x2, x3, and y1, y2,
    y3, converge to x and y respectively.
  • Proof G(V,E) Vp1, p2, , pn
  • A is the adjacency matrix of graph G Aij 1 if
    (pi, pj) is an edge of G.
  • I O operations can be written as
  • x ? ATy y ? Ax K loops,
  • So, x (1)? AT Ax (0) x(0) AT z
  • x ? x (k)? (AT A)k-1 AT z
  • y ? y (k)? (AAT)k z
  • if ? is a vector not orthogonal to the principle
    eigenvector ?1(M), the unit vector in the
    direction of Mk? converges to ?1(M) as k
    increases without bound

13
Convergence Proof of Iterate Procedure(cont.)
  • A is called an orthogonal matrix if AAT AT A
    E.
  • Theorem2 x is the principal eigenvector of
    ATA, and y is the principal eigenvector of AAT.
  • Experiment finds that k20 is sufficient for the
    convergence of vectors.

14
Experimental results
  • Most of these authorities do not contain any
    occurrences of the initial query string , except
    http//www.roadhead.com

15
Similar-Page Queries
  • The strongest authorities in the local region of
    the link structure near p, can server as a
    broad-topic summary of the pages related to p.
  • A small difference in initial request to the
    search engine Find t pages pointing to p, to
    get R?

16
Multiple sets of Hubs and Authorities
  • Why?
  • The query string ? may have several very
    different meanings. Eg.java
  • The string may arise as a term in the context of
    multiple technical communities. Eg. randomized
    algorithms
  • The string may refer to a highly polarized issue,
    involving groups that are not likely to link to
    one another. Eg. abortion
  • Idea
  • The NON-principle eigenvectors of ATA and AAT
    provide us with a natural way to extract
    additional densely linked collections of hubs and
    authorities from the base set S?.

17
Multiple sets of Hubs and AuthoritiesExperimental
result1
  • For the query jaguar, the strongest
    collections of authoritative sources concerned
    the Atari Jaguar product, the NFL football team
    from Jacksonville, and the automobile.

18
Multiple sets of Hubs and AuthoritiesExperimental
result2
  • For the query randomized algorithms, none of
    the strongest collections of hubs and authorities
    are precisely on the query topic. They include
    home pages of theoretical computer scientists,
    compendia of mathesmatical software and pages on
    wavelets.

19
Multiple sets of Hubs and AuthoritiesExperimental
result3
  • For the query abortion, there is a natural
    separation.

20
Conclusion
  • A technique for locating high-quality information
    related to a broad search topic on the www, based
    on a structural analysis of the link topology
    surrounding authoritative pages on the topic.
  • Related work.
  • Standing, influence in social networks,
    scientific citations, etc.
  • Hypertext and WWW rankings
Write a Comment
User Comments (0)
About PowerShow.com