The Anatomy of a largescale hypertextual Web search engine by Sergey Brin, Lawrence Page appearing i

1
The Anatomy of a large-scale hypertextual Web
search engine
by Sergey Brin, Lawrence Page
appearing in Computer Networks and ISDN
Systems, 1998

Presented by
Damon Sutherland
University of Alabama
in partial fulfillment of the requirements for
Internet Algorithms course, Fall 2005

2
Introduction to web searches

First automated web bots searched linearly and
indexed URLs and titles only
Hard to search for specific items
By late 1995 AltaVista launched the first search
engine with natural language queries
By late 1996 Lycos had indexed 60 million pages
Yahoo was initially released in 1994 as a list of
its creators' favorite sites

3
Overview of query processing

User types "computer"
Google
finds related pages (based on anchor text and words
in the page)
retrieves snippets from the top related pages
returns the results to the user in ranked order
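
The three steps above can be sketched in a few lines of Python. This is only an illustrative toy, not Google's implementation: the corpus `PAGES`, the anchor table `ANCHORS`, and the helpers `score`, `snippet`, and `search` are all hypothetical names, and the score is a plain count of matches in page words plus inbound anchor text.

```python
import re

# Hypothetical toy corpus (URL -> page text); not taken from the paper.
PAGES = {
    "a.example/computers": "A computer is a machine that follows programs.",
    "b.example/dogs": "Dogs are loyal animals.",
    "c.example/shop": "Order laptops and parts online.",
}
# Hypothetical anchor text collected from links that point at each URL.
ANCHORS = {
    "c.example/shop": ["computer store", "cheap computer parts"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(url, term):
    """Count matches in the page's own words plus its inbound anchor text."""
    page_hits = tokenize(PAGES[url]).count(term)
    anchor_hits = sum(tokenize(a).count(term) for a in ANCHORS.get(url, []))
    return page_hits + anchor_hits


def snippet(url, width=40):
    """Use the first `width` characters of the page as a stand-in snippet."""
    return PAGES[url][:width]


def search(query):
    """Return (url, snippet) pairs for matching pages, best score first."""
    term = query.lower()
    scored = [(score(url, term), url) for url in PAGES]
    matches = [(s, url) for s, url in scored if s > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(url, snippet(url)) for _, url in matches]


for url, text in search("computer"):
    print(url, "|", text)
```

Note that `c.example/shop` never contains the word "computer" itself; it is found only through the anchor text of links pointing at it.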

4
Motivation

Increase the relevance of queries
People generally view only the first few tens of results

6
Motivation
Increase the relevance of queries

As of late 1997, only one of the four major
search engines returned a link to itself in the
top 10 results when queried for its own name.

7
Motivation

Scalable
by number of web pages indexed

8
Motivation

Scalable
web queries per day

9
How to find related pages

By the text on a page
Google parses the source code and breaks the text
into a series of word occurrences
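
A minimal sketch of this parsing step, using Python's standard `html.parser`. It assumes a word occurrence is just a (word, position) pair; the class name `WordOccurrenceParser` and the function `word_occurrences` are hypothetical, and details such as font size or capitalization are left out.

```python
from html.parser import HTMLParser
import re


class WordOccurrenceParser(HTMLParser):
    """Collect (word, position) pairs from the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.occurrences = []   # list of (word, position) tuples
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            # Position is the running word count across the whole page.
            self.occurrences.append((word, len(self.occurrences)))


def word_occurrences(html):
    """Parse HTML source and return its word occurrences in page order."""
    parser = WordOccurrenceParser()
    parser.feed(html)
    parser.close()
    return parser.occurrences


print(word_occurrences("<html><body><h1>Dogs</h1><p>I love dogs!</p></body></html>"))
```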

10
How to find related pages

Anchor Text is the description of the link by the
page author.
<A HREF="page2.htm">I love dogs!</A>
Google believes the Anchor Text is as important
as the page text.

11
Anchor text

Anchor Text increases relevance
Unlike other search engines, Google associates
the Anchor Text with the page the link points to.
This allows Google to return items that cannot be
crawled or parsed as text, i.e., pictures, programs, etc.
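
One way to picture this association is a table keyed by link target rather than by the page containing the link. The sketch below is an assumption-laden toy, not the paper's data structure: `AnchorCollector` and `index_anchor_text` are hypothetical names, and the example URLs are made up.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (target_url, anchor_text) pairs from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.anchors = []    # list of (target_url, anchor_text)
        self._href = None    # target of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> works.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.anchors.append((self._href, text))
            self._href = None


def index_anchor_text(pages):
    """Map each link target to the anchor texts that point at it."""
    index = defaultdict(list)
    for url, html in pages.items():
        collector = AnchorCollector(url)
        collector.feed(html)
        collector.close()
        for target, text in collector.anchors:
            index[target].append(text)
    return dict(index)


pages = {
    "http://site.example/page1.htm":
        '<A HREF="page2.htm">I love dogs!</A> <a href="dog.jpg">photo of my dog</a>',
}
print(index_anchor_text(pages))
```

In this toy, `dog.jpg` has no text of its own, yet it ends up with the searchable description "photo of my dog".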

12
Anchor text, contd.

Google ranking can be manipulated
A large number of pages using the same Anchor Text
can influence the ranking of the page they link to.
This is called a Google bomb.

source: http://www.litigiousbastards.com/
13
How to compute importance of pages

Google creates a web citation map
details the relationship of a significant
sample of hyperlinks on the web
a link to a node is a vote for that node

14
Web citation graph

Compute PageRank of each graph node
your rank is high when
several high-rank nodes link to you, or
many nodes link to you
Details: subsequent talk
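
As a rough preview, the sketch below iterates the formula given in the paper, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn link to A, C(T) is the number of links out of T, and d is a damping factor (the paper suggests 0.85). The function name `pagerank`, the fixed iteration count, and the four-node graph are assumptions made for illustration; pages with no outgoing links are simply not redistributed here.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all T linking to A.

    links maps each node to the list of nodes it links to.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    rank = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new_rank = {node: 1.0 - d for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue  # simplification: dangling nodes pass on no rank
            share = d * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a, b and d all link to c; c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Node c ranks highest because many nodes link to it; node a has a single inbound link, but it comes from the high-rank node c, so a still ranks well above b and d.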

15
Model / System description
Brin, Page. (1998) Fig. 1
16
Model / System comparison
Heydon, Najork. (1999) Fig. 1
17

In 1998
Crawled 26 million pages in about 9 days
The last 11 million took less than 3 days
The HTTPWorker equivalent averaged 48.5 pages per
second.
In 2005
Indexed 8.1 billion web pages, 1 billion images,
and 1 billion Usenet posts.
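
The 48.5 pages per second figure can be checked with simple arithmetic. The slide only says "less than 3 days" for the last 11 million pages; the 63-hour value used below is taken from the text of Brin and Page (1998), not from the slide, so treat it as an outside assumption.

```python
pages = 11_000_000   # last batch of pages, as stated on the slide
hours = 63           # assumption: duration reported in Brin and Page (1998)

pages_per_second = pages / (hours * 3600)
pages_per_day = pages / hours * 24

print(round(pages_per_second, 1))   # 48.5
print(round(pages_per_day))         # just over 4 million pages per day
```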

18
Future work

Boolean operators: AND, -, +, OR
User context (location, etc.)
Scale to 100 000 000 pages
Use text around links as well as Anchor Text
Proxy caches to build search databases

19
Personal observations

Google has become widespread
It's become its own verb: "Just google it."
It's launched map direction services, research
journal searches, etc.
This paper is old.
Google indexed 2 billion web pages 4 years ago and
is up to 8 billion now.

20
Related work

J. Cho, H. Garcia-Molina and L. Page, Efficient
crawling through URL ordering, in Proc. of the
7th International World Wide Web Conference (WWW
98), Brisbane, Australia, April 14-18, 1998.

R. Weiss, B. Velez, M.A. Sheldon, C. Manprempre,
P. Szilagyi, A. Duda, and D. K. Gifford,
HyPursuit: a hierarchical network search engine
that exploits content-link hypertext clustering,
in Proc. of the 7th ACM Conference on Hypertext,
New York, 1996.

C. Cooper and A. Frieze, Crawling on simple
models of web graphs, Internet Mathematics 1,
57-90, 2003.