Title: Web Search Engines


1
Web Search Engines
  • by Greg R. Notess
  • notess_at_imt.net
  • imt.net/notess/search

2
Overview
  • Comparing the database content
  • Change
  • Comparative Size
  • Overlap
  • Looking towards future developments
  • Portal or Destination
  • Output sorting

3
Results are limited by
  • Database content
  • The Web sites included
  • The depth to which they are indexed

4
  • If it's not in the database, the best search
    engine will not be able to find the Web page

5
So what are they like?
  • Very large databases
  • Most index all words on page
  • None index words in images
  • Let's see how the databases compare to the real
    Web
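
A minimal sketch of what "index all words on page" can look like, assuming a tiny in-memory collection. The page URLs, the texts, and the function names build_index and search are hypothetical and are not taken from any real search engine.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages: only the text is indexed, never words inside images.
pages = {
    "http://example.com/a": "Web search engines index all words",
    "http://example.com/b": "Words inside images are not indexed",
}
index = build_index(pages)
```

A query can only match pages that were put into the index, which is the point of the earlier slide: a page missing from the database cannot be found.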

6
Change over time?
7
Overall Size Change
  • Is the Web in general
  • Growing?
  • Shrinking?
  • Remaining the same?

8
Excite: 6 Searches, 10/96 to 8/98
9
What about the rest?
  • Who's the biggest?
  • How to measure?
  • Actual search results
  • Verified hits

10
(No Transcript)
11
And over time?
  • 8/98 -- AltaVista, Northern Light, HotBot
  • 5/98 -- AltaVista, HotBot, Northern Light
  • 2/98 -- HotBot, AltaVista, Northern Light
  • 10/97 -- AltaVista, HotBot, Northern Light
  • 9/97 -- Northern Light, Excite, HotBot
  • 6/97 -- HotBot, AltaVista, Infoseek
  • 10/96 -- HotBot, Excite, AltaVista

12
Back to change in size
  • Let's look at six search engines
  • Over the course of two years

13
(No Transcript)
14
But at least
  • They have a high degree of duplication between
    them
  • Right?

15
Try 4 small searches
  • Using five search engines
  • How many pages are found by all five or at least
    by four of them?
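
The overlap test described above could be sketched roughly as follows, assuming each engine's results are available as a list of URLs. The engine names and URLs here are placeholders, not data from the presentation.

```python
from collections import Counter


def overlap_counts(results_by_engine):
    """Count how many engines returned each URL."""
    counts = Counter()
    for urls in results_by_engine.values():
        # Use a set so a URL listed twice by one engine counts once.
        counts.update(set(urls))
    return counts


def found_by_at_least(results_by_engine, k):
    """Return the URLs returned by at least k of the engines."""
    counts = overlap_counts(results_by_engine)
    return {url for url, n in counts.items() if n >= k}


# Placeholder result lists for one search run on five engines.
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u4"],
    "engine_c": ["u2", "u3", "u5"],
    "engine_d": ["u6"],
    "engine_e": ["u2", "u7"],
}

by_all_five = found_by_at_least(results, 5)
by_four_or_more = found_by_at_least(results, 4)
```

In practice the URLs would need normalizing first (case, trailing slashes, mirror hosts), otherwise the same page can look like several different ones.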

16
ZERO
17
Overlap
18
And they exclude most
  • Content of Adobe PDF and formatted files
  • The content in most sites requiring a login
  • CGI output data requested by a form
  • Other dynamically produced data
  • Pages protected by a robots.txt file
  • Intranets, pages not linked from anywhere else
  • Commercial resources with domain limitations
  • Non-Web resources
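
As an illustration of the robots.txt item in the list above, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt content, the crawler name "ExampleBot", and the URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler checks each URL before fetching it.
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/report.html")
```

Pages under a disallowed path are skipped by compliant crawlers, so they never reach the search engine's database.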

19
Scope Summary
  • Inconsistent growth
  • Not full coverage
  • Surprisingly low duplication

20
Positive Side?
  • Essential for searching the Net
  • Can be used effectively
  • Phrase search
  • Use more than one
  • Smart searching

21
  • Incredibly popular
  • Even when they fail
  • But then, since when is finding information
    always easy?

22
Overview
  • Comparing the database content
  • Change
  • Comparative Size
  • Overlap
  • Looking towards future developments
  • Portal or Destination
  • Output sorting

23
What is a search engine?
  • Portal?
  • Gateway?
  • Destination?

24
Search Engine
  • The software that searches a database

25
Development
  • Database of Web pages
  • adds Supplementary Database
  • Phone numbers, reference, businesses, news
  • then adds Subject directory
  • then Services
  • email, ISP, shopping, travel agent
  • now Communities

26
Portal to Destination?
  • Driving force
  • advertising revenue
  • Keep users longer for more
  • Conflicts with portal and gateway principle

27
Future possibilities?
  • Smaller databases
  • Less pointing to external pages
  • Paid advertising or sponsorship for visibility
  • Rise of search-only sites?

28
Output Development
  • Initially, Relevance ranking
  • Crude
  • Not site or URL based
  • Some site sorting from Excite
  • No date sorting

29
Site Sorting
  • Infoseek, then Lycos, now HotBot
  • Group together by site
  • More relevant than prior algorithms
  • Northern Light includes it in
  • Custom Folders
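
One way to picture "group together by site" is the sketch below, which groups a ranked list of result URLs by host name. This is only an assumption about the general idea; it is not how Infoseek, Lycos, HotBot, or Northern Light implement it, and the function name and URLs are hypothetical.

```python
from urllib.parse import urlsplit


def group_by_site(ranked_urls):
    """Group ranked result URLs by host, keeping the original rank order."""
    groups = {}
    for url in ranked_urls:
        host = urlsplit(url).hostname or ""
        # Sites appear in the order of their best-ranked page.
        groups.setdefault(host, []).append(url)
    return groups


# Hypothetical ranked results, best match first.
ranked = [
    "http://a.example/1",
    "http://b.example/x",
    "http://a.example/2",
]
grouped = group_by_site(ranked)
```

Each site then shows up once in the output, with its other matching pages listed under it instead of being scattered through the results.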

30
Other Output
  • RealName on AltaVista
  • Direct Hit on HotBot
  • Subject Directory Categories
  • News
  • Books, CDs, etc. about search term

31
Search Engine Showdown
  • imt.net/notess/search
  • Search engine features
  • See also
  • www.searchenginewatch.com
  • See also
  • Rich Wiggins, Coming up next . . .

32
Web Search Engines
  • by Greg R. Notess
  • notess_at_imt.net
  • imt.net/notess/search