The Distributed WebTiger
Provided by: gerryd

1
The Distributed WebTiger
  • COMP-2710/4970
  • Semester Project
  • Spring 2006

2
The Distributed WebTiger
  • For our semester project we will be working in
    groups of four to develop a distributed search
    engine called the Distributed WebTiger (DWT).
  • The final date for demos is May 1st!
  • Your demo should include the following UML artifacts:
      • Use Cases, and
      • a Domain Model.
  • The DWT will be composed of
      • a Java-based graphical user interface (GUI)
        front end, and
      • a biologically-inspired web crawler (a WebTiger)
        implemented in C++.

3
The Distributed WebTiger
  • When given a query (a word or phrase) and a
    starting URL address, the DWT will attempt to
    discover the largest set of linked webpages that
    all contain the user-supplied query.
  • Individual WebTigers will work together in a
    parallel, distributed fashion.
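
Whether a webpage "contains" the query comes down to counting matches in its text. The following is a minimal C++ sketch, assuming the page text has already been fetched into a string; matching is case-sensitive and non-overlapping, and the function names are hypothetical. Whether HTML markup should be stripped before counting is left to each group.

```cpp
#include <cstddef>
#include <string>

// Counts non-overlapping, case-sensitive occurrences of `query` in `text`.
// An empty query is treated as matching nothing (returns 0).
std::size_t countOccurrences(const std::string& text, const std::string& query) {
    if (query.empty()) return 0;
    std::size_t count = 0;
    std::size_t pos = text.find(query);
    while (pos != std::string::npos) {
        ++count;
        pos = text.find(query, pos + query.size());  // resume after this match
    }
    return count;
}

// A page "contains" the query if it occurs at least once.
bool containsQuery(const std::string& text, const std::string& query) {
    return countOccurrences(text, query) > 0;
}
```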

4
The Distributed WebTiger
  • The DWT will use the following files, which will be
    located in a user's public_html directory:
  • SearchRequest.html will contain a query and a
    specified URL address to start from. If no query
    is specified, this webpage will contain the string
    NoSearchRequest.
  • ListSoFar.html will contain a list of URL
    addresses along with the number of times the
    query appears in each webpage. This list contains
    no duplicates (i.e., no cycles).
  • MyFriends.html will contain the URL addresses of
    your friends' SearchRequest.html webpages.
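
The no-duplicates rule for ListSoFar.html can be enforced by checking each URL before it is appended. Below is a sketch of a hypothetical in-memory mirror of the file; the class name and the one "URL count" pair per line text layout are assumptions, since the slides do not fix a file format.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// One ListSoFar entry: a URL and how many times the query appears on that page.
struct Entry {
    std::string url;
    std::size_t count;
};

// Hypothetical in-memory mirror of ListSoFar.html that rejects duplicate URLs,
// so the chain of pages can never revisit a page (no cycles).
class ListSoFar {
public:
    // Returns false (and leaves the list unchanged) if `url` is already present.
    bool add(const std::string& url, std::size_t count) {
        if (!seen_.insert(url).second) return false;
        entries_.push_back({url, count});
        return true;
    }
    bool contains(const std::string& url) const { return seen_.count(url) > 0; }
    std::size_t size() const { return entries_.size(); }
    const Entry& last() const { return entries_.back(); }  // precondition: size() > 0

    // ASSUMED layout: one "url count" pair per line.
    std::string toText() const {
        std::ostringstream out;
        for (const Entry& e : entries_) out << e.url << ' ' << e.count << '\n';
        return out.str();
    }

private:
    std::vector<Entry> entries_;
    std::unordered_set<std::string> seen_;  // fast duplicate check
};
```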

5
The Distributed WebTiger: The GUI
  • A user will interact with the GUI to:
      • Add/remove friends,
      • Specify a query and a starting URL address,
      • Display the current query and starting URL
        address,
      • Display the list of webpages discovered so far
        (this will require calling the WebTiger to
        return the largest list discovered by all
        WebTigers (friends) participating in the search),
      • Help with a friend's query,

6
The Distributed WebTiger: The GUI (cont.)
      • Specify the total number of webpages to search,
      • Specify the sampling size of the
        biologically-inspired web crawler,
      • Start the WebTiger, and
      • Any other features that you would like to add.
        Each new feature earns 5 pts of extra credit;
        however, the feature must be implemented at the
        C++ level.
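
Displaying "the largest list discovered by all WebTigers" means choosing the longest of your own list and your friends' lists. A C++ sketch, assuming each ListSoFar.html has already been read into a vector of URLs; the function name and the tie-breaking rule are hypothetical choices.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Given the URL lists read from your own ListSoFar.html and each friend's,
// returns the index of the longest one. Ties go to the earliest list, so
// placing your own list first keeps it on a tie. Returns -1 if `lists` is empty.
int indexOfLargestList(const std::vector<std::vector<std::string>>& lists) {
    int best = -1;
    std::size_t bestSize = 0;
    for (std::size_t i = 0; i < lists.size(); ++i) {
        if (best == -1 || lists[i].size() > bestSize) {
            best = static_cast<int>(i);
            bestSize = lists[i].size();
        }
    }
    return best;
}
```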

7
The Distributed WebTiger: The Simple WebTiger
  • The simple WebTiger will work as follows:
      • Given that the owner has specified a query and
        a starting address in the SearchRequest.html
        file, the WebTiger will begin its search based
        on the query.
      • Given that the owner has specified that the
        WebTiger should help a friend, the WebTiger will
          • obtain a query from the friend's
            SearchRequest.html file,
          • obtain the list of webpages found in the
            friend's ListSoFar.html file, and
          • commence web crawling from the last URL in the
            list.
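
Helping a friend involves reading the friend's SearchRequest.html and resuming from the last URL in the friend's ListSoFar.html. A C++ sketch under two assumptions not stated in the slides: the file text (markup already stripped) holds the query on the first line and the starting URL on the second, and an empty friend list falls back to the friend's starting URL. All names are hypothetical.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// A query plus the URL to start crawling from.
struct SearchRequest {
    std::string query;
    std::string startUrl;
};

// Parses the text of a SearchRequest.html file. ASSUMED layout: query on the
// first line, starting URL on the second. Returns std::nullopt when the file
// says NoSearchRequest or is incomplete.
std::optional<SearchRequest> parseSearchRequest(const std::string& text) {
    std::istringstream in(text);
    SearchRequest req;
    if (!std::getline(in, req.query) || req.query.empty() ||
        req.query == "NoSearchRequest")
        return std::nullopt;
    if (!std::getline(in, req.startUrl) || req.startUrl.empty())
        return std::nullopt;
    return req;
}

// Crawling resumes from the last URL in the friend's ListSoFar; if that list
// is empty, this sketch ASSUMES the friend's starting URL is used instead.
std::string resumeUrl(const SearchRequest& req,
                      const std::vector<std::string>& friendList) {
    return friendList.empty() ? req.startUrl : friendList.back();
}
```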

8
The Distributed WebTiger: The Simple WebTiger
  • The simple WebTiger will need the following
    information:
      • a query,
      • a URL address to begin its search from,
      • the total number of webpages to access (visit),
        and
      • a sampling size, λ.
  • The sampling size is the number of links to visit
    from a given webpage.
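
The four inputs above map naturally onto a small configuration type that can be checked before a crawl starts. The field names are hypothetical, and `maxSteps` assumes that every sampled page counts toward the total page budget, which the slides do not state explicitly.

```cpp
#include <cstddef>
#include <string>

// The four inputs the simple WebTiger needs (field names are hypothetical).
struct TigerConfig {
    std::string query;       // word or phrase to look for
    std::string startUrl;    // where the search begins
    std::size_t totalPages;  // total number of webpages to visit
    std::size_t lambda;      // sampling size: links visited per page
};

// Basic sanity checks before a crawl starts.
bool isValid(const TigerConfig& c) {
    return !c.query.empty() && !c.startUrl.empty() && c.totalPages > 0 && c.lambda > 0;
}

// ASSUMPTION: each of the lambda sampled pages counts as one visit, so the
// budget allows totalPages / lambda search steps.
std::size_t maxSteps(const TigerConfig& c) {
    return c.lambda == 0 ? 0 : c.totalPages / c.lambda;
}
```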

9
The Simple WebTiger: Biologically-Inspired Search
  • Given the query, starting URL address, number of
    pages to process, and the population size, the
    simple WebTiger will search as follows:
      • Get the best ListSoFar.html file among yours and
        your neighbors' and copy it to your
        ListSoFar.html.
      • Use the getWebPage Java program discussed earlier
        in the course to get the N links of the last URL
        in your ListSoFar.html file.
      • Randomly visit λ of the N links discovered and
        record the number of times the query occurs in
        each of those pages (P).
      • Add to your ListSoFar.html file the URL address
        that has the largest number of occurrences of the
        query.
      • If Random()
  • This search method is similar to a (1+λ)
    Evolutionary Strategy or hill-climber.
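
The sampling and selection steps above can be sketched in C++ as follows. The `countQuery` callback is a hypothetical stand-in for fetching a page (for example with getWebPage) and counting the query in it. The λ links are drawn without replacement using a partial Fisher-Yates shuffle driven by `std::mt19937`. This covers only the sampling and best-link selection, not the rest of the procedure.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <utility>
#include <vector>

struct StepResult {
    bool found = false;     // false if there were no links (or lambda == 0)
    std::string url;        // best sampled link
    std::size_t count = 0;  // query occurrences on that page
};

// One search step: sample lambda of the N links without replacement, score
// each with `countQuery`, and return the highest scorer (ties keep the first).
StepResult sampleBestLink(std::vector<std::string> links, std::size_t lambda,
                          const std::function<std::size_t(const std::string&)>& countQuery,
                          std::mt19937& rng) {
    StepResult best;
    const std::size_t k = std::min(lambda, links.size());
    for (std::size_t i = 0; i < k; ++i) {
        // Partial Fisher-Yates: swap a random not-yet-sampled link into slot i.
        std::uniform_int_distribution<std::size_t> pick(i, links.size() - 1);
        std::swap(links[i], links[pick(rng)]);
        const std::size_t c = countQuery(links[i]);
        if (!best.found || c > best.count) {
            best.found = true;
            best.url = links[i];
            best.count = c;
        }
    }
    return best;
}
```

A caller would append `best.url` to its ListSoFar when `best.found` is true and `best.count > 0`; what to do when the count is zero is the open question of slide 10.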

10
The Simple WebTiger: Biologically-Inspired Search
  • What should you do if none of the sampled
    webpages contain the query?
  • How to handle this is up to your group.
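
As one illustration only (the slides leave the choice to each group), a policy might resample a few times and then backtrack. The enum, function name, and retry limit below are all hypothetical.

```cpp
#include <cstddef>

// One possible (hypothetical) policy for a step whose best sampled page
// contains zero occurrences of the query.
enum class Action { Accept, Resample, Backtrack };

Action onStepResult(std::size_t bestCount, int retriesSoFar, int maxRetries) {
    if (bestCount > 0) return Action::Accept;                // extend ListSoFar as usual
    if (retriesSoFar < maxRetries) return Action::Resample;  // draw another random sample
    return Action::Backtrack;  // e.g. drop the last URL and continue from the previous one
}
```

Resampling only helps when N is larger than λ; if every link was already sampled, the same pages would come back.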

11
The Distributed WebTiger
  • The GUI should interact with the WebTiger using
    command-line arguments.
  • After issuing a command to the WebTiger, the GUI
    should display any necessary or requested
    information to the user by reading and/or
    displaying
      • the SearchRequest.html file,
      • the ListSoFar.html file,
      • the MyFriends.html file, and/or
      • any other intermediate files that you feel you
        will need.
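
On the WebTiger side, the command-line interface amounts to validating argv against an agreed vocabulary. The three commands below (`search`, `help`, `list`) and their argument counts are a hypothetical vocabulary for illustration; each group would define its own. The function takes `argv[1..argc-1]` as a vector so it can be tested without a `main`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical command vocabulary between the Java GUI and the WebTiger:
//   search <query> <startUrl> <totalPages> <lambda>
//   help <friendSearchRequestUrl>
//   list
struct Command {
    std::string name;
    std::vector<std::string> args;
};

// Validates the program arguments (argv[1..argc-1]) against the vocabulary.
// Numeric arguments are not range-checked in this sketch.
std::optional<Command> parseCommand(const std::vector<std::string>& argv) {
    if (argv.empty()) return std::nullopt;
    const std::string& name = argv[0];
    const std::size_t nargs = argv.size() - 1;
    const bool ok = (name == "search" && nargs == 4) ||
                    (name == "help" && nargs == 1) ||
                    (name == "list" && nargs == 0);
    if (!ok) return std::nullopt;
    return Command{name, std::vector<std::string>(argv.begin() + 1, argv.end())};
}
```

In `main`, this would be called as `parseCommand(std::vector<std::string>(argv + 1, argv + argc))`. On the GUI side, Java's `ProcessBuilder` passes each list element as a separate argument, so a multi-word query arrives as a single argument without shell quoting.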

12
The Distributed WebTiger
  • You will be required to use as many objects as
    are appropriate.
  • Some required classes are
      • a WebTiger class,
      • a Webpage class, and
      • a Link (or URL) class.
  • We will discuss these classes during the upcoming
    lectures.
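
Pending the in-class discussion, a first cut at the three required classes might look like the skeletons below. Every member shown is a hypothetical guess, chosen only to connect the classes to the files and search steps described earlier.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical skeletons; actual members are to be settled in lecture.
class Link {
public:
    explicit Link(std::string url) : url_(std::move(url)) {}
    const std::string& url() const { return url_; }
private:
    std::string url_;
};

class Webpage {
public:
    Webpage(Link address, std::string text, std::vector<Link> links)
        : address_(std::move(address)), text_(std::move(text)), links_(std::move(links)) {}
    const Link& address() const { return address_; }
    const std::vector<Link>& links() const { return links_; }
    // Non-overlapping, case-sensitive count of `query` in the page text.
    std::size_t queryCount(const std::string& query) const {
        if (query.empty()) return 0;
        std::size_t n = 0;
        for (std::size_t pos = text_.find(query); pos != std::string::npos;
             pos = text_.find(query, pos + query.size()))
            ++n;
        return n;
    }
private:
    Link address_;
    std::string text_;
    std::vector<Link> links_;
};

class WebTiger {
public:
    WebTiger(std::string query, std::size_t lambda)
        : query_(std::move(query)), lambda_(lambda) {}
    const std::string& query() const { return query_; }
    std::size_t samplingSize() const { return lambda_; }
    // Records a page in the trail; returns false for a repeat (no cycles).
    bool visit(const Webpage& page) {
        for (const Link& l : trail_)
            if (l.url() == page.address().url()) return false;
        trail_.push_back(page.address());
        return true;
    }
    const std::vector<Link>& trail() const { return trail_; }
private:
    std::string query_;
    std::size_t lambda_;
    std::vector<Link> trail_;  // mirrors ListSoFar.html
};
```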

13
Questions
  • ?