CS403: Online Network Exploration

Provided by: amg
Learn more at: http://pubpages.unh.edu

1
CS403: Online Network Exploration
  • The World Wide Web
  • Fall, 2007
  • Modified by Linda Kenney
  • 9/16/07

2
Defining the World Wide Web
  • Using the Web, it's possible for anyone to
    publish their own Web pages on a host running a
    Web server and have those pages available to any
    Internet user with a Web browser.

3
Defining the World Wide Web (cont.)
  • Remember -- Along with e-mail, the Web is the
    Internet service that accounts for the vast
    majority of what people routinely do with the
    Internet.

4
A conceptual network
  • Although the Web is most definitely not a
    computer network, we could argue that it is a
    conceptual network of distributed resources.
  • Most commonly those resources are Web pages.
  • They may also include images, sounds, videos and
    more.
  • Most Web pages are connected to other resources
    via hyperlinks.
  • Visualizing the links between several separate
    resources provides some idea of where the term
    Web comes from.

5
Hypertext
  • The Web was invented in 1990.
  • But it was based on the concept of hypertext
    which had been around for decades.
  • The basic idea of hypertext is to take the
    passive cross-references that are common in
    printed text and make them active.
  • When reading a book, a cross-reference passively
    informs the reader where to turn for additional
    info and the reader must manually perform the
    actions necessary to obtain that additional info
    if it is desired.
  • Examples?

6
Hypertext (cont.)
  • On a computer, it's easy to make cross-references
    active. You notify the reader that additional
    info is available, but let the computer take the
    actions necessary to obtain that info if the
    reader desires it.
  • Such an active cross-reference is called a
    hyperlink (or just link) and text that contains
    such links is called hypertext.
  • This concept is fundamental to the Web as we know
    it.

7
Web presentations
  • Most Web pages do not exist in isolation.
  • The vast majority of them are grouped together
    into collections of pages with a common purpose
    or theme.
  • Such a collection of Web pages is called a Web
    presentation or Web site.
  • Typically, all the pages within a given
    presentation are under the editorial control of a
    single individual or organization.

8
Web presentations (cont.)
  • A given Web page is likely to contain several
    links to other pages.
  • Often, those links will lead to other resources
    within the same presentation. These links are
    called local links or links to local
    resources.
  • Some of those links may lead to other resources
    which are part of a different presentation. These
    links are called remote links or links to
    remote resources.

9
Clients and servers on the Web
  • Like most Internet services, the Web is based on
    the client/server model.
  • A Web browser is just a specific example of a
    client program.

10
Clients and servers on the Web (cont.)
  • The browser can't accomplish much without the
    cooperation of a server.
  • A Web server is a program that makes files
    available to Web browsers upon request.
  • In general, the files a Web server makes
    available contain Web pages and the images,
    sounds, videos and other media that supplement
    them.
  • And all the files a Web server has access to are
    generally stored in the secondary storage of the
    host on which the server runs.

11
Hypertext Transfer Protocol
  • Hypertext Transfer Protocol (HTTP) is the
    protocol that Web browsers and Web servers use to
    communicate with one another.
  • As a protocol, it carefully defines the range of
    possibilities, determining precisely what a
    browser may say to a server and when.
  • Of course, it also dictates what servers can say
    to browsers and when.

[Figure: the browser tells the server "I need the file page.html"; the server replies "Here is the file page.html".]
12
HTTP requests and responses
  • When speaking HTTP, a Web browser generally
    sends an HTTP GET request to the Web server on a
    specific host requesting a specific resource.
  • When it receives an HTTP GET request from a
    browser, a Web server, in turn, sends some sort
    of HTTP response back to the browser.
  • Most commonly, the response will consist of the
    file and some information about the file.
  • But on occasion, the response will consist of an
    error message of some sort.
  • Note that HTTP requests and responses rely on TCP
    and IP to get across the Internet (see pp. 72-74).
  • In other words, HTTP is layered on top of TCP and
    IP.

[Figure: the browser sends an HTTP GET request for /page.html. On success, the server returns an HTTP response with status code 200, Content-type: text/html, Content-length: 4370 and the contents of /page.html. On failure, the server returns an HTTP response with status code 404 Not Found, Content-type: text/html, Content-length: 1634 and the contents of an error status page.]
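
The exchange on this slide can be sketched as plain text messages. The following is a minimal illustration, assuming HTTP/1.1 framing and the hostname www.sample.com used later in these slides; the function names are hypothetical and the response text is hand-written sample data, not output from a real server.

```python
# Sketch of the text a browser and a server exchange (HTTP/1.1 framing assumed).

def build_get_request(hostname, pathname):
    """Compose a minimal HTTP GET request message as text."""
    return (
        f"GET {pathname} HTTP/1.1\r\n"   # request line: method, pathname, version
        f"Host: {hostname}\r\n"          # the hostname of the server's host
        "Connection: close\r\n"
        "\r\n"                           # a blank line ends the request
    )


def parse_response_head(head):
    """Split a response's status line and headers into (status_code, headers)."""
    lines = head.split("\r\n")
    status_code = int(lines[0].split(" ")[1])   # e.g. "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        if not line:                            # blank line: end of the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


request = build_get_request("www.sample.com", "/page.html")

# Hand-written sample response head matching the figure's successful case.
status, headers = parse_response_head(
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 4370\r\n"
    "\r\n"
)
```

Here `status` is the status code and `headers` holds the information about the file (its type and size) that accompanies the file's contents.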
13
The server's responsibilities
  • When it receives an HTTP GET request, a Web
    server must prepare an appropriate HTTP response
    message.
  • The request will specify the file it is
    requesting.
  • The server must first locate the requested file
    within the file system of its host.
  • If the file cannot be located, the server sends
    back a 404 Not Found response message.

14
The server's responsibilities (cont.)
  • Having found the file, however, the server must
    also verify that the file permissions allow it to
    access the file.
  • If the server is not able to access the file, it
    will typically return a 403 Forbidden response
    message.
  • If the requested file is located and accessible,
    the server generates a 200 OK response message
    that includes the contents of the file as well as
    a variety of headers that provide information
    about the file, such as its type, size and last
    modified date.
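
The decision sequence on these two slides (locate the file, check permissions, then respond) might look roughly like the sketch below. It is an illustration only: `document_root` and `build_response_head` are hypothetical names, and a real server would also set the file's type and guard against pathnames that escape the document root.

```python
import os
from email.utils import formatdate


def build_response_head(document_root, pathname):
    """Return (status_code, reason, headers) for a requested pathname."""
    # Map the requested pathname onto the host's file system.
    # document_root is a hypothetical directory holding the server's files.
    file_path = os.path.join(document_root, pathname.lstrip("/"))

    # Step 1: locate the file.
    if not os.path.isfile(file_path):
        return 404, "Not Found", {}

    # Step 2: verify that the file permissions allow the server to read it.
    if not os.access(file_path, os.R_OK):
        return 403, "Forbidden", {}

    # Step 3: success; describe the file in the headers.
    headers = {
        "Content-Length": str(os.path.getsize(file_path)),
        "Last-Modified": formatdate(os.path.getmtime(file_path), usegmt=True),
    }
    return 200, "OK", headers
```

The order of the checks mirrors the slides: a missing file is reported before permissions are ever consulted.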

15
Locating files
  • A typical host stores thousands of files, all of
    which must be uniquely identified.
  • It's impractical to give 100,000 files unique
    names.
  • Instead, a host uses a file system consisting of
    a hierarchy of directories to create uniquely
    identified locations in which files may be
    stored.

16
Locating files (cont.)
  • Each location can be uniquely identified by the
    sequence of steps necessary to reach it from the
    top of the hierarchy.
  • The list of steps needed to reach a location from
    the top of the hierarchy is called the absolute
    path to that location, and every location has a
    unique absolute path.

17
Locating files (cont.)
  • All items in a given location must have unique
    names.
  • So each item in the hierarchy can be uniquely
    identified by combining its absolute path with
    its filename to form an absolute pathname.
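
As a small illustration of these terms, the sketch below builds an absolute path from a list of steps and then combines it with a filename. The directory and file names are made up for the example, and it assumes a Unix-style hierarchy whose top is written as "/".

```python
import posixpath

# Steps needed to reach a location from the top of the hierarchy (hypothetical names).
steps = ["products", "catalog"]

# The absolute path identifies the location itself.
absolute_path = posixpath.join("/", *steps)

# Combining the absolute path with a filename gives the absolute pathname.
absolute_pathname = posixpath.join(absolute_path, "prod1.html")

# Going the other way: split a pathname back into its path and filename.
path_part, filename_part = posixpath.split(absolute_pathname)
```

Two files named prod1.html can coexist on the same host as long as they sit in different locations, because their absolute pathnames differ.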

18
Uniform Resource Locators
  • Before a browser can request a resource, it needs
    to know where it can find that resource and what
    type of server will be providing it.
  • To find a specific resource, the browser must be
    told not only the name of the file containing
    that resource, but also what host it is on and
    where it is in the file system of that host.
  • Fortunately, all the information needed to find a
    specific resource, out of the billions available
    on the Web, is contained in that resource's
    Uniform Resource Locator (URL).
  • Each resource available on the Web is identified
    by a unique URL that contains all the information
    necessary for a browser to retrieve that resource.

19
Uniform Resource Locators (cont.)
  • Regardless of how the URL is provided, the
    browser always does the same thing with it: it
    requests the resource and renders it on the
    screen.
  • In computer science, we use the term render to
    refer to the process of producing an image by
    interpreting some data.
  • A browser renders a Web resource by determining
    what to display on the screen based upon what it
    finds in the HTTP response that contains the
    contents of that resource.

20
The anatomy of a URL
  • Consider a typical URL.
  • A URL typically begins with the protocol to use
    when accessing the resource.
  • The remainder of the URL is the identifier that
    tells the browser how to locate the resource.
  • The identifier starts with a hostname that
    uniquely identifies the host on which the
    resource is stored.
  • The rest of the identifier is the pathname that
    uniquely locates the resource in that host's file
    system.
  • The pathname, as we've discussed, consists of a
    path and a file name.

Example URL: http://www.sample.com/products/catalog/prod1.html
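
The parts of a URL described on this slide can be pulled apart with Python's standard `urllib.parse` module. This is a sketch using the slide's sample URL; the variable names are chosen to match the slide's vocabulary and are not standard terms in the library.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.sample.com/products/catalog/prod1.html"
parts = urlsplit(url)

protocol = parts.scheme       # the protocol to use when accessing the resource
hostname = parts.hostname     # identifies the host on which the resource is stored
pathname = parts.path         # locates the resource in that host's file system

# The pathname itself consists of a path and a file name.
path, filename = posixpath.split(pathname)
```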
21
The Web step-by-step: step 1
  • The process of displaying a Web resource begins
    when the browser is given the URL of that
    resource by the user.
  • The browser examines that URL to find out what it
    needs to do next.
  • The first part (e.g., http://) tells the browser
    what protocol to use, and indirectly what type of
    server to contact.
  • The identifier tells the browser where the
    resource is located.
  • The hostname in the identifier tells the browser
    which host is running the server responsible for
    the resource.
  • The pathname in the identifier tells the browser
    precisely where the desired resource is stored in
    that host's file system.
  • Using this information, the browser composes an
    HTTP GET request message.
  • The GET request contains the pathname of the
    desired resource as well as the hostname of the
    server's host and various other information.

22
The Web step-by-step: step 2
  • The HTTP GET request must be sent to the
    appropriate server.
  • Since it must arrive in its entirety at a
    specific host, the request gets sent over the
    Internet using TCP and IP.
  • To establish a TCP connection with the server,
    the browser needs to know the IP address of the
    host running the server.
  • To get the IP address of the server's host, the
    browser resolves the hostname in the URL's
    identifier using DNS.
  • Using the IP address of the server's host, the
    browser establishes a connection with the server.
  • The HTTP GET request message is sent to the
    server over this connection. Since the request
    message is small, it takes little time to send.
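
Step 2 can be sketched with Python's standard `socket` module: resolve the hostname, open a TCP connection to the resulting IP address, and send the request over it. The function names are hypothetical, port 80 and IPv4 are assumptions, and the request text assumes HTTP/1.1 framing. Nothing here is specific to any particular browser.

```python
import socket


def resolve_hostname(hostname):
    """Use the system's resolver (DNS) to turn a hostname into an IPv4 address."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr is (ip_address, port) for IPv4.
    return infos[0][4][0]


def fetch(hostname, pathname, port=80):
    """Send an HTTP GET request over a TCP connection and return the raw reply."""
    ip_address = resolve_hostname(hostname)
    request = (
        f"GET {pathname} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

    # Establish the TCP connection using the IP address, then send the request.
    with socket.create_connection((ip_address, port), timeout=10) as connection:
        connection.sendall(request)
        chunks = []
        while True:
            chunk = connection.recv(4096)
            if not chunk:          # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch("www.sample.com", "/page.html")` would attempt a real network request, so it is left out of the sketch; `resolve_hostname("localhost")` can be tried without an Internet connection.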

23
The Web step-by-step: step 3
  • When a Web server receives an HTTP GET request,
    it composes an HTTP response.
  • Using the pathname specified in the request, the
    server attempts to locate the file containing the
    resource within the file system of its host.
  • Once the resource's file has been located, the
    server verifies that it has permission to access
    that file.

24
The Web step-by-step: step 3 (cont.)
  • If the server is able to locate and access the
    file, the HTTP response will indicate success.
  • The response will also indicate the date and time
    at which the file was last modified, the type of
    resource the file contains and how big it is.
  • And the server will include the contents of the
    resource's file in the response message.
  • Note that this means the size of the response
    message is primarily determined by the size of
    the resource being requested.
  • If the server is unable to locate or access the
    file, the HTTP response will indicate the nature
    of the problem.
  • The response may also contain some content for
    the browser to use in lieu of the requested
    resource.

25
The Web step-by-step: step 4
  • Having composed an HTTP response, the server must
    now send it back to the requesting browser.
  • The server uses TCP over IP for this purpose.
  • It gets the IP address for the browser from the
    packet that carried the HTTP request.
  • Because they typically contain the contents of
    the requested resource, HTTP response messages
    tend to be significantly larger than HTTP request
    messages.
  • Responses generally take much longer to send over
    the Internet than requests.
  • This is generally the source of the derogatory
    term "The World Wide Wait."
  • To minimize the time a user must wait to receive
    a requested resource, it's up to the creator of
    that resource to minimize the size of the file
    containing the resource.

26
The Web step-by-step: step 5
  • Upon receiving an HTTP response message, the
    browser is responsible for rendering the resource
    it contains.
  • Many resources will be Web pages, which are
    written in Extensible Hypertext Markup Language
    (XHTML).
  • Rendering a Web page involves interpreting the
    XHTML to determine what the page should look
    like.
  • Other resources, however, will be other forms of
    media such as images, sounds and video.
  • Rendering multimedia resources involves
    interpreting the data those resources contain and
    producing the image, sound or video that data
    represents.
  • Browsers therefore need to understand a range of
    resource types.

27
The Web step-by-step: step 5 (cont.)
  • It's also useful to note at this stage that even
    though a Web page may appear to contain images,
    sounds and videos, each of those resources must
    be stored separately in its own file.
  • And each of those resources must therefore be
    retrieved from a server with a separate HTTP
    transaction.
  • As a result, the time it takes to retrieve a Web
    page is the sum of the time it takes to retrieve
    all of its component parts.
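
A rough back-of-the-envelope version of this sum is sketched below. All of the numbers are assumptions for illustration: the file sizes are made up (apart from the 4370-byte page used in an earlier figure), the link speed is a hypothetical 56,000 bits per second, and the estimate ignores per-request overhead and assumes the parts are retrieved one after another.

```python
# Hypothetical component parts of one Web page, with sizes in bytes.
component_sizes = {
    "page.html": 4370,
    "logo.gif": 12000,
    "photo.jpg": 85000,
}

BITS_PER_SECOND = 56000   # assumed link speed


def transfer_seconds(size_in_bytes, bits_per_second=BITS_PER_SECOND):
    """Estimate how long one response takes to send (8 bits per byte)."""
    return size_in_bytes * 8 / bits_per_second


# Each part needs its own HTTP transaction, so the estimates add up.
per_part_seconds = {
    name: transfer_seconds(size) for name, size in component_sizes.items()
}
total_seconds = sum(per_part_seconds.values())
```

Under these assumptions the photo dominates the total, which is why shrinking the largest files does the most to reduce the wait.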

28
The browser lends a hand
  • Browsers can also play a role in minimizing the
    time the user must wait for a page to load.
  • A user often revisits the same resources
    repeatedly.
  • Imagine waiting five minutes to retrieve a
    resource.
  • Then, after the resource loads, you activate a
    link within it and go to another resource.
  • Examining the second resource, you realize it's
    not what you expected and decide to use your
    browser's back button to return to the previous
    resource.
  • Now the browser needs to retrieve the same
    resource it just rendered all over again.
  • Obviously, you don't want to wait five minutes
    again.
  • What you want is for the browser to have saved
    that resource so that you can return to it
    without having to request it from the server
    again.
  • That's exactly what browsers do.

29
The browser cache
  • As a browser receives each requested resource, it
    stores a copy of that resource in a special place
    called the browser cache.
  • Along with the contents of the resource it stores
    the current date and time and the URL used to
    retrieve the resource.
  • Each time a resource is requested, the browser
    checks to see if that resource is already stored
    in its cache.
  • If it's not, then the browser goes about
    retrieving the resource as we've already
    described.

30
The browser cache (cont.)
  • If the resource is in the cache, however, the
    browser may be able to use it.
  • To find out if it's usable, the browser sends an
    HTTP HEAD request for that resource to the
    server.
  • This causes the server to send back only the
    information about the resource, which will
    include the date and time it was last modified.
  • If the resource on the server has not been
    modified since the copy of that resource was
    stored in the browser's cache, the browser can
    use the cached copy.
  • Otherwise, the browser must retrieve (and cache)
    a fresh copy from the server.
  • This requires more HTTP messages, but they're
    smaller on average.
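
The cache check described on these two slides can be sketched as a small class. This is an illustration under stated assumptions: `BrowserCache` and its methods are hypothetical names, and the server's last-modified time is passed in directly rather than obtained from a real HEAD request.

```python
from datetime import datetime, timezone


class BrowserCache:
    """Stores each resource with the URL and the time it was retrieved."""

    def __init__(self):
        self._entries = {}   # URL -> (contents, retrieved_at)

    def store(self, url, contents, retrieved_at):
        self._entries[url] = (contents, retrieved_at)

    def lookup(self, url, server_last_modified):
        """Return the cached contents if still current, otherwise None."""
        entry = self._entries.get(url)
        if entry is None:
            return None                      # not cached: retrieve as usual
        contents, retrieved_at = entry
        if server_last_modified <= retrieved_at:
            return contents                  # unchanged since it was cached
        return None                          # modified: a fresh copy is needed


cache = BrowserCache()
url = "http://www.sample.com/page.html"
cache.store(url, b"<html>old</html>", datetime(2007, 9, 16, 12, 0, tzinfo=timezone.utc))

# The server reports the file was last modified before the copy was cached.
unchanged = cache.lookup(url, datetime(2007, 9, 1, 8, 0, tzinfo=timezone.utc))

# The server reports a modification after the copy was cached.
changed = cache.lookup(url, datetime(2007, 9, 17, 8, 0, tzinfo=timezone.utc))
```

HTTP also defines conditional requests (for example, the If-Modified-Since request header) that let a server answer the same question in a single exchange; the sketch follows the HEAD-based description used in these slides.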

31
When things go wrong
  • Although it often goes off without a hitch, there
    are places in an HTTP transaction where problems
    can occur.
  • Knowing what might go wrong can help us make
    sense of otherwise cryptic or confusing error
    messages we may get from our browser.
  • Of course, different browsers and servers are
    free to use different error messages as they see
    fit, so the wording may differ.

32
When things go wrong (cont.)
  • If the hostname in the URL cannot be resolved to
    an IP address using DNS, there's no way to
    establish the necessary TCP connection to the
    server.
  • In this case, we'll get an error to the effect of
    "Unable to locate server."

33
When things go wrong (cont.)
  • The hostname may resolve but the TCP connection
    may not be able to be established for a variety
    of other reasons.
  • In this case, we'll get an error to the effect of
    "No response."

34
When things go wrong (cont.)
  • If we're able to get a TCP connection and send an
    HTTP request to the server, there's no guarantee
    it will be successful.
  • If the server is unable to locate the requested
    file, we'll get an error to the effect of
    "Not found."
  • If the server locates the file but does not have
    permission to access it, we'll get an error to
    the effect of "Forbidden" or "Access denied."
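
The failure points on the last three slides map naturally onto the stage at which each one occurs. The sketch below is one hypothetical way a client program could translate them into the messages quoted above; `explain_failure` is an invented name, and real browsers word their messages differently.

```python
import socket
from typing import Optional


def explain_failure(error: Optional[Exception] = None,
                    status_code: Optional[int] = None) -> str:
    """Translate a low-level failure into one of the messages from the slides."""
    # DNS could not resolve the hostname. socket.gaierror is a kind of OSError,
    # so it has to be checked before the more general case below.
    if isinstance(error, socket.gaierror):
        return "Unable to locate server"

    # The hostname resolved, but no TCP connection could be established
    # (refused, timed out, unreachable, and so on).
    if isinstance(error, OSError):
        return "No response"

    # The request reached the server, which reported a problem.
    if status_code == 404:
        return "Not found"
    if status_code == 403:
        return "Forbidden"

    return "No error"
```

For example, `explain_failure(error=ConnectionRefusedError())` falls into the second case because the hostname was fine but the connection was not.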

35
And how to fix it
  • Understanding the root cause of an error can
    often help you devise a solution to the problem.

36
And how to fix it (cont.)
  • If you get an "Unable to locate server" error,
    you know there's a problem with the hostname in
    the URL.
  • Double-check your typing of the hostname.
  • Make sure your network connection is still
    working.
  • Ensure that your DNS server is functioning in
    general.

37
And how to fix it (cont.)
  • If you get a "No response" error, you know the
    hostname is okay but the server is not able to
    respond.
  • Often, there's nothing you can do about this
    yourself.
  • However, since this is often a temporary problem,
    try again a little later.

38
And how to fix it (cont.)
  • If you get a "Not found" error, you know there's
    a problem with the pathname in the URL.
  • Again, double-check your typing, paying attention
    to case.
  • Try eliminating steps from the pathname one at a
    time, moving from right to left.
  • How?
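
One way to picture "eliminating steps from the pathname" is to generate the shorter URLs you would try, in order. The sketch below does this for the sample URL from an earlier slide; `shorter_urls` is a hypothetical helper, and whether any of the shorter URLs actually exists depends on the server.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def shorter_urls(url):
    """List the URLs obtained by removing pathname steps from right to left."""
    parts = urlsplit(url)
    path = parts.path
    candidates = []
    while path not in ("", "/"):
        # Drop the rightmost step (a filename first, then each directory).
        path = posixpath.dirname(path.rstrip("/"))
        candidate_path = path if path.endswith("/") else path + "/"
        candidates.append(
            urlunsplit((parts.scheme, parts.netloc, candidate_path, "", ""))
        )
    return candidates


candidates = shorter_urls("http://www.sample.com/products/catalog/prod1.html")
```

Trying each candidate in turn shows how far down the hierarchy the pathname is still valid, which narrows down where the mistake is.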

39
And how to fix it (cont.)
  • If you get a "Forbidden" error, the problem is
    with the permissions on the file containing the
    requested resource.
  • If the file belongs to you, simply adjust the
    permissions.
  • Otherwise, there's little you can do about this
    problem yourself except contact the owner of the
    resource.

40
Resource types
  • As we've seen, the Web consists of a variety of
    resource types.
  • In each HTTP response, the server includes an
    indicator of the resource's type so the browser
    knows how to render it.
  • Since servers and browsers must agree on the
    meaning of this type info, it needs to be
    standardized.

41
Resource types (cont.)
  • The standard used for this purpose is called
    Multipurpose Internet Mail Extensions (MIME).
  • As you can tell from its name, MIME was
    originally designed for use with e-mail.
  • A MIME type consists of an indicator of the
    general resource type (text, image, audio, etc.)
    followed by a slash (/), followed by an indicator
    of the specific resource type (html, jpeg, mpeg,
    etc.).
  • For example, XHTML files are assigned a MIME type
    of text/html.
  • JPEG image files are assigned a MIME type of
    image/jpeg.
  • MP3 sound files are assigned a MIME type of
    audio/mpeg.

42
Filename extensions
  • The server needs to know the type of each
    resource for which it is responsible.
  • Otherwise, it wouldn't know what MIME type to
    list in the HTTP response message.
  • To avoid having to explicitly tell the server the
    type of each resource, servers are set up to use
    the extension of the resource's filename to
    determine its type.
  • A filename extension is part of the actual
    filename, but it comes at the end and starts with
    a dot.
  • Examples?
  • The server is configured to associate certain
    filename extensions with specific MIME types.

43
Filename extensions (cont.)
  • For this reason, it's important to name all of
    the files containing your Web resources with
    appropriate filename extensions.
  • We'll generally use only a small number of
    resource types in this course.
  • XHTML files are given .html (or .htm) extensions.
  • JPEG images are given .jpg (or .jpeg)
    extensions.
  • GIF images are given .gif extensions.
  • CSS files are given .css extensions.
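
A server's extension-to-type configuration can be pictured as a simple lookup table. The sketch below is a hypothetical configuration covering the extensions named in these slides; the fallback type for unknown extensions is an assumption, and real servers read such mappings from their own configuration files.

```python
import posixpath

# Hypothetical server configuration: filename extension -> MIME type.
MIME_TYPES_BY_EXTENSION = {
    ".html": "text/html",
    ".htm": "text/html",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".css": "text/css",
    ".mp3": "audio/mpeg",
}

# Assumed fallback for files whose extension is not configured.
DEFAULT_MIME_TYPE = "application/octet-stream"


def mime_type_for(filename):
    """Determine a resource's MIME type from its filename extension."""
    extension = posixpath.splitext(filename)[1].lower()
    return MIME_TYPES_BY_EXTENSION.get(extension, DEFAULT_MIME_TYPE)


def split_mime_type(mime_type):
    """Split a MIME type into its general and specific parts."""
    general, _, specific = mime_type.partition("/")
    return general, specific
```

A file saved without the right extension falls through to the fallback type, so the browser is not told how to render it. Python's standard library offers a similar lookup in its `mimetypes` module.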

44
What Browsers Understand
  • A browser understands the HTTP protocol for
    retrieving Web pages.
  • Most browsers also understand protocols for other
    Web services like file transfer, instant
    messaging, e-mail and network news.
  • A browser understands XHTML and HTML and can
    interpret it in order to render Web pages.
  • Many also understand other popular languages like
    CSS, JavaScript and XML.

45
What Browsers Understand (cont.)
  • Most browsers understand common image file
    formats like JPEG and GIF and can render images
    stored in these formats.
  • Some also understand image file formats like BMP
    and PNG.
  • Many browsers understand other forms of media as
    well.
  • Flash presentations are used for interactive
    animations.
  • MP3 is a file format commonly used for storing
    sounds and music.
  • MPEG and AVI are common file formats for storing
    video.

46
What Browsers Understand (cont.)
  • A good browser is designed to provide the
    functionality most Web users are likely to need.
  • Browser designers, however, realize that people
    use the Web in many different ways.
  • For this reason, most browsers are designed to
    accept two different types of add-ons that extend
    their capabilities.

47
Add-Ons: Helpers and Plug-Ins (pp. 76-83)
  • An application is a program you run on your
    computer to accomplish specific tasks.
  • You can obtain applications from retail software
    stores or the Internet.
  • A browser often uses other applications to view
    the Web.
  • You can customize what applications your browser
    uses.

48
Helpers
  • A helper application is an application a browser
    can launch. It can be any application on your
    computer.
  • Examples?
  • When your browser encounters a file that requires
    special handling, it looks for an appropriate
    helper application and opens the file in that
    application.
  • When browsers were first introduced, helper
    applications were the only option.

49
Plug-Ins
  • A browser plug-in is an application that expands
    the capabilities of a web browser.
  • When you install a plug-in, you extend the
    capabilities of your browser to handle a file
    type that it wasn't originally designed to
    handle.
  • Any file requiring that plug-in will be displayed
    inside the browser window, with the plug-in
    working as if it were a part of your browser.

50
Plug-Ins (cont.)
  • Plug-ins support everything from audio to
    animation to documents.
  • Plug-ins increase your browser's memory
    requirements and launch time.
  • You can find Web pages to help you locate
    plug-ins for your browser.

51
Common plug-ins and helper applications
52
Review questions
  1. Define the World Wide Web and explain its
    relationship with the Internet.
  2. Explain what is meant by referring to the Web as
    a conceptual network of distributed resources.
  3. Explain the concept of hypermedia.
  4. What type of information does a Web server
    typically include in the header of an HTTP
    response, and how might it be useful to a Web
    browser?
  5. Explain the usefulness of a file system in the
    context of the Web.
  6. Describe three ways in which a user might specify
    a desired URL to their browser.
  7. Explain how the Web works behind the scenes. What
    roles do Hypertext Transfer Protocol (HTTP),
    Uniform Resource Locators (URLs), and the
    browser's cache play in this process?
  8. What are some common errors that can occur when
    requesting a Web page and what do they mean?
  9. Explain the relationship between resource types
    and filename extensions on the Web. Why is it
    important?
  10. Compare and contrast a plug-in with a helper app.

53
Key terms
  • Absolute path
  • Absolute pathname
  • Browser cache
  • Browsing
  • Conceptual network
  • File system
  • Filename extension
  • Helper app
  • Hostname
  • HTTP
  • HTTP GET request
  • HTTP HEAD request
  • HTTP response
  • Hyperlink
  • Hypermedia
  • Hypertext
  • Identifier

  • Link
  • Local link
  • MIME
  • MIME type
  • Pathname
  • Permissions
  • Plug-in
  • Remote link
  • Render
  • Scheme
  • URL
  • Web browser
  • Web presentation
  • Web server
  • Web site
  • World Wide Web
  • XHTML
54
  • Some information used from
  • Web 101 by Lehnert and Kopec