URLs, InetAddresses, and URLConnections

1
URLs, InetAddresses, and URLConnections
  • High Level Network Programming
  • Elliotte Rusty Harold
  • elharo_at_sunsite.unc.edu
  • http://sunsite.unc.edu/javafaq/URLS.PPT

2
We will learn how Java handles
  • Internet Addresses
  • URLs
  • CGI
  • URLConnection
  • Content and Protocol handlers

3
I assume you
  • Understand basic Java syntax and I/O
  • Have a user's view of the Internet
  • Have no prior network programming experience

4
Applet Network Security Restrictions
  • Applets may
    • send data to the code base
    • receive data from the code base
  • Applets may not
    • send data to hosts other than the code base
    • receive data from hosts other than the code base

5
Some Background
  • Hosts
  • Internet Addresses
  • Ports
  • Protocols

6
Hosts
  • Devices connected to the Internet are called
    hosts
  • Most hosts are computers, but hosts also include
    routers, printers, fax machines, soda machines,
    bat houses, etc.

7
Internet addresses
  • Every host on the Internet is identified by a
    unique, four-byte Internet Protocol (IP) address.
  • This is written in dotted quad format like
    199.1.32.90 where each byte is an unsigned
    integer between 0 and 255.
  • There are about four billion unique IP addresses,
    but they aren't very efficiently allocated

8
Domain Name System (DNS)
  • Numeric addresses are mapped to names like
    "www.blackstar.com" or "star.blackstar.com" by
    DNS.
  • Each site runs domain name server software that
    translates names to IP addresses and vice versa
  • DNS is a distributed system

9
The InetAddress Class
  • The java.net.InetAddress class represents an IP
    address.
  • It converts numeric addresses to host names and
    host names to numeric addresses.
  • It is used by other network classes like Socket
    and ServerSocket to identify hosts

10
Creating InetAddresses
  • There are no public InetAddress() constructors.
    Arbitrary addresses may not be created.
  • All addresses that are created must be checked
    with DNS

11
The getByName() factory method
  • public static InetAddress getByName(String host)
    throws UnknownHostException
  • InetAddress utopia, duke;
  • try {
  •   utopia = InetAddress.getByName("utopia.poly.edu");
  •   duke = InetAddress.getByName("128.238.2.92");
  • }
  • catch (UnknownHostException e) {
  •   System.err.println(e);
  • }

12
Other ways to create InetAddress objects
  • public static InetAddress[] getAllByName(String
    host) throws UnknownHostException
  • public static InetAddress getLocalHost() throws
    UnknownHostException

13
Getter Methods
  • public boolean isMulticastAddress()
  • public String getHostName()
  • public byte[] getAddress()
  • public String getHostAddress()
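
The getter methods can be tried out without any DNS traffic by building the InetAddress from a numeric literal. The following is a minimal sketch under that assumption; the class name InetAddressGetters and the helper dottedQuad() are made up for illustration. It also shows why the bytes from getAddress() need masking: Java bytes are signed, while each part of a dotted quad is an unsigned value from 0 to 255.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class InetAddressGetters {

    // Hypothetical helper: turn the signed bytes from getAddress()
    // into dotted quad form.
    static String dottedQuad(byte[] address) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < address.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(address[i] & 0xFF); // mask to recover the unsigned value 0-255
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnknownHostException {
        // A numeric literal is only checked for format, so no name lookup is needed.
        InetAddress duke = InetAddress.getByName("128.238.2.92");
        System.out.println(duke.getHostAddress());
        System.out.println(dottedQuad(duke.getAddress()));
        System.out.println(duke.isMulticastAddress());
    }
}
```

getHostName() is left out of the sketch on purpose, since it may trigger a reverse lookup.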

14
Utility Methods
  • public int hashCode()
  • public boolean equals(Object o)
  • public String toString()

15
Ports
  • In general a host has only one Internet address
  • This address is subdivided into 65,536 ports
  • Ports are logical abstractions that allow one
    host to communicate simultaneously with many
    other hosts
  • Many services run on well-known ports. For
    example, http tends to run on port 80

16
Protocols
  • A protocol defines how two hosts talk to each
    other.
  • The daytime protocol, RFC 867, specifies an ASCII
    representation for the time that's legible to
    humans.
  • The time protocol, RFC 868, specifies a binary
    representation for the time that's legible to
    computers.
  • There are thousands of protocols, standard and
    non-standard

17
IETF RFCs
  • Requests For Comment
  • Document how much of the Internet works
  • Various status levels from obsolete to required
    to informational
  • TCP/IP, telnet, SMTP, MIME, HTTP, and more
  • http://ds.internic.net/rfc/

18
W3C Standards
  • IETF is based on rough consensus and running
    code
  • W3C tries to run ahead of implementation
  • IETF is an informal organization open to
    participation by anyone
  • W3C is a vendor consortium open only to companies

19
URLs
  • A URL, short for "Uniform Resource Locator", is a
    way to unambiguously identify the location of a
    resource on the Internet.

20
Example URLs
  • http://www.javasoft.com/
  • file:///Macintosh%20HD/Java/Docs/JDK%201.1.1%20docs/api/java.net.InetAddress.html#_top_
  • http://www.macintouch.com:80/newsrecent.shtml
  • ftp://ftp.info.apple.com/pub/
  • mailto:elharo_at_sunsite.unc.edu
  • telnet://utopia.poly.edu

21
The Pieces of a URL
  • Most URLs can be broken into about five pieces,
    not all of which are necessarily present in any
    given URL. These are
  • the protocol
  • the host
  • the port
  • the file
  • the ref, section, or anchor

22
The java.net.URL class
  • A URL object represents a URL.
  • The URL class contains methods to
  • create new URLs
  • parse the different parts of a URL
  • get an input stream from a URL so you can read
    data from a server
  • get content from the server as a Java object

23
Content and Protocol Handlers
  • Content and protocol handlers separate the data
    being downloaded from the protocol used to
    download it.
  • The protocol handler negotiates with the server
    and parses any headers. It gives the content
    handler only the actual data of the requested
    resource.
  • The content handler translates those bytes into a
    Java object like an InputStream or ImageProducer.

24
Finding Protocol Handlers
  • When you construct a URL object, the virtual
    machine looks for a protocol handler that
    understands the protocol part of the URL such as
    "http" or "mailto".
  • If no such handler is found, the constructor
    throws a MalformedURLException.
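
Because the constructor fails when no handler is found, it can be used to probe which protocols a given virtual machine supports. The following is a rough sketch of that idea; the class name ProtocolCheck, the method isSupported(), and the placeholder host www.example.com are assumptions made for illustration. No connection is opened, since constructing a URL object does not contact the server.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheck {

    // Returns true if this virtual machine finds a protocol handler
    // for the given protocol name.
    static boolean isSupported(String protocol) {
        try {
            new URL(protocol + "://www.example.com/"); // placeholder host
            return true;
        } catch (MalformedURLException e) {
            return false; // no handler understood this protocol
        }
    }

    public static void main(String[] args) {
        String[] protocols = {"http", "file", "ftp", "mailto", "gopher"};
        for (String p : protocols) {
            System.out.println(p + ": " + isSupported(p));
        }
    }
}
```

The results vary from one implementation to another, so the output is not fixed.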

25
Supported Protocols
  • The exact protocols that Java supports vary from
    implementation to implementation though http and
    file are supported pretty much everywhere. Sun's
    JDK 1.1 understands ten
  • file
  • ftp
  • gopher
  • http
  • mailto
  • appletresource
  • doc
  • netdoc
  • systemresource
  • verbatim

26
URL Constructors
  • There are four constructors in the java.net.URL
    class. All can throw MalformedURLExceptions.
  • public URL(String u) throws MalformedURLException
  • public URL(String protocol, String host, String
    file) throws MalformedURLException
  • public URL(String protocol, String host, int
    port, String file) throws MalformedURLException
  • public URL(URL context, String u) throws
    MalformedURLException

27
Constructing URL Objects
  • Construct a URL object for a complete, absolute
    URL like http://www.poly.edu/fall97/grad.html#cs
    like this
  • try {
  •   URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
  • }
  • catch (MalformedURLException e) { }

28
Constructing URL Objects in Pieces
  • You can also construct the URL by passing its
    pieces to the constructor, like this
  • URL u = null;
  • try {
  •   u = new URL("http", "www.poly.edu",
        "/schedule/fall97/bgrad.html#cs");
  • }
  • catch (MalformedURLException e) { }

29
Including the Port
  • URL u = null;
  • try {
  •   u = new URL("http", "www.poly.edu", 8000,
        "/fall97/grad.html#cs");
  • }
  • catch (MalformedURLException e) { }

30
Relative URLs
  • Many HTML files contain relative URLs.
  • Consider the page
    http://sunsite.unc.edu/javafaq/index.html
  • On this page a link to "books.html" refers to
    http://sunsite.unc.edu/javafaq/books.html.

31
Constructing Relative URLs
  • The fourth constructor creates URLs relative to a
    given URL. For example,
  • try {
  •   URL u1 = new URL("http://sunsite.unc.edu/index.html");
  •   URL u2 = new URL(u1, "books.html");
  • }
  • catch (MalformedURLException e) { }
  • This is particularly useful when parsing HTML.
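
To see what the relative constructor actually produces, the resolved URL can be printed. The following is a small sketch; the class name RelativeUrls and the helper resolve() are made up for illustration, and the URLs are the ones used on the Relative URLs slide.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrls {

    // Resolve a link found on a page against the URL of that page.
    static String resolve(String base, String relative) throws MalformedURLException {
        URL context = new URL(base);
        URL resolved = new URL(context, relative);
        return resolved.toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        String page = "http://sunsite.unc.edu/javafaq/index.html";
        System.out.println(resolve(page, "books.html"));
        // A link that starts with a slash replaces the whole path.
        System.out.println(resolve(page, "/index.html"));
    }
}
```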

32
Parsing URLs
  • The java.net.URL class has five methods to split
    a URL into its component parts. These are
  • public String getProtocol()
  • public String getHost()
  • public int getPort()
  • public String getFile()
  • public String getRef()

33
For example,
  • try {
  •   URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
  •   System.out.println("The protocol is " + u.getProtocol());
  •   System.out.println("The host is " + u.getHost());
  •   System.out.println("The port is " + u.getPort());
  •   System.out.println("The file is " + u.getFile());
  •   System.out.println("The anchor is " + u.getRef());
  • }
  • catch (MalformedURLException e) { }

34
Missing Pieces
  • If a port is not explicitly specified in the URL
    it's set to -1. This means the default port is to
    be used.
  • If the ref doesn't exist, it's just null, so
    watch out for NullPointerExceptions. Better yet,
    test to see that it's non-null before using it.
  • If the file is left off completely, e.g.
    http://www.javasoft.com, then it's set to "/".
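
One way to guard against the missing pieces is to wrap the getters in small helpers. The following is a sketch only: the class UrlDefaults and its two methods are hypothetical, and getDefaultPort() is a java.net.URL method that was added after JDK 1.1 and is not covered in these slides. The value returned for a missing file may differ between JDK versions, so the sketch does not rely on it.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlDefaults {

    // If no port was given, getPort() returns -1; fall back to the
    // default port of the protocol (80 for http).
    static int effectivePort(URL u) {
        int port = u.getPort();
        return (port == -1) ? u.getDefaultPort() : port;
    }

    // getRef() returns null when there is no ref; test before using it.
    static String refOrEmpty(URL u) {
        String ref = u.getRef();
        return (ref == null) ? "" : ref;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL bare = new URL("http://www.javasoft.com");
        URL full = new URL("http://www.poly.edu:8000/fall97/grad.html#cs");
        System.out.println(effectivePort(bare) + " [" + refOrEmpty(bare) + "]");
        System.out.println(effectivePort(full) + " [" + refOrEmpty(full) + "]");
    }
}
```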

35
Reading Data from a URL
  • The openStream() method connects to the server
    specified in the URL and returns an InputStream
    object fed by the data from that connection.
  • public final InputStream openStream() throws
    IOException
  • Any headers that precede the actual data are
    stripped off before the stream is opened.
  • Network connections are less reliable and slower
    than files. Buffer with a BufferedInputStream or
    a BufferedReader.

36
import java.net.*;
import java.io.*;

public class Webcat {
  public static void main(String[] args) {
    for (int i = 0; i < args.length; i++) {
      try {
        URL u = new URL(args[i]);
        InputStream in = u.openStream();
        InputStreamReader isr = new InputStreamReader(in);
        BufferedReader br = new BufferedReader(isr);
        String theLine;
        while ((theLine = br.readLine()) != null) {
          System.out.println(theLine);
        }
      }
      catch (MalformedURLException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }
}

37
CGI
  • Common Gateway Interface
  • A lot is written about writing server side CGI.
    I'm going to show you client side CGI.
  • We'll need to explore HTTP a little deeper to do
    this

38
Normal web surfing uses these two steps
  • The browser requests a page
  • The server sends the page
  • Data flows primarily from the server to the
    client.

39
Forms
  • There are times when the server needs to get data
    from the client rather than the other way around.
    The common way to do this is with a form like
    this one

40
CGI
  • The user types the requested data into the form
    and hits the submit button.
  • The client browser then sends the data to the
    server using the Common Gateway Interface, CGI
    for short.
  • CGI uses the HTTP protocol to transmit the data,
    either as part of the query string or as separate
    data following the MIME header.

41
GET and POST
  • When the data is sent as a query string included
    with the file request, this is called CGI GET.
  • When the data is sent as data attached to the
    request following the MIME header, this is called
    CGI POST

42
HTTP
  • Web browsers communicate with web servers through
    a standard protocol known as HTTP, an acronym for
    HyperText Transfer Protocol.
  • This protocol defines
  • how a browser requests a file from a web server
  • how a browser sends additional data along with
    the request (e.g. the data formats it can
    accept),
  • how the server sends data back to the client
  • response codes

43
A Typical HTTP Connection
  • Client opens a socket to port 80 on the server.
  • Client sends a GET request including the name and
    path of the file it wants and the version of the
    HTTP protocol it supports.
  • The client sends a MIME header.
  • The client sends a blank line.
  • The server sends a MIME header
  • The server sends the data in the file.
  • The server closes the connection.
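
The text a client sends in the second through fourth steps can be sketched as a plain string. This is an illustration, not a full HTTP client: the class name HttpGetRequest, the method build(), and the choice of Host and Accept as the header fields are assumptions. Lines end with a carriage return and linefeed pair, and the empty line marks the end of the header.

```java
class HttpGetRequest {

    // Build the request line, a small MIME header, and the blank line
    // that ends the header.
    static String build(String path, String host) {
        return "GET " + path + " HTTP/1.0\r\n"   // request line: file and HTTP version
             + "Host: " + host + "\r\n"          // header fields, one per line
             + "Accept: text/html\r\n"
             + "\r\n";                           // blank line: end of the header
    }

    public static void main(String[] args) {
        // www.example.com is a placeholder host.
        System.out.print(build("/index.html", "www.example.com"));
    }
}
```

In a real client this string would be written to a socket opened to port 80 on the server.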

44
MIME
  • MIME is an acronym for "Multipurpose Internet
    Mail Extensions".
  • an Internet standard defined in RFCs 2045 through
    2049
  • originally intended for use with email messages,
    but has been adopted for use in HTTP.

45
Browser Request MIME Header
  • When the browser sends a request to a web server,
    it also sends a MIME header. MIME headers contain
    name-value pairs, essentially a name followed by
    a colon and a space, followed by a value.
  • Connection: Keep-Alive
  • User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
  • Host: www.digitalthink.com:80
  • Accept: image/gif, image/x-xbitmap, image/jpeg,
    image/pjpeg, */*

46
Server Response MIME Header
  • When a web server responds to a web browser it
    sends a MIME header along with the response that
    looks something like this
  • Server: Netscape-Enterprise/2.01
  • Date: Sat, 02 Aug 1997 07:52:46 GMT
  • Accept-ranges: bytes
  • Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
  • Content-length: 2810
  • Content-type: text/html
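
Since a MIME header is just a series of name-value pairs, it is easy to split apart by hand. The following is a minimal sketch, assuming one field per line and no folded (continued) lines; the class name MimeHeaderParser and the method parse() are made up for illustration. Names are stored in lower case so lookups behave case-insensitively.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class MimeHeaderParser {

    // Split each "Name: value" line at the first colon.
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : header.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) continue; // skip blank or malformed lines
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = "Server: Netscape-Enterprise/2.01\r\n"
                      + "Date: Sat, 02 Aug 1997 07:52:46 GMT\r\n"
                      + "Content-length: 2810\r\n"
                      + "Content-type: text/html\r\n";
        Map<String, String> fields = parse(header);
        System.out.println(fields.get("content-length"));
        System.out.println(fields.get("date"));
    }
}
```

Only the first colon on a line is treated as the separator, so the colons inside the date value are preserved.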

47
Query Strings
  • CGI GET data is sent in URL encoded query strings
  • a query string is a set of name=value pairs
    separated by ampersands
  • Author=Sadie, Julie&Title=Women Composers
  • separated from rest of URL by a question mark

48
URL Encoding
  • Alphanumeric ASCII characters (a-z, A-Z, and 0-9)
    and the -_.!'(), punctuation symbols are left
    unchanged.
  • The space character is converted into a plus sign
    (+).
  • Other characters (e.g. &, =, and so on) are
    translated into a percent sign
    followed by the two hexadecimal digits
    corresponding to their numeric value.

49
For example,
  • The comma is ASCII character 44 (decimal) or 2C
    (hex). Therefore if the comma appears as part of
    a URL it is encoded as %2C.
  • The query string "Author=Sadie, Julie&Title=Women
    Composers" is encoded as
  • Author=Sadie%2C+Julie&Title=Women+Composers

50
The URLEncoder class
  • The java.net.URLEncoder class contains a single
    static method which encodes strings in
    x-www-form-urlencoded format
  • URLEncoder.encode(String s)

51
For example,
  • String qs = "Author=Sadie, Julie&Title=Women
    Composers";
  • String eqs = URLEncoder.encode(qs);
  • System.out.println(eqs);
  • This prints
  • Author%3dSadie%2c+Julie%26Title%3dWomen+Composers

52
  • String eqs = "Author=" + URLEncoder.encode("Sadie,
    Julie");
  • eqs += "&";
  • eqs += "Title=";
  • eqs += URLEncoder.encode("Women Composers");
  • This prints the properly encoded query string
  • Author=Sadie%2c+Julie&Title=Women+Composers
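
The piece-by-piece approach generalizes to a small helper that encodes each name and value separately and leaves the = and & separators alone. The following is a sketch under a few assumptions: the class QueryStringBuilder and the method pair() are hypothetical, and it uses the two-argument URLEncoder.encode(String, String) overload from later JDKs rather than the one-argument method shown in these slides. Later JDKs may also emit upper case hex digits (%2C rather than %2c); the two forms are equivalent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryStringBuilder {

    // Encode one name=value pair; the "=" itself must stay unencoded.
    static String pair(String name, String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String eqs = pair("Author", "Sadie, Julie")
                   + "&"                                  // separator stays unencoded
                   + pair("Title", "Women Composers");
        System.out.println(eqs);
    }
}
```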

53
GET URLs
  • String eqs = "Author=" + URLEncoder.encode("Sadie,
    Julie");
  • eqs += "&";
  • eqs += "Title=";
  • eqs += URLEncoder.encode("Women Composers");
  • try {
  •   URL u = new URL("http://www.superbooks.com/search.cgi?" + eqs);
  •   InputStream in = u.openStream();
  •   //...
  • }
  • catch (IOException e) { //...
  • }

54
URLConnections
  • The java.net.URLConnection class is an abstract
    class that handles communication with different
    kinds of servers like ftp servers and web
    servers.
  • Protocol specific subclasses of URLConnection
    handle different kinds of servers.
  • By default, connections to HTTP URLs use the GET
    method.

55
URLConnections vs. URLs
  • Can send output as well as read input
  • Can post data to CGIs
  • Can read headers from a connection

56
URLConnection five steps
  • 1. The URL is constructed.
  • 2. The URL's openConnection() method creates the
    URLConnection object.
  • 3. The parameters for the connection and the
    request properties that the client sends to the
    server are set up.
  • 4. The connect() method makes the connection to
    the server.
  • 5. The response header information is read using
    getHeaderField().
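
The five steps can be sketched in a single method. This is an illustration under stated assumptions: the class name FiveSteps, the method fetch(), and the Accept value are made up, and the body is read as UTF-8 text. Because the steps are the same for every protocol, the sketch also works with a file URL, which is a convenient way to try it without a server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class FiveSteps {

    static String fetch(String spec) throws IOException {
        URL u = new URL(spec);                                    // step 1: construct the URL
        URLConnection uc = u.openConnection();                    // step 2: create the URLConnection
        uc.setRequestProperty("Accept", "text/html, text/plain"); // step 3: set request properties
        uc.connect();                                             // step 4: connect to the server
        // Step 5: read a header field; the result may be null if it is absent.
        String type = uc.getHeaderField("Content-type");
        System.err.println("Content-type: " + type);

        // After the header, read the data itself from the input stream.
        try (InputStream in = uc.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toString("UTF-8");
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(fetch(arg));
        }
    }
}
```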

57
I/O Across a URLConnection
  • Data may be read from the connection in one of
    two ways
  • raw by using the input stream returned by
    getInputStream()
  • through a content handler with getContent().
  • Data can be sent to the server using the output
    stream provided by getOutputStream().

58
For example,
  • try {
  •   URL u = new URL("http://www.sd98.com/");
  •   URLConnection uc = u.openConnection();
  •   uc.connect();
  •   InputStream in = uc.getInputStream();
  •   // read the data...
  • }
  • catch (IOException e) { //...
  • }

59
Reading Header Data
  • The getHeaderField(String name) method returns
    the string value of a named header field.
  • Names are case-insensitive.
  • If the requested field is not present, null is
    returned.
  • String lm = uc.getHeaderField("Last-modified");

60
getHeaderFieldKey()
  • The keys of the header fields are returned by the
    getHeaderFieldKey(int n) method.
  • The first field is 1.
  • If a numbered key is not found, null is returned.
  • You can use this in combination with
    getHeaderField() to loop through the complete
    header

61
For example
  • String key = null;
  • for (int i = 1; (key = uc.getHeaderFieldKey(i)) != null; i++) {
  •   System.out.println(key + ": "
        + uc.getHeaderField(key));
  • }

62
getHeaderFieldInt() and getHeaderFieldDate()
  • These are utility methods that read a named
    header and convert its value into an int and a
    long respectively.
  • public int getHeaderFieldInt(String name, int
    Default)
  • public long getHeaderFieldDate(String name, long
    Default)

63
  • The long returned by getHeaderFieldDate() can be
    converted into a Date object using a Date()
    constructor like this
  • long s = uc.getHeaderFieldDate("Last-modified",
    0);
  • Date lm = new Date(s);
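
The long in question is a count of milliseconds since midnight GMT, January 1, 1970. The following sketch shows the same kind of conversion done by hand on the Last-modified value from the sample server header; it is an illustration of the idea, not the actual implementation of getHeaderFieldDate(). The class name HeaderDates, the method toMillis(), and the date pattern are assumptions.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class HeaderDates {

    // Parse a date such as "Tue, 29 Jul 1997 15:06:46 GMT" into
    // milliseconds since the epoch.
    static long toMillis(String httpDate) throws ParseException {
        SimpleDateFormat format =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.parse(httpDate).getTime();
    }

    public static void main(String[] args) throws ParseException {
        long s = toMillis("Tue, 29 Jul 1997 15:06:46 GMT");
        Date lm = new Date(s); // same Date(long) constructor as on the slide
        System.out.println(s);
        System.out.println(lm);
    }
}
```

The second line of output depends on the default time zone of the machine, since Date.toString() uses it.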

64
Six Convenience Methods
  • These return the values of six particularly
    common header fields
  • public int getContentLength()
  • public String getContentType()
  • public String getContentEncoding()
  • public long getExpiration()
  • public long getDate()
  • public long getLastModified()

65
  • try {
  •   URL u = new URL("http://www.sd98.com/");
  •   URLConnection uc = u.openConnection();
  •   uc.connect();
  •   String key = null;
  •   for (int n = 1; (key = uc.getHeaderFieldKey(n)) != null; n++) {
  •     System.out.println(key + ": "
          + uc.getHeaderField(key));
  •   }
  • }
  • catch (IOException e) {
  •   System.err.println(e);
  • }

66
Writing data to a URLConnection
  • Similar to reading data from a URLConnection.
  • First inform the URLConnection that you plan to
    use it for output
  • Before getting the connection's input stream, get
    the connection's output stream and write to it.
  • Commonly used to talk to CGIs that use the POST
    method

67
Nine Steps
  • 1. Construct the URL.
  • 2. Call the URL's openConnection() method to create
    the URLConnection object.
  • 3. Pass true to the URLConnection's setDoOutput()
    method.
  • 4. Invoke setDoInput(true) to indicate that this
    URLConnection will also be used for input.
  • 5. Create the data you want to send, preferably as a
    byte array.

68
  • 6. Call getOutputStream() to get an output stream
    object.
  • 7. Write the byte array calculated in step 5 onto
    the stream.
  • 8. Close the output stream.
  • 9. Call getInputStream() to get an input stream
    object. Read from it as usual.
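
The nine steps can be sketched as two methods, one that prepares the data and one that sends it. This is a sketch, not a definitive implementation: the class name FormPoster and the methods formData() and post() are hypothetical, the two-argument URLEncoder.encode() comes from later JDKs, and the response is assumed to be UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

class FormPoster {

    // Step 5: create the data to send as a byte array of encoded pairs.
    static byte[] formData(String[][] pairs) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (String[] pair : pairs) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(pair[0], "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(pair[1], "UTF-8"));
        }
        return sb.toString().getBytes("US-ASCII");
    }

    static String post(String spec, byte[] data) throws IOException {
        URL u = new URL(spec);                          // step 1
        URLConnection uc = u.openConnection();          // step 2
        uc.setDoOutput(true);                           // step 3
        uc.setDoInput(true);                            // step 4
        try (OutputStream out = uc.getOutputStream()) { // step 6
            out.write(data);                            // step 7
        }                                               // step 8: closed on leaving the try
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(uc.getInputStream(), "UTF-8"))) { // step 9
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
        }
        return response.toString();
    }
}
```

A call might look like post(someCgiUrl, formData(new String[][] {{"username", "Sadie, Julie"}})), where someCgiUrl stands for the address of a CGI program that accepts POST.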

69
POST CGIs
  • A typical POST request to a CGI looks like this
  • POST /cgi-bin/booksearch.pl HTTP/1.0
  • Referer: http://www.macfaq.com/sampleform.html
  • User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
  • Content-length: 60
  • Content-type: application/x-www-form-urlencoded
  • Host: utopia.poly.edu:56435
  • username=Sadie%2C+Julie&realname=Women+Composers

70
A POST request includes
  • the POST line
  • a MIME header which must include
  • content type
  • content length
  • a blank line that signals the end of the MIME
    header
  • the actual data of the form, encoded in
    x-www-form-urlencoded format.

71
  • A URLConnection for an http URL will set up the
    request line and the MIME header for you as long
    as you set its doOutput field to true by invoking
    setDoOutput(true).
  • If you also want to read from the connection, you
    should set doInput to true with setDoInput(true)
    too.

72
For example,
  • URLConnection uc = u.openConnection();
  • uc.setDoOutput(true);
  • uc.setDoInput(true);

73
  • The request line and MIME header are sent as
    soon as the URLConnection connects. Then use
    getOutputStream() to get an output stream on
    which you'll write the x-www-form-urlencoded
    name-value pairs.

74
HttpURLConnection
  • java.net.HttpURLConnection is an abstract
    subclass of URLConnection that provides some
    additional methods specific to the HTTP protocol.
  • URL connection objects that are returned by an
    http URL will be instances of
    java.net.HttpURLConnection.

75
Recall
  • a typical HTTP response from a web server begins
    like this
  • HTTP/1.0 200 OK
  • Server: Netscape-Enterprise/2.01
  • Date: Sat, 02 Aug 1997 07:52:46 GMT
  • Accept-ranges: bytes
  • Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
  • Content-length: 2810
  • Content-type: text/html

76
Response Codes
  • The getHeaderField() and getHeaderFieldKey()
    don't return the HTTP response code
  • After you've connected, you can retrieve the
    numeric response code--200 in the above
    example--with the getResponseCode() method and
    the message associated with it--OK in the above
    example--with the getResponseMessage() method.
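
The first line of the response carries both values, and pulling them apart by hand shows what the two methods report. The following is an illustrative sketch, not the way HttpURLConnection is actually implemented; the class StatusLine and its parse() method are made up for this example.

```java
class StatusLine {

    final int code;        // e.g. 200
    final String message;  // e.g. "OK"

    StatusLine(int code, String message) {
        this.code = code;
        this.message = message;
    }

    // Split "HTTP/1.0 200 OK" into the version, the numeric code,
    // and the message (which may itself contain spaces).
    static StatusLine parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not a status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String message = (parts.length == 3) ? parts[2] : "";
        return new StatusLine(code, message);
    }

    public static void main(String[] args) {
        StatusLine ok = parse("HTTP/1.0 200 OK");
        System.out.println(ok.code + " " + ok.message);
    }
}
```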

77
  • Java 1.0 only supported GET and POST requests to
    HTTP servers, but Java 1.1 allows the much
    broader range of requests specified in the
    HTTP/1.1 specification including GET, POST, HEAD,
    OPTIONS, PUT, DELETE, and TRACE.
  • These are set with the void
    setRequestMethod(String method) method.
  • This method throws a java.net.ProtocolException,
    a subclass of IOException, if an unknown request
    method is specified.

78
getRequestMethod()
  • The getRequestMethod() method returns the string
    form of the request method currently set for the
    URLConnection. GET is the default method.
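
Both methods can be exercised before any connection is made, because openConnection() only creates the object. The following is a small sketch; the class RequestMethods and the helper trySetMethod() are made up for illustration, and "BOGUS" stands in for any request method the class does not recognize.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

class RequestMethods {

    // Returns false instead of throwing when the method is rejected.
    static boolean trySetMethod(HttpURLConnection huc, String method) {
        try {
            huc.setRequestMethod(method);
            return true;
        } catch (ProtocolException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        URL u = new URL("http://www.amnesty.org/");
        // No network traffic yet: the connection is created, not opened.
        HttpURLConnection huc = (HttpURLConnection) u.openConnection();
        System.out.println(huc.getRequestMethod());      // the default method
        System.out.println(trySetMethod(huc, "HEAD"));
        System.out.println(huc.getRequestMethod());
        System.out.println(trySetMethod(huc, "BOGUS"));
    }
}
```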

79
disconnect()
  • The void disconnect() method of the
    HttpURLConnection class allows you to close the
    connection to the web server.
  • Needed for HTTP/1.1 Keep-alive

80
For example,
  • try {
  •   URL u = new URL("http://www.amnesty.org/");
  •   HttpURLConnection huc = (HttpURLConnection)
        u.openConnection();
  •   huc.setDoOutput(true);
  •   huc.setRequestMethod("PUT");
  •   OutputStream os = huc.getOutputStream();
  •   int code = huc.getResponseCode();
  •   if (code >= 200 && code < 300) {
  •     // put the data...
  •   }
  •   huc.disconnect();
  • }
  • catch (IOException e) { //...
  • }

81
usingProxy
  • The boolean usingProxy() method returns true if
    web connections are being funneled through a
    proxy server, false if they're not.

82
  • The HttpURLConnection class also has two static
    methods that affect how all URLConnection objects
    interact with web servers. With a true argument,
    the HttpURLConnection.setFollowRedirects(boolean
    followRedirects) method says that connections
    will follow redirect instructions from the web
    server. Untrusted applets are not allowed to set
    this. The boolean method
    HttpURLConnection.getFollowRedirects() returns true if redirect requests
    are honored, false if they're not.

83
Redirect Instructions
  • Most web servers can be configured to
    automatically redirect browsers to the new
    location of a page that's moved.
  • To redirect browsers, a server sends a 300 level
    response and a Location header that specifies the
    new location of the requested page.

84
  • GET /elharo/macfaq/index.html HTTP/1.0
  • HTTP/1.1 302 Moved Temporarily
  • Date: Mon, 04 Aug 1997 14:21:27 GMT
  • Server: Apache/1.2b7
  • Location: http://www.macfaq.com/macfaq/index.html
  • Connection: close
  • Content-type: text/html
  • <HTML><HEAD>
  • <TITLE>302 Moved Temporarily</TITLE>
  • </HEAD><BODY>
  • <H1>Moved Temporarily</H1>
  • The document has moved <A
    HREF="http://www.macfaq.com/macfaq/index.html">here</A>.<P>
  • </BODY></HTML>

85
  • HTML is returned for browsers that don't
    understand redirects, but most modern browsers do
    not display this and jump straight to the page
    specified in the Location header instead.
  • Because redirects can change the site to which a
    user is connecting without their knowledge,
    redirects are not arbitrarily followed by
    URLConnections.
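
A program that handles redirects itself has to decide which ones to trust. The following sketch shows one possible policy, namely following a 300 level response only when the Location header stays on the same host. It is a hypothetical rule chosen for illustration, not the rule that URLConnection applies; the class RedirectPolicy, the method shouldFollow(), and the host www.example.com are all assumptions.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RedirectPolicy {

    // Hypothetical policy: follow only 300 level responses whose
    // Location header points back to the original host.
    static boolean shouldFollow(URL original, int code, String location) {
        if (code < 300 || code >= 400 || location == null) {
            return false;
        }
        try {
            URL target = new URL(original, location); // Location may be relative
            return target.getHost().equalsIgnoreCase(original.getHost());
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        URL original = new URL("http://www.example.com/elharo/macfaq/index.html");
        // Redirect to a different site: not followed under this policy.
        System.out.println(shouldFollow(original, 302,
            "http://www.macfaq.com/macfaq/index.html"));
        // Redirect within the same site: followed.
        System.out.println(shouldFollow(original, 302, "/macfaq/index.html"));
    }
}
```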

86
To Learn More
  • Java Network Programming
  • O'Reilly & Associates, 1997
  • ISBN 1-56592-227-1
  • Web Client Programming with Java
  • http://www.digitalthink.com/catalog/cs/cs308/index.html